Harness the Power of OpenAI’s Whisper Model for ASR with Voiceflow: A Local, Dockerized Solution

Voice assistants are becoming increasingly popular as they provide an efficient and intuitive way for users to interact with various applications. With the advent of large language models (LLMs) like OpenAI’s GPT series, voice assistants have become more capable of understanding and generating responses for longer and more complex user inputs.

In this article, we will go over a quick project, the Voiceflow ASR Demo, which harnesses the power of OpenAI’s Whisper model for Automatic Speech Recognition (ASR) without the need for an external API. By running the ASR service in a Docker container, locally or on your own server, you get a more versatile and customizable solution.

What’s the idea?

As users interact with LLM-powered voice assistants, they tend to provide longer and more complex utterances. This is beneficial, as it gives the assistant more context to generate better answers. The idea then is to use the Whisper model for ASR without relying on an external API, offering you more control and customization options while keeping your data in-house.

What is the Voiceflow ASR Demo?

The Voiceflow ASR Demo is a test page that demonstrates ASR capabilities using OpenAI’s Whisper model. The project consists of a simple webpage that captures audio from the user’s microphone, sends it to your custom endpoint, and displays the transcribed text and the time it took to render the transcription.
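
To make that flow concrete, here is a minimal browser-side sketch of the capture-and-send loop. It is an illustration rather than the demo’s actual source: the /transcribe route, the #result element, and the { text: "..." } response shape are all assumptions.

// Minimal sketch of the capture-and-send flow (illustrative, not the demo's source).
// Assumes the proxy exposes a hypothetical POST /transcribe endpoint returning
// JSON shaped like { text: "..." }.
let recorder;

async function startRecording() {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const chunks = [];
  recorder = new MediaRecorder(stream);
  recorder.ondataavailable = (e) => chunks.push(e.data);
  recorder.onstop = async () => {
    const blob = new Blob(chunks, { type: recorder.mimeType });
    const form = new FormData();
    form.append("audio_file", blob, "recording.webm");

    const started = performance.now();
    const res = await fetch("http://localhost:3000/transcribe", { method: "POST", body: form });
    const { text } = await res.json();
    const seconds = ((performance.now() - started) / 1000).toFixed(2);
    document.querySelector("#result").textContent = `${text} (${seconds}s)`;
  };
  recorder.start();
}

function stopRecording() {
  recorder.stop();
}

MediaRecorder hands back compressed chunks (typically WebM/Opus in Chromium browsers), which the webservice can decode server-side, so the page never has to deal with raw audio samples.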

Key Features

  • Start and stop recording with a button
  • Auto-end recording after a specified duration of silence
  • Utilizes a Docker container to run the ASR webservice locally
  • Uses a proxy to avoid CORS issues (a minimal sketch of such a proxy follows this list)
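
On that last point: when the page is opened from disk or another origin, the browser blocks direct requests to the ASR container, hence the proxy. Below is a sketch of what such a proxy can look like with Express. It is an assumption for illustration, not the demo’s actual server code, and the /transcribe route name is made up.

// Minimal CORS proxy sketch (Express, run as an ES module). Illustrative only.
// Buffers the multipart upload as-is and forwards it to the Whisper webservice.
import express from "express";

const app = express();
app.use(express.raw({ type: "multipart/form-data", limit: "25mb" }));

app.post("/transcribe", async (req, res) => {
  res.set("Access-Control-Allow-Origin", "*"); // lift the cross-origin restriction
  const upstream = await fetch("http://localhost:9000/asr?task=transcribe&output=json", {
    method: "POST",
    headers: { "content-type": req.headers["content-type"] }, // keep the multipart boundary
    body: req.body,
  });
  res.status(upstream.status).send(await upstream.text());
});

app.listen(3000, () => console.log("Proxy listening on http://localhost:3000"));

Forwarding the raw multipart body untouched keeps the proxy tiny: the webservice parses the upload itself, and the proxy only has to add the Access-Control-Allow-Origin header.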

Setting Up the Voiceflow ASR Demo

To get started, you will need to have Node.js and Docker installed on your machine. Follow these steps to set up the demo:

  1. Clone the repository: git clone https://github.com/voiceflow-gallagan/whisper-asr-demo.git
  2. Change to the project directory: cd whisper-asr-demo
  3. Install the required dependencies: npm install
  4. Pull and run the Docker container for the ASR webservice: docker run -d -p 9000:9000 -e ASR_MODEL=base.en onerahmet/openai-whisper-asr-webservice:latest
  5. Start the proxy server: npm start

Now, the proxy server should be running at http://localhost:3000. Open the index.html file in your browser to test the ASR demo.
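
If the page shows nothing, it helps to test the container on its own first. The webservice exposes an /asr endpoint that accepts an audio_file multipart field; the short Node script below (Node 18+, saved as, say, check-asr.mjs) posts a local recording to it. The sample.wav filename is just a placeholder for any short audio file you have on disk.

// Sanity check against the ASR container (Node 18+ for built-in fetch/FormData/Blob).
// Run with: node check-asr.mjs
import { readFile } from "node:fs/promises";

const audio = await readFile("sample.wav"); // placeholder: any short audio file
const form = new FormData();
form.append("audio_file", new Blob([audio], { type: "audio/wav" }), "sample.wav");

const res = await fetch("http://localhost:9000/asr?task=transcribe&output=json", {
  method: "POST",
  body: form,
});
console.log(await res.json()); // should include the transcribed text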

Using the Voiceflow ASR Demo

  1. Click the “Start Recording” button to start capturing audio from your microphone.
  2. Speak into your microphone.
  3. The recording will stop automatically after a specified duration of silence (2 seconds by default) or can be stopped manually by clicking the “Stop Recording” button (a sketch of this silence detection follows the list).
  4. The transcribed text and the time it took to render the transcription will be displayed on the page.
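
The auto-stop behavior in step 3 can be built with the Web Audio API: sample the microphone’s loudness on a short interval and stop the recorder once it stays below a threshold for the whole silence window. The sketch below shows the idea; the threshold and polling interval are illustrative values, not the demo’s actual numbers.

// Sketch of silence-based auto-stop using the Web Audio API.
// Threshold and timing values are illustrative, not the demo's own.
function stopOnSilence(stream, recorder, { threshold = 0.01, silenceMs = 2000 } = {}) {
  const ctx = new AudioContext();
  const analyser = ctx.createAnalyser();
  ctx.createMediaStreamSource(stream).connect(analyser);

  const samples = new Float32Array(analyser.fftSize);
  let silentSince = null;

  const timer = setInterval(() => {
    analyser.getFloatTimeDomainData(samples);
    // Root-mean-square amplitude as a simple loudness measure.
    const rms = Math.sqrt(samples.reduce((sum, s) => sum + s * s, 0) / samples.length);

    if (rms < threshold) {
      silentSince ??= Date.now();
      if (Date.now() - silentSince >= silenceMs) {
        clearInterval(timer);
        ctx.close();
        recorder.stop(); // triggers the transcription request
      }
    } else {
      silentSince = null; // sound detected, restart the silence window
    }
  }, 100);
}

You would call stopOnSilence(stream, recorder) right after recorder.start(); RMS over the time-domain samples is a crude but serviceable loudness measure for a demo like this.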

Conclusion

This Voiceflow ASR Demo should be a good starting point, providing an efficient and customizable way to leverage OpenAI’s Whisper model for ASR in your Voiceflow Voice Assistants. By using a local or server-hosted Docker container, you can avoid relying on external APIs and maintain greater control over your data.

Thanks to Ahmet Oner for sharing the whisper-asr-webservice we are using in this demo. Do not hesitate to check out the repository for more information, including details on using a different model.

GitHub: ahmetoner/whisper-asr-webservice (OpenAI Whisper ASR Webservice API)
https://github.com/ahmetoner/whisper-asr-webservice