Unlocking Free Speech-to-Text Capabilities with OpenAI Whisper
Chapter 1: Introduction to OpenAI Whisper
OpenAI has recently introduced Whisper, a speech recognition model that converts spoken language into written text with accuracy that, by OpenAI's own account, approaches that of human transcribers on English speech. If you are not familiar with OpenAI, they are the creators of ChatGPT, which lets you hold conversations with an AI, and of DALL·E 2, which generates images from textual descriptions. Whisper is their latest release, and it transcribes audio files in English and 96 other languages. It copes well with background noise and thick accents, and best of all, it is entirely free and open-source.
Instead of installing Whisper on your local machine, which could consume significant storage, we will utilize Google Colaboratory. This online platform allows you to execute code directly in your web browser, making it accessible regardless of your computer's specifications.
Section 1.1: Setting Up Google Colaboratory
To begin, navigate to Google Drive. A Google account is necessary, but setting one up is free and straightforward. In Google Drive, locate the "New" button in the top left corner. Click on it, then select "More" at the bottom and choose "Connect More Apps." In the search bar that appears, type "Google Colaboratory" and hit search. When you see the result, click on it and proceed to install.
Upon successful installation, click "Continue" and confirm that Google Colaboratory is now connected to your Google Drive. You can now close this window. Return to the "New" button, navigate to "More," and you should see "Google Colaboratory" as an option. Click it to open a new notebook.
Section 1.3: Preparing for Audio Transcription
To transcribe audio, click on the folder icon located on the left side of the notebook interface.
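You will need an audio file to transcribe. For convenience, you can use audio saved from a YouTube video, either as an MP3 or as an MP4 like the one in the example command in this section. Upload the file through the file pane that the folder icon opens.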
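The transcription command in this section only works if Whisper is present in the notebook's runtime, which a fresh session may not guarantee. The cell below is a minimal sketch of a check, assuming Whisper is installed from the PyPI package openai-whisper and decodes audio through the ffmpeg executable. The helper names missing_requirements and install_whisper are made up for this example.

```python
import importlib.util
import shutil
import subprocess
import sys


def missing_requirements():
    """Return the names of Whisper prerequisites not found in this runtime."""
    missing = []
    # The openai-whisper package is imported under the module name "whisper".
    if importlib.util.find_spec("whisper") is None:
        missing.append("openai-whisper")
    # Whisper reads audio and video files through the ffmpeg executable.
    if shutil.which("ffmpeg") is None:
        missing.append("ffmpeg")
    return missing


def install_whisper():
    """Install the openai-whisper package with pip (needs network access)."""
    subprocess.run(
        [sys.executable, "-m", "pip", "install", "-U", "openai-whisper"],
        check=True,
    )


print("Missing:", missing_requirements() or "nothing")
```

If the printed list contains openai-whisper, calling install_whisper() in the next cell should add it. The check itself installs nothing, so it is safe to rerun.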
Once your audio file is uploaded, you can now extract text from it. Enter the following command in your notebook:
!whisper "Sample.mp4" --model medium.en
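This command runs the Whisper command-line tool on the file named in quotes, so replace "Sample.mp4" with the name of the file you uploaded. The --model option selects which model to use. The sizes range from tiny to large: smaller models run faster, larger ones transcribe more accurately, and the ".en" suffix in medium.en marks an English-only variant. After entering the command, run the cell. The first run also has to download the model, so allow some time before the transcription appears.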
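The same transcription can also be run from Python rather than from the command line. The sketch below assumes the openai-whisper package is installed and uses its load_model and transcribe calls. The helper names model_name and transcribe are invented for this example, and the list of sizes reflects the models published with Whisper at release.

```python
# Model sizes published with Whisper, smallest and fastest first.
MODEL_SIZES = ["tiny", "base", "small", "medium", "large"]


def model_name(size="medium", english_only=True):
    """Build a Whisper model name such as "medium.en" from a size."""
    if size not in MODEL_SIZES:
        raise ValueError(f"unknown size {size!r}; choose from {MODEL_SIZES}")
    # English-only variants exist for every size except "large".
    if english_only and size != "large":
        return f"{size}.en"
    return size


def transcribe(audio_path, size="medium", english_only=True):
    """Transcribe one file and return its text (requires openai-whisper)."""
    import whisper  # imported lazily so model_name works without the package

    model = whisper.load_model(model_name(size, english_only))
    result = model.transcribe(audio_path)
    return result["text"]
```

Calling print(transcribe("Sample.mp4")) would then mirror the command above. Dropping to size="small" trades some accuracy for a faster run, which can matter on long recordings.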
Chapter 2: Reviewing Your Transcription
You can expect high accuracy in the transcriptions. Alongside the plain text, the transcript is saved in several formats (including SRT and VTT subtitles), which appear in the file pane and are available for download.
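To make the SRT format concrete, here is a rough sketch of how timed segments turn into subtitle text. It assumes each segment is a dictionary with start and end times in seconds plus a text field, which is the shape Whisper's Python transcribe call returns under its "segments" key. The function names are illustrative and are not part of Whisper.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Render segments (dicts with start, end, text) as SRT subtitle text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        timing = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        # Each cue is a counter, a timing line, and the spoken text.
        blocks.append(f"{index}\n{timing}\n{seg['text'].strip()}\n")
    # A blank line separates consecutive cues.
    return "\n".join(blocks)
```

VTT files follow the same idea, but start with a WEBVTT header line and use a period instead of a comma before the milliseconds.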
The first video, titled "Best FREE Speech to Text AI - Whisper AI," provides an overview of Whisper's capabilities and how to utilize them effectively.
The second video, "Convert Speech To Text | OpenAI Whisper Explained in 8 Minutes," offers a concise explanation of how to use Whisper for audio transcription.
Section 2.1: Final Thoughts
Remember that files in a Colab session are temporary: once you exit Google Colaboratory, your session will end, and everything you uploaded or generated will be deleted automatically. Therefore, make sure you download your transcriptions before leaving. The accuracy of this technology is impressive; it not only captures words but also applies punctuation and capitalization. You might only need to make minor adjustments.
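Downloading each transcript by hand is easy to forget, so the step can be scripted. This is a minimal sketch, assuming the transcripts sit in the notebook's working directory and that their file names begin with the audio file's name. The helper names are hypothetical, and the google.colab.files module it relies on exists only inside Colab.

```python
from pathlib import Path

# Transcript formats mentioned in this guide, plus plain text.
TRANSCRIPT_SUFFIXES = {".txt", ".srt", ".vtt"}


def find_transcripts(directory=".", prefix=""):
    """List transcript files in a directory whose names start with prefix."""
    found = []
    for path in sorted(Path(directory).iterdir()):
        if path.suffix.lower() not in TRANSCRIPT_SUFFIXES:
            continue
        if not path.name.startswith(prefix):
            continue
        found.append(path)
    return found


def download_transcripts(directory=".", prefix=""):
    """Trigger a browser download for each transcript (Colab only)."""
    from google.colab import files  # available only inside Colab

    for path in find_transcripts(directory, prefix):
        files.download(str(path))
```

Running download_transcripts(prefix="Sample") before closing the tab would hand each matching file to the browser. Outside Colab only find_transcripts is usable, because the import inside download_transcripts will fail.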
I would love to hear your thoughts on how you intend to utilize OpenAI Whisper. If you found this guide helpful, I will be sharing more AI applications in the near future, so feel free to follow for updates.