Using OpenAI Whisper to transcribe speech to text
At some point between releases 1.34.2 and 1.35, the rclaudio.py handler gained the ability to use the OpenAI Whisper program to transcribe speech to text. The transcription is then included in the extracted document text and indexed.
The feature reuses the Recoll OCR cache (now somewhat misnamed), so the transcription only runs once, even if the index is reset or the files are moved around.
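The caching idea can be sketched as follows: keying the cached transcription on a hash of the file contents means that renaming or moving a file does not trigger a new transcription run. This is a minimal illustration of the principle, not Recoll's actual cache implementation; the helper names and cache layout here are invented for the example.

```python
import hashlib
import tempfile
from pathlib import Path

# Hypothetical cache location for the demo, not Recoll's real cache path.
CACHE_DIR = Path(tempfile.gettempdir()) / "stt-cache-demo"

def content_key(path):
    """Hash the file contents, so the key survives renames and moves."""
    return hashlib.sha1(Path(path).read_bytes()).hexdigest()

def cached_transcription(path, transcribe):
    """Return the cached transcription, calling `transcribe` only on a miss."""
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    entry = CACHE_DIR / content_key(path)
    if entry.exists():
        return entry.read_text()
    text = transcribe(path)
    entry.write_text(text)
    return text
```

Since the key depends only on the bytes of the audio file, resetting the index or reorganizing the file tree leaves the cache entries valid.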
To enable the feature:
(The example commands are for Ubuntu; you will have to adapt them a little for other systems.)
sudo apt install ffmpeg
pip3 install torch
Install OpenAI Whisper:
pip3 install git+https://github.com/openai/whisper.git
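After the installation steps above, you can quickly verify that the modules are importable. This is an optional sanity check, not part of Recoll; the installed() helper is invented for the example.

```python
import importlib.util

def installed(module_name):
    """Return True when the named module can be found by the import system."""
    return importlib.util.find_spec(module_name) is not None

# Whisper and its torch dependency should both be importable
# after the pip installs above.
for mod in ("torch", "whisper"):
    print(f"{mod}: {'ok' if installed(mod) else 'MISSING'}")
```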
Add the following to recoll.conf:
speechtotext = whisper
sttmodel = small
#sttdevice =
Set a value for sttdevice (e.g. cuda) if you have a suitable graphics card; otherwise Whisper will run on the CPU.
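You can probe the device choice from Python before editing the configuration. A small sketch, assuming only that torch may or may not be present; pick_stt_device is an invented helper, not a Recoll function.

```python
def pick_stt_device():
    """Return 'cuda' when torch sees a CUDA-capable GPU, else 'cpu'."""
    try:
        import torch  # torch may be absent; fall back to CPU in that case
        if torch.cuda.is_available():
            return "cuda"
    except ImportError:
        pass
    return "cpu"

print("suggested sttdevice:", pick_stt_device())
```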
You may want to check that things work by running whisper from the command line:
whisper --language=en --model=small /some/audio/file.mp3
You can then index away. Maybe try on a small subset first…