Building a CLI Wrapper for Whisper: From Transcription to Distribution

Have you ever tried a command line tool that is incredibly useful, but unwieldy? That was OpenAI's Whisper for me.

So I:

  • swapped the model for a quantized version,
  • wrote a Go program to chain an ffmpeg command before the model, and
  • distributed the Go program, called ‘better-whisper’, with my own Homebrew tap.

If you want to try better-whisper, just run:

brew tap akash-joshi/homebrew-akash-joshi
brew install better-whisper
better-whisper [whisper-cpp arguments] <input-file>

I'm Akash Joshi, a senior engineer who has been playing around with LLMs and Go over the past few months. This article is based on a conversation between Maurice Banerjee Palmer and me, transcribed by…better-whisper.

Why? What problem does Whisper solve?

I had a collection of YouTube videos and consulting call recordings that needed subtitles. I couldn't find any free tools to transcribe them, so I turned to Whisper, OpenAI's speech recognition model.

You can run Whisper by installing its Python package (openai-whisper) and using the whisper CLI it provides to transcribe local files.

But OpenAI's official package is extremely slow. Transcribing a 30-minute meeting could take me over an hour. Your results will vary depending on your CPU/GPU, but either way it isn't fast.

What are the alternatives to OpenAI's CLI?

Georgi Gerganov has ported OpenAI's Whisper models to C/C++ as whisper.cpp. It offers a range of benefits, but the main one is that it runs dramatically faster than the original implementation without a noticeable loss in accuracy.

Whisper models range from tiny to large. The latest at the time of writing is large-v3-turbo. The tiny model is suitable for most tasks. In my tests, transcribing a 30-minute meeting using whisper.cpp with the small English model on a MacBook Air took only about 100-120 seconds.

I found the fastest route to get started is:

  • download ggml-tiny.en-q8_0.bin from HuggingFace
  • put it in ~/models/ggml-tiny.en-q8_0.bin
  • run brew install whisper-cpp to install whisper.cpp as a CLI locally
  • run whisper-cpp -m ~/models/ggml-tiny.en-q8_0.bin -osrt <input_file_path> to transcribe your file and save it locally as an srt subtitle file.
  • Note: You might have to run ffmpeg -i <input_file_path> -ar 16000 -ac 1 -c:a pcm_s16le output.wav before the previous command if your media file isn't already a 16-bit WAV file, and then point whisper-cpp at output.wav.

What problems did you encounter with whisper.cpp?

We've sped Whisper up by using whisper.cpp. But whisper.cpp needs its input preprocessed first.

From the README:

Note that the main example currently runs only with 16-bit WAV files, so make sure to convert your input before running the tool. For example, you can use ffmpeg like this:
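
The example it gives is essentially the same conversion already shown in the list above, something along the lines of:

ffmpeg -i input.mp3 -ar 16000 -ac 1 -c:a pcm_s16le output.wav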

Remembering to run that conversion every time gets a bit unwieldy. So let's wrap the two commands together in a Go CLI.

How do you write a Go CLI? Why?

I chose Go for its excellent developer experience and ability to produce cross-platform binaries.

A ‘Go CLI’ is just a Go program that reads command line arguments.

To initialize your Go project, run go mod init with a module path of your choice.

Then your Go program is just a main.go with:

// all go programs start with package main
package main

import "fmt"

// main is the entry point for the program
func main() {
	fmt.Println("Hello, World!")
}

Once you have your Go program, you need it to read command line arguments via os.Args.
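
As a minimal sketch of the idea (not the actual better-whisper source), a wrapper can read os.Args, run the ffmpeg conversion from earlier, and then forward the remaining arguments to whisper-cpp. The hardcoded output.wav path and the assumption that the last argument is the input file are simplifications for illustration:

package main

import (
	"fmt"
	"os"
	"os/exec"
)

// run executes a command and wires its output to the terminal.
func run(name string, args ...string) error {
	cmd := exec.Command(name, args...)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd.Run()
}

func main() {
	// os.Args[0] is the program's own name; everything after it is user input.
	if len(os.Args) < 2 {
		fmt.Println("usage: better-whisper [whisper-cpp arguments] <input-file>")
		os.Exit(1)
	}

	args := os.Args[1:]
	// Treat the last argument as the input file, matching the usage above.
	input := args[len(args)-1]

	// Step 1: convert the input to the 16-bit WAV format whisper.cpp expects.
	if err := run("ffmpeg", "-i", input, "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le", "output.wav"); err != nil {
		fmt.Println("ffmpeg failed:", err)
		os.Exit(1)
	}

	// Step 2: forward the remaining arguments to whisper-cpp, swapping in the converted file.
	whisperArgs := append([]string{}, args[:len(args)-1]...)
	whisperArgs = append(whisperArgs, "output.wav")
	if err := run("whisper-cpp", whisperArgs...); err != nil {
		fmt.Println("whisper-cpp failed:", err)
		os.Exit(1)
	}
}

The important piece is os/exec: the wrapper shells out to both CLIs, so you only have to remember one command.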

To continue reading, follow the rest of the article in the original publication, which covers the full Go wrapper and distributing it via a Homebrew tap.