Press a hotkey, speak, text appears at your cursor. Voice-to-text for Linux/X11 using OpenAI's Realtime Transcription API.
- Toggle mode: first press starts recording, second press stops
- Types directly into whichever window was focused when you started
- Auto-detects language — no configuration needed
- Streams in real-time — text appears as you speak
- Babashka (bb)
- PulseAudio (parecord from pulseaudio-utils)
- xdotool
- xclip
- notify-send (from libnotify-bin or libnotify)
- An OpenAI API key with access to the Realtime API
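To see which of these tools are already installed, a small check along the following lines may help. This is a sketch, not part of ok-voice: `missing_tools` is a hypothetical helper name, and it assumes the executables are called `bb`, `parecord`, `xdotool`, `xclip` and `notify-send`.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with ok-voice): print the name of every
# command passed as an argument that cannot be found on PATH.
missing_tools() {
  for tool in "$@"; do
    # `command -v` is the POSIX way to test whether a command resolves.
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Example call (prints nothing when everything is installed):
#   missing_tools bb parecord xdotool xclip notify-send
```

Any name it prints still needs to be installed before the hotkey will work.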
On Debian/Ubuntu:
sudo apt install pulseaudio-utils xdotool xclip libnotify-bin
- Create the config directory and file:
mkdir -p ~/.config/ok-voice
cp resources/config.example.yaml ~/.config/ok-voice/config.yaml
- Add your OpenAI API key:
$EDITOR ~/.config/ok-voice/config.yaml
Set api-key to your key:
api-key: "sk-..."
Alternatively, set the OPENAI_API_KEY environment variable.
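If the hotkey does nothing, it can be worth confirming that a key is visible from at least one of the two places. The sketch below is an illustration only: `key_sources` is a hypothetical helper, it assumes the `api-key` entry sits at the top level of the YAML file, and it makes no claim about which source ok-voice prefers when both are set.

```shell
#!/bin/sh
# Hypothetical helper (not part of ok-voice): list where an API key is set.
# Prints "env" if OPENAI_API_KEY is non-empty, and "config" if the YAML file
# contains a non-empty top-level api-key entry. Prints nothing if neither is set.
key_sources() {
  config_file="${1:-$HOME/.config/ok-voice/config.yaml}"
  if [ -n "${OPENAI_API_KEY:-}" ]; then
    echo "env"
  fi
  # Match `api-key:` followed by a value, with or without an opening quote.
  if [ -f "$config_file" ] &&
     grep -Eq '^api-key:[[:space:]]*"?[^"[:space:]#]' "$config_file"; then
    echo "config"
  fi
  return 0
}

# Example call:
#   key_sources                       # checks the default config path
#   key_sources /some/other/file.yaml # checks an explicit file
```

Note that the check only looks for a non-empty value, so the placeholder from the example file would also count as set.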
- Bind a hotkey with xbindkeys:
Install xbindkeys if you don't have it:
sudo apt install xbindkeys
Add to ~/.xbindkeysrc:
"bb --config /path/to/ok-voice/bb.edn ok-voice"
Mod4 + v
Replace /path/to/ok-voice with the actual path to this repository. Mod4 + v is Super+V — change it to whatever you prefer.
Then reload xbindkeys:
xbindkeys --poll-rc
Now press your hotkey to start dictating and press it again to stop. The transcribed text will be typed into the window that was focused when you started.
You can also run it directly from the project directory:
bb ok-voice
Run it once to start recording, run it again to stop. The second invocation signals the running instance to shut down gracefully.
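For readers curious how a "run once to start, run again to stop" command can work, here is one common pattern using a PID file. It is a sketch under assumptions: the PID-file location, the `toggle` function and the worker command are all hypothetical, and ok-voice may coordinate its instances differently.

```shell
#!/bin/sh
# Hypothetical toggle sketch; the PID-file path is an assumption made for
# illustration and is not ok-voice's actual mechanism.
PIDFILE="${PIDFILE:-${XDG_RUNTIME_DIR:-/tmp}/toggle-demo.pid}"

# Usage: toggle <command> [args...]
toggle() {
  # If a recorded PID exists and that process is still alive, ask it to stop.
  if [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
    kill -TERM "$(cat "$PIDFILE")"
    rm -f "$PIDFILE"
    echo "stopped"
  else
    # Otherwise (no PID file, or a stale one) start the worker in the
    # background and remember its PID for the next invocation.
    "$@" &
    echo $! > "$PIDFILE"
    echo "started"
  fi
}

# Example: the first call starts the worker, the second one stops it.
#   toggle sleep 600
#   toggle sleep 600
```

Sending SIGTERM rather than SIGKILL gives the running process a chance to flush pending work before it exits.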
- On first invocation, ok-voice captures audio from your default PulseAudio input device
- Audio is streamed over a WebSocket to OpenAI's Realtime Transcription API
- As transcription results arrive, text is typed into the original window via xdotool
- On second invocation (or SIGTERM), recording stops and the process exits
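The capture and typing ends of that pipeline can be sketched with the same command-line tools. The function names are hypothetical, and the audio format (16-bit mono PCM at 24 kHz) and the xdotool flags are assumptions rather than values taken from ok-voice's source. The WebSocket step in the middle is left out.

```shell
#!/bin/sh
# Sketch only: the sample format and the xdotool flags below are assumptions,
# not settings read from ok-voice.

# Record raw 16-bit little-endian mono PCM from the default PulseAudio
# source and write it to stdout (parecord uses stdout when no file is given).
capture_audio() {
  parecord --raw --format=s16le --rate=24000 --channels=1
}

# Print the ID of the currently focused window, so that later text can be
# sent there even if focus moves while recording.
focused_window() {
  xdotool getactivewindow
}

# Type one piece of transcribed text into a specific window.
# `--` stops option parsing so text starting with "-" is typed literally.
type_into() {
  window_id="$1"
  text="$2"
  xdotool type --window "$window_id" --delay 0 -- "$text"
}

# Example wiring (not run here):
#   win="$(focused_window)"
#   type_into "$win" "hello from the microphone"
```

Capturing the window ID once at the start is what would let text keep landing in the original window, as described above.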
This project is 100% vibe coded with GSD. The full planning artifacts (research, phase plans, execution summaries) are in the .planning/ folder.
AGPL-3.0 — see LICENSE for details.