ok-voice

Press a hotkey, speak, text appears at your cursor. Voice-to-text for Linux/X11 using OpenAI's Realtime Transcription API.

  • Toggle mode: first press starts recording, second press stops
  • Types directly into whichever window was focused when you started
  • Auto-detects language — no configuration needed
  • Streams in real-time — text appears as you speak

Requirements

  • Babashka (bb)
  • PulseAudio (parecord from pulseaudio-utils)
  • xdotool
  • xclip
  • notify-send (from libnotify-bin or libnotify)
  • An OpenAI API key with access to the Realtime API

On Debian/Ubuntu:

sudo apt install pulseaudio-utils xdotool xclip libnotify-bin
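To confirm everything is on your PATH before the first run, a quick (hypothetical) pre-flight check could look like this:

```shell
# Check that each required tool is installed; prints "missing: ..." for any gaps
for cmd in bb parecord xdotool xclip notify-send; do
  if command -v "$cmd" >/dev/null 2>&1; then
    echo "ok: $cmd"
  else
    echo "missing: $cmd"
  fi
done
```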

Setup

  1. Create the config directory and file:

mkdir -p ~/.config/ok-voice
cp resources/config.example.yaml ~/.config/ok-voice/config.yaml

  2. Add your OpenAI API key:

$EDITOR ~/.config/ok-voice/config.yaml

Set api-key to your key:

api-key: "sk-..."

Alternatively, set the OPENAI_API_KEY environment variable.
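For example, you could add a line like this to your shell profile (the key shown is a placeholder):

```shell
# Make the key available to ok-voice without storing it in config.yaml
export OPENAI_API_KEY="sk-..."
```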

  3. Bind a hotkey with xbindkeys:

Install xbindkeys if you don't have it:

sudo apt install xbindkeys

Add to ~/.xbindkeysrc:

"bb --config /path/to/ok-voice/bb.edn ok-voice"
    Mod4 + v

Replace /path/to/ok-voice with the actual path to this repository. Mod4 + v is Super+V; change it to whatever key combination you prefer.

Then reload xbindkeys:

xbindkeys --poll-rc

Now press your hotkey to start dictating and press it again to stop. The transcribed text will be typed into the window that was focused when you started.

Usage

You can also run it directly from the project directory:

bb ok-voice

Run it once to start recording, run it again to stop. The second invocation signals the running instance to shut down gracefully.

How it works

  1. On first invocation, ok-voice captures audio from your default PulseAudio input device
  2. Audio is streamed over a WebSocket to OpenAI's Realtime Transcription API
  3. As transcription results arrive, text is typed into the original window via xdotool
  4. On second invocation (or SIGTERM), recording stops and the process exits
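The start/stop toggle described in steps 1 and 4 follows a common pid-file pattern. This is only a sketch of that pattern, not the actual Babashka implementation; the pid-file path is an assumption:

```shell
# Hypothetical sketch of the toggle: a pid file decides whether this
# invocation starts recording or signals the running instance to stop.
PIDFILE="${XDG_RUNTIME_DIR:-/tmp}/ok-voice.pid"

if [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
  # Second press: ask the running instance to shut down gracefully
  kill -TERM "$(cat "$PIDFILE")"
  rm -f "$PIDFILE"
else
  # First press: record our pid, then capture and stream audio
  echo $$ > "$PIDFILE"
  # ... record via parecord, stream to the Realtime API, type via xdotool ...
fi
```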

Vibe coded

This project is 100% vibe coded with GSD. The full planning artifacts (research, phase plans, execution summaries) are in the .planning/ folder.

License

AGPL-3.0 — see LICENSE for details.
