ok-voice

Press a hotkey, speak, text appears at your cursor. Voice-to-text for Linux/X11 using OpenAI's Realtime Transcription API.

  • Toggle mode: first press starts recording, second press stops
  • Types directly into whichever window was focused when you started
  • Auto-detects language — no configuration needed
  • Streams in real-time — text appears as you speak

Requirements

  • Babashka (bb)
  • PulseAudio (parecord from pulseaudio-utils)
  • xdotool
  • xclip
  • notify-send (from libnotify-bin or libnotify)
  • An OpenAI API key with access to the Realtime API

On Debian/Ubuntu:

sudo apt install pulseaudio-utils xdotool xclip libnotify-bin
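To confirm everything is on your PATH before the first run, a quick (hypothetical) pre-flight check could look like this:

```shell
# Check that each required tool is installed; prints "missing: ..." for any gaps
for cmd in bb parecord xdotool xclip notify-send; do
  if command -v "$cmd" >/dev/null 2>&1; then
    echo "ok: $cmd"
  else
    echo "missing: $cmd"
  fi
done
```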

Setup

  1. Create the config directory and file:

mkdir -p ~/.config/ok-voice
cp resources/config.example.yaml ~/.config/ok-voice/config.yaml

  2. Add your OpenAI API key:

$EDITOR ~/.config/ok-voice/config.yaml

Set api-key to your key:

api-key: "sk-..."

Alternatively, set the OPENAI_API_KEY environment variable.
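For example, you could add a line like this to your shell profile (the key shown is a placeholder):

```shell
# Make the key available to ok-voice without storing it in config.yaml
export OPENAI_API_KEY="sk-..."
```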

  3. Bind a hotkey with xbindkeys:

Install xbindkeys if you don't have it:

sudo apt install xbindkeys

Add to ~/.xbindkeysrc:

"bb --config /path/to/ok-voice/bb.edn ok-voice"
    Mod4 + v

Replace /path/to/ok-voice with the actual path to this repository. Mod4 + v is Super+V; change it to whatever key combination you prefer.

Then reload xbindkeys:

xbindkeys --poll-rc

Now press your hotkey to start dictating and press it again to stop. The transcribed text will be typed into the window that was focused when you started.

Usage

You can also run it directly from the project directory:

bb ok-voice

Run it once to start recording, run it again to stop. The second invocation signals the running instance to shut down gracefully.

How it works

  1. On first invocation, ok-voice captures audio from your default PulseAudio input device
  2. Audio is streamed over a WebSocket to OpenAI's Realtime Transcription API
  3. As transcription results arrive, text is typed into the original window via xdotool
  4. On second invocation (or SIGTERM), recording stops and the process exits
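The start/stop toggle described in steps 1 and 4 follows a common pid-file pattern. This is only a sketch of that pattern, not the actual Babashka implementation; the pid-file path is an assumption:

```shell
# Hypothetical sketch of the toggle: a pid file decides whether this
# invocation starts recording or signals the running instance to stop.
PIDFILE="${XDG_RUNTIME_DIR:-/tmp}/ok-voice.pid"

if [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
  # Second press: ask the running instance to shut down gracefully
  kill -TERM "$(cat "$PIDFILE")"
  rm -f "$PIDFILE"
else
  # First press: record our pid, then capture and stream audio
  echo $$ > "$PIDFILE"
  # ... record via parecord, stream to the Realtime API, type via xdotool ...
fi
```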

Vibe coded

This project is 100% vibe coded with GSD. The full planning artifacts (research, phase plans, execution summaries) are in the .planning/ folder.

License

AGPL-3.0 — see LICENSE for details.
