SpeechWatch
Ambient audio monitoring → Parakeet STT → Discord transcription.
For Thomas's actual Windows PC. No cloud. No API. Just listen and transcribe.
Requirements
- Windows with ffmpeg in PATH
- Python 3.8+ with: `torch`, `torchaudio`, `onnxruntime`, `numpy`
- NVIDIA Parakeet-TDT 0.6B v2, downloaded with `huggingface-cli download nvidia/parakeet-tdt-0.6b-v2`
- RTX 5060 Ti (or any CUDA GPU with enough VRAM)
- Discord webhook URL for the target channel
Setup
# 1. Install Python deps
pip install torch torchaudio onnxruntime numpy
# 2. Download Parakeet model
huggingface-cli download nvidia/parakeet-tdt-0.6b-v2 --local-dir C:\Users\TJ\models\parakeet-tdt-0.6b-v2
# 3. List audio devices
ffmpeg -list_devices true -f dshow -i dummy
# 4. Run
set DISCORD_WEBHOOK=https://discord.com/api/webhooks/...
set AUDIO_DEVICE="your mic name"
set DB_THRESHOLD=40
set PARAKEET_MODEL=C:\Users\TJ\models\parakeet-tdt-0.6b-v2
python speechwatch.py
Environment Variables
| Variable | Default | Description |
|---|---|---|
| `DISCORD_WEBHOOK` | (required) | Discord webhook URL |
| `AUDIO_DEVICE` | `default` | Mic name (from `ffmpeg -list_devices`) |
| `DB_THRESHOLD` | `40` | dB level to trigger recording |
| `RECORD_SECONDS` | `15` | Seconds to capture after trigger |
| `POST_ONLY_ABOVE` | `50` | Minimum confidence % to post |
| `PARAKEET_MODEL` | `C:\Users\TJ\models\parakeet-tdt-0.6b-v2` | Model path |
| `GPU_DEVICE` | `cuda:0` | GPU device |
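As a rough illustration of how these variables could be read, here is a minimal sketch that applies the defaults from the table. The `Config` class and `load_config` function are hypothetical names, not taken from `speechwatch.py`, and the quote stripping is an assumption about how the values are consumed.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """Hypothetical container for the settings listed in the table."""
    discord_webhook: str
    audio_device: str
    db_threshold: float
    record_seconds: int
    post_only_above: float
    parakeet_model: str
    gpu_device: str


def load_config(env=None):
    """Build a Config from environment variables, using the documented defaults."""
    env = os.environ if env is None else env
    webhook = env.get("DISCORD_WEBHOOK", "").strip()
    if not webhook:
        raise ValueError("DISCORD_WEBHOOK is required")
    return Config(
        discord_webhook=webhook,
        # cmd.exe keeps the quotes from `set VAR="value"` in the value itself,
        # so this sketch strips them (an assumption, not confirmed behaviour).
        audio_device=env.get("AUDIO_DEVICE", "default").strip('"'),
        db_threshold=float(env.get("DB_THRESHOLD", "40")),
        record_seconds=int(env.get("RECORD_SECONDS", "15")),
        post_only_above=float(env.get("POST_ONLY_ABOVE", "50")),
        parakeet_model=env.get(
            "PARAKEET_MODEL", r"C:\Users\TJ\models\parakeet-tdt-0.6b-v2"
        ),
        gpu_device=env.get("GPU_DEVICE", "cuda:0"),
    )
```

Passing a plain dict instead of `os.environ` makes the loader easy to exercise without touching the real environment.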
How It Works
mic (always listening)
  → every 0.5s: check dB level
    → if above threshold:
      → capture 15s WAV
      → run through Parakeet ONNX on GPU
      → post text to Discord webhook
      → cooldown 10s
    → else: keep rolling
No VAD library. No cloud. Raw dB threshold.
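The threshold check can be sketched in a few lines of standard-library Python. This is an illustration, not the code in `speechwatch.py`: it assumes 16-bit little-endian mono PCM and measures dB relative to a sample amplitude of 1 (so full scale is about 90 dB), which is only one reading of what `DB_THRESHOLD=40` means. The names `rms_db` and `watch` are hypothetical, and the 15s capture itself is not modelled.

```python
import math
import struct


def rms_db(pcm_bytes):
    """Level of 16-bit little-endian mono PCM, in dB relative to amplitude 1."""
    count = len(pcm_bytes) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack("<%dh" % count, pcm_bytes[: count * 2])
    rms = math.sqrt(sum(s * s for s in samples) / count)
    # Clamp so pure silence maps to 0 dB instead of log10(0).
    return 20.0 * math.log10(max(rms, 1.0))


def watch(chunks, threshold_db, on_trigger, cooldown_chunks=20):
    """Scan 0.5 s chunks and call on_trigger(index) when one is above threshold.

    cooldown_chunks=20 models the 10 s cooldown as twenty 0.5 s chunks
    that are skipped after each trigger.
    """
    skip_until = 0
    for i, chunk in enumerate(chunks):
        if i < skip_until:
            continue  # still cooling down
        if rms_db(chunk) > threshold_db:
            on_trigger(i)
            skip_until = i + 1 + cooldown_chunks
```

Under this assumed scale, a threshold of 40 corresponds to an RMS sample amplitude of 100. A fixed threshold like this also fires on any loud noise, not only speech, which is the trade-off of skipping VAD.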
Output
Messages posted to Discord look like:
🎙 [14:32:05] Hello, is anyone there?
87% confidence
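A minimal sketch of how a message in this layout could be built and sent, using only the standard library. The function names are hypothetical, and treating `POST_ONLY_ABOVE` as an inclusive threshold is an assumption. The one Discord-specific detail relied on is that a webhook accepts a JSON body with a `content` field.

```python
import json
import urllib.request
from datetime import datetime


def format_message(text, confidence_pct, when):
    """Render a transcript in the two-line layout shown above."""
    stamp = when.strftime("%H:%M:%S")
    return "🎙 [%s] %s\n%d%% confidence" % (stamp, text, round(confidence_pct))


def should_post(confidence_pct, post_only_above=50.0):
    """Apply the POST_ONLY_ABOVE gate (assumed inclusive of the threshold)."""
    return confidence_pct >= post_only_above


def post_to_discord(webhook_url, message):
    """POST a message to a Discord webhook as JSON with a `content` field."""
    body = json.dumps({"content": message}).encode("utf-8")
    request = urllib.request.Request(
        webhook_url,
        data=body,
        headers={
            "Content-Type": "application/json",
            # An explicit User-Agent; the default urllib one may be rejected.
            "User-Agent": "SpeechWatch-sketch",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

A caller would typically check `should_post(confidence)` first, then pass `format_message(text, confidence, datetime.now())` to `post_to_discord`.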
State is persisted to %LOCALAPPDATA%\SpeechWatch\state.json, which records transcripts today, the last transcription, and a timestamp.
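One way such a state file could be read and written is sketched below. The JSON field names (`transcripts_today`, `last_transcription`, `timestamp`) and the helper functions are hypothetical, since the README does not give the actual schema, and the fallback to the home directory when `LOCALAPPDATA` is unset is also an assumption.

```python
import json
import os
from pathlib import Path


def state_path(env=None):
    """Resolve %LOCALAPPDATA%\\SpeechWatch\\state.json (home dir as fallback)."""
    env = os.environ if env is None else env
    base = env.get("LOCALAPPDATA") or str(Path.home())
    return Path(base) / "SpeechWatch" / "state.json"


def load_state(path):
    """Return the saved state, or an empty one if the file is missing or corrupt."""
    try:
        return json.loads(Path(path).read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return {"transcripts_today": 0, "last_transcription": None, "timestamp": None}


def save_state(path, state):
    """Write the state via a temporary file so a crash cannot leave a partial file."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state, ensure_ascii=False, indent=2), encoding="utf-8")
    os.replace(tmp, path)  # replaces the old file in one step
```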
Architecture
- `speechwatch.py`: main entry point, Windows audio capture, Parakeet ONNX inference, Discord webhook
- `speechwatch.sh`: Linux version (for OpenClaw host with ALSA)
- `bot.ts`: TypeScript Discord bot (alternative to webhook, for bot account approach)
- `README.md`: this file