
TypeScript 35.5%
Python 33.2%
Shell 31.3%


John Stamagal 001393f767 GPU STT working: RTX 5060 Ti + CUDA 12.9 + int8_float16		2026-03-28 15:35:40 -04:00
.gitignore	remove pycache, add to gitignore	2026-03-28 15:27:00 -04:00
bot.ts	SpeechWatch: initial TypeScript bot + types + config	2026-03-28 15:21:02 -04:00
config.ts	SpeechWatch: initial TypeScript bot + types + config	2026-03-28 15:21:02 -04:00
README.md	Add Windows Python version with DirectShow audio capture	2026-03-28 15:26:53 -04:00
speechwatch.py	Add Windows Python version with DirectShow audio capture	2026-03-28 15:26:53 -04:00
speechwatch.sh	GPU STT working: RTX 5060 Ti + CUDA 12.9 + int8_float16	2026-03-28 15:35:40 -04:00
types.ts	SpeechWatch: initial TypeScript bot + types + config	2026-03-28 15:21:02 -04:00

README.md

SpeechWatch

Ambient audio monitoring → Parakeet STT → Discord transcription.

For Thomas's actual Windows PC. No cloud. No API. Just listen and transcribe.

Requirements

Windows with ffmpeg in PATH
Python 3.8+ with: torch, torchaudio, onnxruntime-gpu, numpy (the plain onnxruntime package has no CUDA execution provider)
NVIDIA Parakeet-TDT 0.6B v2 — download with huggingface-cli download nvidia/parakeet-tdt-0.6b-v2
RTX 5060 Ti (or any CUDA GPU with enough VRAM)
Discord webhook URL for the target channel

Setup

# 1. Install Python deps (huggingface_hub provides huggingface-cli)
pip install torch torchaudio onnxruntime-gpu numpy huggingface_hub

# 2. Download Parakeet model
huggingface-cli download nvidia/parakeet-tdt-0.6b-v2 --local-dir C:\Users\TJ\models\parakeet-tdt-0.6b-v2

# 3. List audio devices
ffmpeg -list_devices true -f dshow -i dummy

# 4. Run
set DISCORD_WEBHOOK=https://discord.com/api/webhooks/...
set "AUDIO_DEVICE=your mic name"
set DB_THRESHOLD=40
set PARAKEET_MODEL=C:\Users\TJ\models\parakeet-tdt-0.6b-v2
python speechwatch.py
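The capture step can be driven from Python by invoking ffmpeg with a DirectShow input. This is a sketch, assuming speechwatch.py records each window to a 16 kHz mono 16-bit WAV through ffmpeg; `build_capture_cmd` and `capture` are hypothetical names, and the 16 kHz sample rate is an assumption about what the STT step expects.

```python
import subprocess
from typing import List


def build_capture_cmd(device: str, seconds: int, out_path: str) -> List[str]:
    """Build an ffmpeg argv that records `seconds` of audio from a DirectShow device.

    Hypothetical helper: assumes 16 kHz mono 16-bit PCM WAV is what the STT step wants.
    """
    return [
        "ffmpeg",
        "-hide_banner",
        "-loglevel", "error",
        "-y",                     # overwrite out_path if it already exists
        "-f", "dshow",            # Windows DirectShow input
        "-i", f"audio={device}",  # name as printed by -list_devices; no extra quoting in argv form
        "-t", str(seconds),       # stop after this many seconds
        "-ac", "1",               # downmix to mono
        "-ar", "16000",           # resample to 16 kHz (assumed model input rate)
        "-c:a", "pcm_s16le",      # 16-bit little-endian PCM WAV
        out_path,
    ]


def capture(device: str, seconds: int, out_path: str) -> None:
    # check=True raises CalledProcessError if ffmpeg fails (e.g. a wrong device name)
    subprocess.run(build_capture_cmd(device, seconds, out_path), check=True)
```

For example, `capture(os.environ["AUDIO_DEVICE"], 15, "clip.wav")` records one 15 s window. Passing the arguments as a list sidesteps cmd.exe quoting: the device name reaches ffmpeg exactly as `-list_devices` printed it.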

Environment Variables

Variable	Default	Description
`DISCORD_WEBHOOK`	(required)	Discord webhook URL
`AUDIO_DEVICE`	`default`	Mic name (from `ffmpeg -list_devices`)
`DB_THRESHOLD`	`40`	dB level to trigger recording
`RECORD_SECONDS`	`15`	Seconds to capture after trigger
`POST_ONLY_ABOVE`	`50`	Minimum confidence % to post
`PARAKEET_MODEL`	`C:\Users\TJ\models\parakeet-tdt-0.6b-v2`	Model path
`GPU_DEVICE`	`cuda:0`	GPU device
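A minimal sketch of reading these variables with the defaults from the table. `Config` and `load_config` are hypothetical names; speechwatch.py may organize its settings differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Config:
    webhook: str
    audio_device: str
    db_threshold: float
    record_seconds: int
    post_only_above: float
    model_path: str
    gpu_device: str


def load_config(env: Optional[Mapping[str, str]] = None) -> Config:
    """Read settings from the environment, falling back to the documented defaults."""
    env = os.environ if env is None else env
    webhook = env.get("DISCORD_WEBHOOK", "")
    if not webhook:
        # DISCORD_WEBHOOK has no default: fail at startup rather than at the first post
        raise SystemExit("DISCORD_WEBHOOK is required")
    return Config(
        webhook=webhook,
        audio_device=env.get("AUDIO_DEVICE", "default"),
        db_threshold=float(env.get("DB_THRESHOLD", "40")),
        record_seconds=int(env.get("RECORD_SECONDS", "15")),
        post_only_above=float(env.get("POST_ONLY_ABOVE", "50")),
        model_path=env.get("PARAKEET_MODEL", r"C:\Users\TJ\models\parakeet-tdt-0.6b-v2"),
        gpu_device=env.get("GPU_DEVICE", "cuda:0"),
    )
```

Accepting an explicit mapping keeps the defaults testable without touching the real environment; in normal use `load_config()` reads `os.environ`.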

How It Works

mic (always listening)
 → every 0.5s: check dB level
 → if above threshold:
     → capture 15s WAV
     → run through Parakeet ONNX on GPU
     → post text to Discord webhook
     → cooldown 10s
 → else: keep rolling
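The timing of the loop above can be modeled as a small pure function. This sketch assumes the 10 s cooldown starts once the 15 s capture ends and ignores the time spent on inference and posting; `capture_start_times` is a hypothetical helper for reasoning about the schedule, not part of speechwatch.py.

```python
from typing import List, Sequence


def capture_start_times(
    levels_db: Sequence[float],
    threshold_db: float = 40.0,
    poll_s: float = 0.5,
    record_s: float = 15.0,
    cooldown_s: float = 10.0,
) -> List[float]:
    """Return the times (in seconds) at which a capture would start.

    levels_db[i] is the level measured at poll i, i.e. at time i * poll_s.
    After a trigger the loop is busy for record_s + cooldown_s, and readings
    that fall inside that window are ignored.
    """
    starts: List[float] = []
    busy_until = 0.0
    for i, level in enumerate(levels_db):
        t = i * poll_s
        if t < busy_until:
            continue  # still recording or cooling down
        if level > threshold_db:  # "above threshold" taken as strictly greater
            starts.append(t)
            busy_until = t + record_s + cooldown_s
    return starts
```

Under these assumptions a sustained loud source starts a new capture at most every 25 s (15 s recording plus 10 s cooldown), i.e. at most 24 clips per ten minutes.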

No VAD library. No cloud. Raw dB threshold.
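The README does not say what the dB figure is measured against. One plausible reading, assumed here, is 20*log10 of the RMS of raw 16-bit sample values, which puts digital silence at 0 dB and a full-scale signal near 90 dB. `level_db` is a hypothetical helper showing that calculation with the standard library only.

```python
import math
import sys
from array import array


def level_db(pcm_s16le: bytes) -> float:
    """Return 20*log10(RMS) of a chunk of 16-bit little-endian mono PCM.

    Assumption: the level is relative to one integer sample step, so silence
    is clamped to 0 dB and a full-scale signal reads about 90.3 dB.
    """
    samples = array("h")
    samples.frombytes(pcm_s16le[: len(pcm_s16le) // 2 * 2])  # ignore a trailing odd byte
    if sys.byteorder == "big":
        samples.byteswap()  # PCM from ffmpeg/WAV is little-endian
    if len(samples) == 0:
        return 0.0
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(max(rms, 1.0))  # clamp so silence gives 0 dB, not -inf
```

On this scale a steady amplitude of 100 (about 0.3% of full scale) reads exactly 40 dB, the default DB_THRESHOLD. At 16 kHz mono, each 0.5 s check would cover 8,000 samples (16,000 bytes).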

Output

Messages posted to Discord look like:

🎙 [14:32:05] Hello, is anyone there?
   87% confidence
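A sketch of the formatting and posting step, assuming the message layout shown above and Discord's standard webhook JSON body (`{"content": ...}`). `format_message` and `post_transcript` are hypothetical names; the POST_ONLY_ABOVE check runs before anything is sent, and the explicit User-Agent value is an assumption.

```python
import json
import urllib.request
from datetime import datetime
from typing import Optional


def format_message(text: str, confidence_pct: float, when: datetime) -> str:
    # Mirrors the sample output: timestamped text, then an indented confidence line
    return f"🎙 [{when:%H:%M:%S}] {text}\n   {confidence_pct:.0f}% confidence"


def post_transcript(webhook_url: str, text: str, confidence_pct: float,
                    min_confidence: float = 50.0,
                    when: Optional[datetime] = None) -> bool:
    """POST a transcript to a Discord webhook; return False if it was filtered out."""
    if confidence_pct < min_confidence or not text.strip():
        return False  # POST_ONLY_ABOVE filter; also skip empty transcripts
    body = json.dumps({"content": format_message(text, confidence_pct, when or datetime.now())})
    req = urllib.request.Request(
        webhook_url,
        data=body.encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: set an explicit User-Agent, since Discord's edge can reject urllib's default
            "User-Agent": "SpeechWatch",
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return 200 <= resp.status < 300
```

By default Discord answers a successful webhook POST with 204 No Content, which the 2xx check accepts.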

State is persisted to %LOCALAPPDATA%\SpeechWatch\state.json — transcripts today, last transcription, timestamp.
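A sketch of the state update, assuming hypothetical key names (`date`, `transcripts_today`, `last_transcription`, `timestamp`) and that the daily count resets when the local date changes. The fallback to the home directory when %LOCALAPPDATA% is unset is also an assumption. The file is written under a temporary name and swapped in with `os.replace`, so a crash mid-write cannot leave a truncated state.json.

```python
import json
import os
from datetime import datetime
from pathlib import Path
from typing import Optional


def state_path() -> Path:
    # %LOCALAPPDATA% is set on Windows; falling back to the home directory is an assumption
    base = os.environ.get("LOCALAPPDATA") or str(Path.home())
    return Path(base) / "SpeechWatch" / "state.json"


def record_transcript(text: str, path: Optional[Path] = None,
                      now: Optional[datetime] = None) -> dict:
    """Update state.json after a transcript; the daily count resets on a new date."""
    path = path or state_path()
    now = now or datetime.now()
    try:
        state = json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        state = {}  # first run or unreadable file: start fresh
    today = now.date().isoformat()
    if state.get("date") != today:
        state = {"date": today, "transcripts_today": 0}
    state["transcripts_today"] = state.get("transcripts_today", 0) + 1
    state["last_transcription"] = text
    state["timestamp"] = now.isoformat(timespec="seconds")
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state, indent=2), encoding="utf-8")
    os.replace(tmp, path)  # atomic swap on the same volume
    return state
```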

Architecture

speechwatch.py — main entry point, Windows audio capture, Parakeet ONNX inference, Discord webhook
speechwatch.sh — Linux version (for OpenClaw host with ALSA)
bot.ts — TypeScript Discord bot (alternative to the webhook, for a bot-account setup; types.ts and config.ts hold its types and configuration)
README.md — this file