.gitignore remove pycache, add to gitignore 2026-03-28 15:27:00 -04:00
bot.ts SpeechWatch: initial TypeScript bot + types + config 2026-03-28 15:21:02 -04:00
config.ts SpeechWatch: initial TypeScript bot + types + config 2026-03-28 15:21:02 -04:00
README.md Add Windows Python version with DirectShow audio capture 2026-03-28 15:26:53 -04:00
speechwatch.py Add Windows Python version with DirectShow audio capture 2026-03-28 15:26:53 -04:00
speechwatch.sh GPU STT working: RTX 5060 Ti + CUDA 12.9 + int8_float16 2026-03-28 15:35:40 -04:00
types.ts SpeechWatch: initial TypeScript bot + types + config 2026-03-28 15:21:02 -04:00

SpeechWatch

Ambient audio monitoring → Parakeet STT → Discord transcription.

For Thomas's actual Windows PC. No cloud speech service and no STT API: audio is captured and transcribed locally, and only the resulting text is sent to Discord.

Requirements

  • Windows with ffmpeg in PATH
  • Python 3.8+ with: torch, torchaudio, onnxruntime, numpy
  • NVIDIA Parakeet-TDT 0.6B v2 — download with huggingface-cli download nvidia/parakeet-tdt-0.6b-v2
  • RTX 5060 Ti (or any CUDA GPU with enough VRAM)
  • Discord webhook URL for the target channel

Setup

# 1. Install Python deps (CUDA inference needs the onnxruntime-gpu package in place of onnxruntime)
pip install torch torchaudio onnxruntime numpy

# 2. Download Parakeet model
huggingface-cli download nvidia/parakeet-tdt-0.6b-v2 --local-dir C:\Users\TJ\models\parakeet-tdt-0.6b-v2

# 3. List audio devices
ffmpeg -list_devices true -f dshow -i dummy

# 4. Run
set DISCORD_WEBHOOK=https://discord.com/api/webhooks/...
set AUDIO_DEVICE="your mic name"
set DB_THRESHOLD=40
set PARAKEET_MODEL=C:\Users\TJ\models\parakeet-tdt-0.6b-v2
python speechwatch.py

Environment Variables

| Variable | Default | Description |
|---|---|---|
| DISCORD_WEBHOOK | (required) | Discord webhook URL |
| AUDIO_DEVICE | default | Mic name (from ffmpeg -list_devices) |
| DB_THRESHOLD | 40 | dB level to trigger recording |
| RECORD_SECONDS | 15 | Seconds to capture after trigger |
| POST_ONLY_ABOVE | 50 | Minimum confidence % to post |
| PARAKEET_MODEL | C:\Users\TJ\models\parakeet-tdt-0.6b-v2 | Model path |
| GPU_DEVICE | cuda:0 | GPU device |
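
A minimal sketch of how these variables could be read with their documented defaults. The `Config` dataclass and the `load_config` helper are hypothetical names used for illustration; they are not taken from speechwatch.py.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    discord_webhook: str
    audio_device: str
    db_threshold: float
    record_seconds: int
    post_only_above: float
    parakeet_model: str
    gpu_device: str


def load_config(env=None) -> Config:
    """Build a Config from environment variables, applying the documented defaults."""
    env = os.environ if env is None else env
    webhook = env.get("DISCORD_WEBHOOK", "").strip()
    if not webhook:
        # DISCORD_WEBHOOK is the only variable without a default.
        raise ValueError("DISCORD_WEBHOOK is required")
    return Config(
        discord_webhook=webhook,
        audio_device=env.get("AUDIO_DEVICE", "default"),
        db_threshold=float(env.get("DB_THRESHOLD", "40")),
        record_seconds=int(env.get("RECORD_SECONDS", "15")),
        post_only_above=float(env.get("POST_ONLY_ABOVE", "50")),
        parakeet_model=env.get(
            "PARAKEET_MODEL", r"C:\Users\TJ\models\parakeet-tdt-0.6b-v2"
        ),
        gpu_device=env.get("GPU_DEVICE", "cuda:0"),
    )
```

Passing a plain dict instead of `os.environ` makes the defaults easy to check without touching the real environment.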

How It Works

mic (always listening)
 → every 0.5s: check dB level
 → if above threshold:
     → capture 15s WAV
     → run through Parakeet ONNX on GPU
     → post text to Discord webhook
     → cooldown 10s
 → else: keep rolling
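
The "capture 15s WAV" step can be sketched as an ffmpeg call on the DirectShow input. This is an illustration only: `build_capture_command` and `capture_wav` are hypothetical helpers, and the mono 16 kHz output format is an assumption, not something this README specifies.

```python
import subprocess


def build_capture_command(device: str, seconds: int, out_path: str) -> list:
    """Return an ffmpeg argument list that records `seconds` of audio from a DirectShow device."""
    return [
        "ffmpeg",
        "-hide_banner",
        "-loglevel", "error",
        "-y",                     # overwrite out_path if it already exists
        "-f", "dshow",            # DirectShow input (Windows)
        "-i", f"audio={device}",  # device name as printed by -list_devices
        "-t", str(seconds),       # stop after this many seconds
        "-ac", "1",               # mono (assumed)
        "-ar", "16000",           # 16 kHz sample rate (assumed)
        out_path,
    ]


def capture_wav(device: str, seconds: int, out_path: str) -> None:
    """Run ffmpeg and raise CalledProcessError if the capture fails."""
    subprocess.run(build_capture_command(device, seconds, out_path), check=True)
```

Because the arguments are passed as a list rather than a shell string, a device name containing spaces needs no extra quoting.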

No VAD library. No cloud. Raw dB threshold.
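
A rough sketch of the threshold check, using only the standard library. The README does not say what reference level DB_THRESHOLD uses, so this assumes dB relative to one 16-bit sample step (full scale is then about 90 dB). `rms_db` and `Trigger` are hypothetical names, not the script's own.

```python
import math
from array import array


def rms_db(pcm: bytes) -> float:
    """Level of 16-bit PCM audio in dB relative to one sample step (assumed reference).

    Assumes native byte order matches the data (little-endian on typical PCs).
    """
    samples = array("h")
    samples.frombytes(pcm[: len(pcm) // 2 * 2])  # drop a trailing odd byte, if any
    if not samples:
        return 0.0
    mean_square = sum(s * s for s in samples) / len(samples)
    rms = math.sqrt(mean_square)
    return 20.0 * math.log10(max(rms, 1.0))  # clamp so silence maps to 0 dB


class Trigger:
    """Decide when to record: level above threshold and not inside a cooldown."""

    def __init__(self, threshold_db: float = 40.0, cooldown_s: float = 10.0):
        self.threshold_db = threshold_db
        self.cooldown_s = cooldown_s
        self._ready_at = 0.0

    def should_record(self, level_db: float, now: float) -> bool:
        return now >= self._ready_at and level_db > self.threshold_db

    def start_cooldown(self, now: float) -> None:
        """Call after a transcript has been posted to begin the quiet period."""
        self._ready_at = now + self.cooldown_s
```

In a polling loop, each 0.5 s chunk would go through `rms_db`, and `start_cooldown` would be called once the capture and post have finished.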

Output

Messages posted to Discord look like:

🎙 [14:32:05] Hello, is anyone there?
   87% confidence
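
A sketch of how a message in this layout could be built and sent. Discord webhooks accept a JSON body with a `content` field; the function names, the User-Agent string, and the choice of `urllib` are assumptions made for illustration.

```python
import json
import urllib.request
from datetime import datetime
from typing import Optional


def format_message(text: str, confidence: float, when: datetime) -> str:
    """Render a transcript in the two-line layout shown above (confidence is 0-100)."""
    return f"🎙 [{when:%H:%M:%S}] {text}\n   {confidence:.0f}% confidence"


def post_transcript(
    webhook_url: str,
    text: str,
    confidence: float,
    min_confidence: float = 50.0,
    when: Optional[datetime] = None,
) -> bool:
    """POST the transcript to a Discord webhook; skip it if confidence is below the cutoff."""
    if confidence < min_confidence:
        return False  # mirrors POST_ONLY_ABOVE: low-confidence text is never sent
    message = format_message(text, confidence, when or datetime.now())
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps({"content": message}).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "User-Agent": "SpeechWatch-sketch",  # hypothetical client name
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return 200 <= response.status < 300
```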

State is persisted to %LOCALAPPDATA%\SpeechWatch\state.json — transcripts today, last transcription, timestamp.
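
One way the state file could be maintained, as a sketch. The JSON keys (`date`, `transcripts_today`, `last_transcription`, `timestamp`) and both helper names are hypothetical; the actual schema written by speechwatch.py may differ.

```python
import json
import os
from datetime import datetime
from pathlib import Path


def state_path() -> Path:
    """Location of state.json under %LOCALAPPDATA% (falls back to the home directory)."""
    base = os.environ.get("LOCALAPPDATA") or str(Path.home())
    return Path(base) / "SpeechWatch" / "state.json"


def record_transcript(text: str, now: datetime, path: Path) -> dict:
    """Update the persisted counters after a transcription and return the new state."""
    try:
        state = json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        state = {}
    today = now.date().isoformat()
    if not isinstance(state, dict) or state.get("date") != today:
        # Missing, unreadable, or from a previous day: start a fresh daily count.
        state = {"date": today, "transcripts_today": 0}
    state["transcripts_today"] = state.get("transcripts_today", 0) + 1
    state["last_transcription"] = text
    state["timestamp"] = now.isoformat(timespec="seconds")
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state, indent=2), encoding="utf-8")
    os.replace(tmp, path)  # swap in the finished file so readers never see a partial write
    return state
```

A caller would pass `state_path()` as `path`; taking the path as a parameter keeps the function easy to exercise against a temporary directory.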

Architecture

  • speechwatch.py — main entry point, Windows audio capture, Parakeet ONNX inference, Discord webhook
  • speechwatch.sh — Linux version (for OpenClaw host with ALSA)
  • bot.ts — TypeScript Discord bot (alternative to webhook, for bot account approach)
  • README.md — this file