01
📷

Camera Capture

Your browser requests webcam access via the getUserMedia API. Video frames are streamed live into a <video> element at up to 30 fps, and no data ever leaves your device.
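The capture step above can be sketched in a few lines. This is a minimal illustration, not the app's actual code; the function names and the 30 fps constraint shape are assumptions, though `getUserMedia` and `srcObject` are the standard APIs.

```javascript
// Build a getUserMedia constraints object that asks for video only and
// caps the frame rate, so the recognition loop has a predictable budget.
function buildConstraints(maxFps = 30) {
  return { video: { frameRate: { ideal: maxFps, max: maxFps } }, audio: false };
}

// Browser-only: request the webcam and pipe its stream into a <video>
// element. Frames stay on-device; nothing is uploaded anywhere.
async function startCamera(videoEl) {
  const stream = await navigator.mediaDevices.getUserMedia(buildConstraints());
  videoEl.srcObject = stream;
  await videoEl.play();
  return stream;
}
```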

02
🤖

MediaPipe Gesture Recognition

Each frame is processed by MediaPipe's Gesture Recognizer (v0.10.3) running entirely in your browser via WebAssembly. It detects 21 landmarks per hand and classifies gestures in real time using a pre-trained TFLite model.

WASM · TFLite · GPU / XNNPACK · LIVE_STREAM mode
03
📐

Geometric Word Classifier

On top of MediaPipe's built-in gestures, we built a custom landmark geometry classifier. It normalises the 21 hand landmarks relative to the wrist, measures finger extension ratios and inter-finger distances, and maps specific hand shapes to ASL words like hello, water, please, drink, and more.
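The geometry idea can be sketched as follows. The landmark indices (0 = wrist, 9 = middle-finger MCP, etc.) match MediaPipe's hand model; the scale reference and the extension-ratio rule are illustrative assumptions, not the app's actual thresholds.

```javascript
// Translate all 21 landmarks so the wrist is the origin, then scale by
// the wrist-to-middle-MCP distance so hand size and camera distance
// cancel out.
function normalize(landmarks) {
  const wrist = landmarks[0];
  const ref = landmarks[9]; // middle-finger MCP as a scale reference
  const scale = Math.hypot(ref.x - wrist.x, ref.y - wrist.y) || 1;
  return landmarks.map(p => ({
    x: (p.x - wrist.x) / scale,
    y: (p.y - wrist.y) / scale,
  }));
}

// Extension ratio for one finger: tip distance over knuckle distance
// from the wrist (now the origin). A ratio well above 1 means the tip
// reaches past its knuckle, i.e. the finger is extended.
function fingerExtension(norm, tipIdx, mcpIdx) {
  const d = p => Math.hypot(p.x, p.y);
  return d(norm[tipIdx]) / (d(norm[mcpIdx]) || 1);
}
```

A word classifier then becomes a table of rules over these ratios and inter-finger distances (e.g. "all fingers extended, palm facing camera" → *hello*).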

04
🗳️

Vote Buffer & Hold Detection

To avoid flickering, detections are passed through a 10-frame vote buffer. A word is only considered "detected" when it appears in ≥60% of recent frames. You then hold the sign for 500 ms to commit it, preventing accidental triggers mid-gesture.

60% threshold · 500 ms hold · 800 ms cooldown
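The vote-plus-hold logic above is pure state, so it can be sketched compactly. The class name and structure are illustrative; the 10-frame window, 60% threshold, and 500 ms hold mirror the numbers in the text (the 800 ms cooldown is omitted here for brevity).

```javascript
// Sliding vote buffer with hold detection: a word must win >= 60% of the
// last 10 frames, then stay the winner for 500 ms before it commits.
class VoteBuffer {
  constructor(size = 10, threshold = 0.6, holdMs = 500) {
    this.size = size; this.threshold = threshold; this.holdMs = holdMs;
    this.frames = []; this.heldWord = null; this.heldSince = null;
  }

  // Feed one frame's detection (a word or null) with a timestamp in ms.
  // Returns the committed word once the hold completes, else null.
  push(word, nowMs) {
    this.frames.push(word);
    if (this.frames.length > this.size) this.frames.shift();
    const votes = this.frames.filter(w => w !== null && w === word).length;
    const winner = votes / this.size >= this.threshold ? word : null;
    if (winner !== this.heldWord) {
      // Detection changed: start (or cancel) the hold timer.
      this.heldWord = winner;
      this.heldSince = winner ? nowMs : null;
      return null;
    }
    if (winner && nowMs - this.heldSince >= this.holdMs) {
      this.frames = []; this.heldWord = null; this.heldSince = null;
      return winner; // committed
    }
    return null;
  }
}
```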
05
📝

Sentence Building & History

Committed words are appended into a live sentence displayed on screen. Every word is saved to localStorage with a frequency count, powering the "Frequent Signs" panel and the full History page, so you and your family can track and learn the most-used signs over time.
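The frequency tracking can be sketched like this. The storage key `signify-history` and the function names are assumptions; taking the storage object as a parameter lets the same logic run against `localStorage` in the browser or a mock elsewhere.

```javascript
// Bump a per-word counter in a JSON blob stored under one key.
function recordWord(word, storage, key = 'signify-history') {
  const counts = JSON.parse(storage.getItem(key) || '{}');
  counts[word] = (counts[word] || 0) + 1;
  storage.setItem(key, JSON.stringify(counts));
  return counts;
}

// Read the counters back, most-used first, for the "Frequent Signs" panel.
function frequentSigns(storage, key = 'signify-history', topN = 5) {
  const counts = JSON.parse(storage.getItem(key) || '{}');
  return Object.entries(counts)
    .sort((a, b) => b[1] - a[1])
    .slice(0, topN)
    .map(([word]) => word);
}
```

In the browser you would pass `window.localStorage` as the `storage` argument.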

06
🎙️

Speech-to-Text Mode

Switch modes and Signify uses the browser's built-in Web Speech API with continuous listening and interim results, so spoken words appear on screen in real time as you talk, with no external service or API key required.

Web Speech API · Continuous mode · 100% in-browser
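A hedged sketch of this mode: the commented wiring uses the standard SpeechRecognition interface (prefixed as `webkitSpeechRecognition` in Chrome), and `renderTranscript` is an illustrative helper that joins finalized phrases and appends the current interim guess.

```javascript
// Combine recognition results into one display string: finalized
// phrases first, then the still-changing interim hypothesis.
function renderTranscript(results) {
  let finalText = '';
  let interim = '';
  for (const r of results) {
    if (r.isFinal) finalText += r.transcript + ' ';
    else interim += r.transcript;
  }
  return (finalText + interim).trim();
}

// Browser-only wiring (sketch):
// const Rec = window.SpeechRecognition || window.webkitSpeechRecognition;
// const rec = new Rec();
// rec.continuous = true;      // keep listening across pauses
// rec.interimResults = true;  // stream partial hypotheses as you talk
// rec.onresult = e => show(renderTranscript(
//   [...e.results].map(r => ({ isFinal: r.isFinal, transcript: r[0].transcript }))
// ));
// rec.start();
```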
Try It Yourself →
← Back to Home