UNDER THE HOOD
How Signify Works
From your hand to the screen in milliseconds: here's what's happening behind the scenes.
Camera Capture
Your browser requests webcam access via the getUserMedia API. Video frames are streamed live into a <video> element at up to 30fps, and no data ever leaves your device.
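The capture step can be sketched like this. The constraint shape and the wiring below are illustrative, not Signify's actual code; the browser-only calls are commented out so the sketch runs anywhere.

```typescript
// Request video only, capped at 30fps (the cap mentioned above).
// The exact constraint object Signify uses is an assumption here.
function buildCameraConstraints(fps: number = 30) {
  return {
    audio: false,
    video: { frameRate: { ideal: fps, max: fps }, facingMode: "user" },
  };
}

// Browser wiring (commented out so the sketch stays runnable anywhere):
// const stream = await navigator.mediaDevices.getUserMedia(buildCameraConstraints());
// const video = document.querySelector("video")!;
// video.srcObject = stream;   // frames stream locally; nothing is uploaded
// await video.play();
```

Because the stream is attached directly to the `<video>` element, frames stay in browser memory end to end.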
MediaPipe Gesture Recognition
Each frame is processed by MediaPipe's Gesture Recognizer (v0.10.3) running entirely in your browser via WebAssembly. It detects 21 landmarks per hand and classifies gestures in real time using a pre-trained TFLite model.
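A minimal sketch of consuming the recognizer's per-frame output. The setup comments follow MediaPipe's tasks-vision API; the model path, the 0.5 score threshold, and the "first hand only" choice are assumptions, not Signify's confirmed logic.

```typescript
// Subset of MediaPipe's GestureRecognizerResult that we consume:
// one array of scored categories per detected hand.
interface GestureResult {
  gestures: { categoryName: string; score: number }[][];
}

// Pick the best-scoring gesture for the first detected hand,
// discarding low-confidence guesses (0.5 is an assumed threshold).
function topGesture(result: GestureResult, minScore = 0.5): string | null {
  const hand = result.gestures[0];
  if (!hand || hand.length === 0) return null;
  const best = hand.reduce((a, b) => (b.score > a.score ? b : a));
  return best.score >= minScore ? best.categoryName : null;
}

// Browser setup, per the tasks-vision API (paths are illustrative):
// import { FilesetResolver, GestureRecognizer } from "@mediapipe/tasks-vision";
// const vision = await FilesetResolver.forVisionTasks("/wasm");
// const recognizer = await GestureRecognizer.createFromOptions(vision, {
//   baseOptions: { modelAssetPath: "/models/gesture_recognizer.task" },
//   runningMode: "VIDEO",
// });
// const result = recognizer.recognizeForVideo(videoEl, performance.now());
// const label = topGesture(result);
```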
Geometric Word Classifier
On top of MediaPipe's built-in gestures, we built a custom landmark geometry classifier. It normalises the 21 hand landmarks relative to the wrist, measures finger extension ratios and inter-finger distances, and maps specific hand shapes to ASL words like hello, water, please, drink, and more.
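The geometry step described above might look like this. The normalisation (wrist origin, scaled by the wrist-to-middle-MCP distance), the 1.3 extension ratio, and the word mapping are all illustrative assumptions; only the general approach comes from the text.

```typescript
type Landmark = { x: number; y: number };

// Translate landmarks so the wrist (index 0) is the origin, then divide by
// the wrist-to-middle-MCP distance (index 9) so hand size and camera
// distance cancel out.
function normalize(landmarks: Landmark[]): Landmark[] {
  const wrist = landmarks[0];
  const midMcp = landmarks[9];
  const scale = Math.hypot(midMcp.x - wrist.x, midMcp.y - wrist.y) || 1;
  return landmarks.map(p => ({ x: (p.x - wrist.x) / scale, y: (p.y - wrist.y) / scale }));
}

// A finger counts as extended when its tip sits noticeably farther from the
// wrist than its PIP joint; the 1.3 margin is an assumed threshold.
function isExtended(norm: Landmark[], tipIdx: number, pipIdx: number): boolean {
  const dist = (p: Landmark) => Math.hypot(p.x, p.y);
  return dist(norm[tipIdx]) > dist(norm[pipIdx]) * 1.3;
}

// Illustrative mapping only: which extension patterns Signify actually maps
// to which ASL words is not specified in the text.
function classifyWord(norm: Landmark[]): string | null {
  const [index, middle, ring, pinky] =
    [[8, 6], [12, 10], [16, 14], [20, 18]].map(([t, p]) => isExtended(norm, t, p));
  if (index && middle && ring && pinky) return "hello"; // open-palm-like shape
  if (index && middle && ring && !pinky) return "water"; // W-like handshape
  return null;
}
```

Indices follow MediaPipe's 21-landmark hand model (0 = wrist, 8 = index tip, 9 = middle MCP, and so on).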
Vote Buffer & Hold Detection
To avoid flickering, detections are passed through a 10-frame vote buffer. A word is only considered "detected" when it appears in ≥60% of recent frames. You then hold the sign for 500ms to commit it, preventing accidental triggers mid-gesture.
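The numbers above (10 frames, 60%, 500ms) pin this step down well, so here is one way to implement it; the class names, the choice to divide by the full window size, and the reset-after-commit behaviour are assumptions.

```typescript
// Sliding window of the last `size` per-frame detections. A word wins only
// when it fills at least `threshold` of the window (6 of 10 frames here).
class VoteBuffer {
  private frames: (string | null)[] = [];
  constructor(private size = 10, private threshold = 0.6) {}

  push(label: string | null): string | null {
    this.frames.push(label);
    if (this.frames.length > this.size) this.frames.shift();
    const counts = new Map<string, number>();
    for (const f of this.frames) if (f !== null) counts.set(f, (counts.get(f) ?? 0) + 1);
    for (const [word, n] of counts) {
      // Divide by the full window size, not the current length, so a short
      // warm-up burst can't trigger a detection.
      if (n / this.size >= this.threshold) return word;
    }
    return null;
  }
}

// Commits a stable word once it has been held for `holdMs`; resetting after
// a commit stops the same hold from firing twice.
class HoldDetector {
  private current: string | null = null;
  private since = 0;
  constructor(private holdMs = 500) {}

  update(word: string | null, nowMs: number): string | null {
    if (word !== this.current) {
      this.current = word;
      this.since = nowMs;
      return null;
    }
    if (word !== null && nowMs - this.since >= this.holdMs) {
      this.current = null; // require a fresh hold before the next commit
      return word;
    }
    return null;
  }
}
```

Per frame the two would be chained, e.g. `hold.update(buffer.push(frameLabel), performance.now())`.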
Sentence Building & History
Committed words are appended into a live sentence displayed on screen. Every word is saved to localStorage with a frequency count, powering the "Frequent Signs" panel and the full History page, so you and your family can track and learn the most-used signs over time.
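A sketch of the frequency-count persistence, assuming a single JSON map under one storage key (the key name and data shape are hypothetical). The storage is passed in as a minimal interface so the logic runs anywhere; in the browser you would pass `window.localStorage`.

```typescript
// Minimal key-value interface matching the localStorage methods we need.
interface KVStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const HISTORY_KEY = "signify-word-counts"; // hypothetical storage key

// Bump the committed word's frequency count and persist the whole map.
function recordWord(word: string, store: KVStore): void {
  const counts: Record<string, number> = JSON.parse(store.getItem(HISTORY_KEY) ?? "{}");
  counts[word] = (counts[word] ?? 0) + 1;
  store.setItem(HISTORY_KEY, JSON.stringify(counts));
}

// Most-used signs first, for a "Frequent Signs"-style panel.
function frequentSigns(store: KVStore, topN = 5): string[] {
  const counts: Record<string, number> = JSON.parse(store.getItem(HISTORY_KEY) ?? "{}");
  return Object.entries(counts)
    .sort(([, a], [, b]) => b - a)
    .slice(0, topN)
    .map(([word]) => word);
}
```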
Speech-to-Text Mode
Switch modes and Signify uses the browser's built-in Web Speech API with continuous listening and interim results, so spoken words appear on screen in real time as you talk, with no external service or API key required.
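One way to split final from interim results, with the Web Speech API setup as comments (the prefixed constructor lookup is standard; the chunk shape and the `render()` helper are hypothetical, not Signify's actual code).

```typescript
// Final chunks are locked in; interim chunks are shown provisionally and
// replaced as recognition refines them.
interface SpeechChunk { text: string; isFinal: boolean; }

function assembleTranscript(chunks: SpeechChunk[]): { committed: string; interim: string } {
  const committed = chunks.filter(c => c.isFinal).map(c => c.text).join(" ");
  const interim = chunks.filter(c => !c.isFinal).map(c => c.text).join(" ");
  return { committed, interim };
}

// Browser setup (Web Speech API; Chrome still needs the webkit prefix):
// const SR = (window as any).SpeechRecognition ?? (window as any).webkitSpeechRecognition;
// const rec = new SR();
// rec.continuous = true;        // keep listening across pauses
// rec.interimResults = true;    // surface words before they are final
// rec.onresult = (e: any) => {
//   const chunks = [...e.results].map((r: any) => ({ text: r[0].transcript, isFinal: r.isFinal }));
//   render(assembleTranscript(chunks));  // render() is a hypothetical UI hook
// };
// rec.start();
```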