UNDER THE HOOD
How Signify Works
From your hand to the screen in milliseconds: here's what's happening behind the scenes.
Camera Capture
Your browser requests webcam access via the getUserMedia API. Video frames are streamed live into a <video> element at up to 30fps, and no data ever leaves your device.
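The capture step can be sketched like this. The constraint shape and the wiring below are illustrative, not Signify's actual code; the browser-only calls are commented out so the sketch runs anywhere.

```typescript
// Request video only, capped at 30fps (the cap mentioned above).
// The exact constraint object Signify uses is an assumption here.
function buildCameraConstraints(fps: number = 30) {
  return {
    audio: false,
    video: { frameRate: { ideal: fps, max: fps }, facingMode: "user" },
  };
}

// Browser wiring (commented out so the sketch stays runnable anywhere):
// const stream = await navigator.mediaDevices.getUserMedia(buildCameraConstraints());
// const video = document.querySelector("video")!;
// video.srcObject = stream;   // frames stream locally; nothing is uploaded
// await video.play();
```

Because the stream is attached directly to the `<video>` element, frames stay in browser memory end to end.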
MediaPipe Gesture Recognition
Each frame is processed by MediaPipe's Gesture Recognizer (v0.10.3) running entirely in your browser via WebAssembly. It detects 21 landmarks per hand and classifies gestures in real time using a pre-trained TFLite model.
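A minimal sketch of consuming the recognizer's per-frame output. The setup comments follow MediaPipe's tasks-vision API; the model path, the 0.5 score threshold, and the "first hand only" choice are assumptions, not Signify's confirmed logic.

```typescript
// Subset of MediaPipe's GestureRecognizerResult that we consume:
// one array of scored categories per detected hand.
interface GestureResult {
  gestures: { categoryName: string; score: number }[][];
}

// Pick the best-scoring gesture for the first detected hand,
// discarding low-confidence guesses (0.5 is an assumed threshold).
function topGesture(result: GestureResult, minScore = 0.5): string | null {
  const hand = result.gestures[0];
  if (!hand || hand.length === 0) return null;
  const best = hand.reduce((a, b) => (b.score > a.score ? b : a));
  return best.score >= minScore ? best.categoryName : null;
}

// Browser setup, per the tasks-vision API (paths are illustrative):
// import { FilesetResolver, GestureRecognizer } from "@mediapipe/tasks-vision";
// const vision = await FilesetResolver.forVisionTasks("/wasm");
// const recognizer = await GestureRecognizer.createFromOptions(vision, {
//   baseOptions: { modelAssetPath: "/models/gesture_recognizer.task" },
//   runningMode: "VIDEO",
// });
// const result = recognizer.recognizeForVideo(videoEl, performance.now());
// const label = topGesture(result);
```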
Geometric Word Classifier
On top of MediaPipe's built-in gestures, we built a custom landmark geometry classifier. It normalises the 21 hand landmarks relative to the wrist, measures finger extension ratios and inter-finger distances, and maps specific hand shapes to ASL words like hello, water, please, drink, and more.
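The geometry step described above might look like this. The normalisation (wrist origin, scaled by the wrist-to-middle-MCP distance), the 1.3 extension ratio, and the word mapping are all illustrative assumptions; only the general approach comes from the text.

```typescript
type Landmark = { x: number; y: number };

// Translate landmarks so the wrist (index 0) is the origin, then divide by
// the wrist-to-middle-MCP distance (index 9) so hand size and camera
// distance cancel out.
function normalize(landmarks: Landmark[]): Landmark[] {
  const wrist = landmarks[0];
  const midMcp = landmarks[9];
  const scale = Math.hypot(midMcp.x - wrist.x, midMcp.y - wrist.y) || 1;
  return landmarks.map(p => ({ x: (p.x - wrist.x) / scale, y: (p.y - wrist.y) / scale }));
}

// A finger counts as extended when its tip sits noticeably farther from the
// wrist than its PIP joint; the 1.3 margin is an assumed threshold.
function isExtended(norm: Landmark[], tipIdx: number, pipIdx: number): boolean {
  const dist = (p: Landmark) => Math.hypot(p.x, p.y);
  return dist(norm[tipIdx]) > dist(norm[pipIdx]) * 1.3;
}

// Illustrative mapping only: which extension patterns Signify actually maps
// to which ASL words is not specified in the text.
function classifyWord(norm: Landmark[]): string | null {
  const [index, middle, ring, pinky] =
    [[8, 6], [12, 10], [16, 14], [20, 18]].map(([t, p]) => isExtended(norm, t, p));
  if (index && middle && ring && pinky) return "hello"; // open-palm-like shape
  if (index && middle && ring && !pinky) return "water"; // W-like handshape
  return null;
}
```

Indices follow MediaPipe's 21-landmark hand model (0 = wrist, 8 = index tip, 9 = middle MCP, and so on).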
Vote Buffer & Hold Detection
To avoid flickering, detections are passed through a 10-frame vote buffer. A word is only considered "detected" when it appears in ≥60% of recent frames. You then hold the sign for 500ms to commit it, preventing accidental triggers mid-gesture.
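The numbers above (10 frames, 60%, 500ms) pin this step down well, so here is one way to implement it; the class names, the choice to divide by the full window size, and the reset-after-commit behaviour are assumptions.

```typescript
// Sliding window of the last `size` per-frame detections. A word wins only
// when it fills at least `threshold` of the window (6 of 10 frames here).
class VoteBuffer {
  private frames: (string | null)[] = [];
  constructor(private size = 10, private threshold = 0.6) {}

  push(label: string | null): string | null {
    this.frames.push(label);
    if (this.frames.length > this.size) this.frames.shift();
    const counts = new Map<string, number>();
    for (const f of this.frames) if (f !== null) counts.set(f, (counts.get(f) ?? 0) + 1);
    for (const [word, n] of counts) {
      // Divide by the full window size, not the current length, so a short
      // warm-up burst can't trigger a detection.
      if (n / this.size >= this.threshold) return word;
    }
    return null;
  }
}

// Commits a stable word once it has been held for `holdMs`; resetting after
// a commit stops the same hold from firing twice.
class HoldDetector {
  private current: string | null = null;
  private since = 0;
  constructor(private holdMs = 500) {}

  update(word: string | null, nowMs: number): string | null {
    if (word !== this.current) {
      this.current = word;
      this.since = nowMs;
      return null;
    }
    if (word !== null && nowMs - this.since >= this.holdMs) {
      this.current = null; // require a fresh hold before the next commit
      return word;
    }
    return null;
  }
}
```

Per frame the two would be chained, e.g. `hold.update(buffer.push(frameLabel), performance.now())`.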
Sentence Building & History
Committed words are appended into a live sentence displayed on screen. Every word is saved to localStorage with a frequency count, powering the "Frequent Signs" panel and the full History page, so you and your family can track and learn the most-used signs over time.
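A sketch of the frequency-count persistence, assuming a single JSON map under one storage key (the key name and data shape are hypothetical). The storage is passed in as a minimal interface so the logic runs anywhere; in the browser you would pass `window.localStorage`.

```typescript
// Minimal key-value interface matching the localStorage methods we need.
interface KVStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const HISTORY_KEY = "signify-word-counts"; // hypothetical storage key

// Bump the committed word's frequency count and persist the whole map.
function recordWord(word: string, store: KVStore): void {
  const counts: Record<string, number> = JSON.parse(store.getItem(HISTORY_KEY) ?? "{}");
  counts[word] = (counts[word] ?? 0) + 1;
  store.setItem(HISTORY_KEY, JSON.stringify(counts));
}

// Most-used signs first, for a "Frequent Signs"-style panel.
function frequentSigns(store: KVStore, topN = 5): string[] {
  const counts: Record<string, number> = JSON.parse(store.getItem(HISTORY_KEY) ?? "{}");
  return Object.entries(counts)
    .sort(([, a], [, b]) => b - a)
    .slice(0, topN)
    .map(([word]) => word);
}
```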
Speech-to-Text Mode
Switch modes and Signify uses the browser's built-in Web Speech API with continuous listening and interim results, so spoken words appear on screen in real time as you talk, with no external service or API key required.
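One way to split final from interim results, with the Web Speech API setup as comments (the prefixed constructor lookup is standard; the chunk shape and the `render()` helper are hypothetical, not Signify's actual code).

```typescript
// Final chunks are locked in; interim chunks are shown provisionally and
// replaced as recognition refines them.
interface SpeechChunk { text: string; isFinal: boolean; }

function assembleTranscript(chunks: SpeechChunk[]): { committed: string; interim: string } {
  const committed = chunks.filter(c => c.isFinal).map(c => c.text).join(" ");
  const interim = chunks.filter(c => !c.isFinal).map(c => c.text).join(" ");
  return { committed, interim };
}

// Browser setup (Web Speech API; Chrome still needs the webkit prefix):
// const SR = (window as any).SpeechRecognition ?? (window as any).webkitSpeechRecognition;
// const rec = new SR();
// rec.continuous = true;        // keep listening across pauses
// rec.interimResults = true;    // surface words before they are final
// rec.onresult = (e: any) => {
//   const chunks = [...e.results].map((r: any) => ({ text: r[0].transcript, isFinal: r.isFinal }));
//   render(assembleTranscript(chunks));  // render() is a hypothetical UI hook
// };
// rec.start();
```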