AI Vision System
Real-time skeletal tracking powered by MediaPipe BlazePose. Track 33 body landmarks, validate exercise form, and count reps automatically, in real time at up to 30 FPS.

Source: Google Research — BlazePose
- 33 body landmarks: full skeletal tracking
- 22+ exercises: supported movements
- 30+ FPS: real-time processing
- <16 ms latency: frame processing time
State-machine-based rep detection that identifies exercise phases (up/down, contracted/extended) and counts reps with high accuracy.
Continuous analysis of joint angles and body positioning to ensure proper exercise form and prevent injury.
Visual feedback system that shows joint angles in real time with colour coding (green/yellow/red) for form quality.
Advanced signal processing to eliminate jitter while maintaining responsiveness for smooth landmark tracking.
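The colour-coded angle feedback described above can be sketched as a simple threshold mapping. The target angle and tolerance values here are hypothetical, not the app's actual tuning:

```python
def angle_quality(angle_deg: float, target: float, tolerance: float = 15.0) -> str:
    """Map a joint angle to a traffic-light quality colour.

    Illustrative thresholds: within `tolerance` degrees of the target
    is green, within twice the tolerance is yellow, otherwise red.
    """
    deviation = abs(angle_deg - target)
    if deviation <= tolerance:
        return "green"
    if deviation <= 2 * tolerance:
        return "yellow"
    return "red"
```

In practice each exercise would define its own target angle and tolerance per joint.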
Our pose detection system supports a wide range of exercises across different muscle groups and movement patterns.
A deep look at the ML model, inference pipeline, and signal processing that power Aerovit's real-time exercise tracking.
BlazePose is a lightweight, on-device pose estimation model developed by Google Research. It uses a two-step detector-tracker architecture: a fast person detector localises the body in the first frame, then a landmark regression network tracks 33 keypoints across subsequent frames without re-running detection — keeping latency under 16 ms on modern mobile GPUs.
The model outputs 3D coordinates (x, y, z) plus a per-landmark visibility score, enabling depth-aware angle calculations even from a single monocular camera. BlazePose Heavy (the variant Aerovit uses) maximises landmark accuracy at the cost of slightly higher compute, which is an acceptable trade-off on modern phones.
Because inference runs entirely on-device via GPU delegates (TFLite on Android, CoreML on iOS), no camera frames ever leave the user's phone — ensuring full privacy and zero-latency operation regardless of network conditions.
33 body landmarks per frame
Full-body skeleton including face, hands, and feet
3D coordinates (x, y, z) + visibility
Depth estimation from monocular camera input
Two-step detector → tracker pipeline
Detect once, track continuously for speed
BlazePose Heavy variant
Higher accuracy model optimised for fitness use
On-device GPU-accelerated inference
TFLite GPU delegate / CoreML — no cloud needed
Front & back camera support
Automatic coordinate mirroring for selfie mode
Frame Capture
Camera streams NV21/BGRA frames at 30 FPS via CameraX / AVFoundation
Pose Estimation
BlazePose Heavy extracts 33 landmarks with 3D coordinates per frame
EMA Smoothing
Exponential moving average (α = 0.45) stabilises landmark positions across frames
Angle Calculation
Joint angles computed via 3-point inverse tangent on key landmark triplets
State Machine
Finite state machine detects exercise phases (up ↔ down) and increments reps
UI Feedback
Skeleton overlay, colour-coded angles, audio cues, and rep counter update
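The angle-calculation and state-machine steps above can be sketched as follows. Landmarks are assumed to be (x, y) tuples, and the phase thresholds are illustrative, not the app's actual values:

```python
import math

def joint_angle(a, b, c):
    """Angle at vertex b (degrees) formed by points a-b-c,
    computed with atan2 on the two limb vectors."""
    ang = math.degrees(
        math.atan2(c[1] - b[1], c[0] - b[0])
        - math.atan2(a[1] - b[1], a[0] - b[0])
    )
    ang = abs(ang)
    return 360.0 - ang if ang > 180.0 else ang

class RepCounter:
    """Two-state machine: one rep is counted per
    extended -> contracted -> extended cycle."""
    def __init__(self, down_thresh=90.0, up_thresh=160.0):
        self.down_thresh = down_thresh
        self.up_thresh = up_thresh
        self.state = "up"
        self.reps = 0

    def update(self, angle: float) -> int:
        if self.state == "up" and angle < self.down_thresh:
            self.state = "down"     # contracted phase reached
        elif self.state == "down" and angle > self.up_thresh:
            self.state = "up"       # back to extended: count the rep
            self.reps += 1
        return self.reps
```

The hysteresis gap between the two thresholds prevents a single noisy angle reading near the boundary from toggling the state and double-counting a rep.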
Raw ML Kit landmarks jitter frame-to-frame. We apply an exponential moving average (α = 0.45) independently to each landmark's x and y coordinates, producing a visually smooth skeleton while keeping responsiveness high enough for fast exercises.
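A minimal sketch of this per-landmark EMA, assuming landmarks arrive each frame as a list of (x, y) tuples:

```python
class LandmarkSmoother:
    """Per-landmark exponential moving average.

    alpha = 0.45 matches the coefficient described above: a higher
    alpha tracks fast motion more closely, a lower one smooths more.
    """
    def __init__(self, alpha: float = 0.45):
        self.alpha = alpha
        self._prev = None  # last smoothed frame

    def smooth(self, landmarks):
        if self._prev is None:
            self._prev = list(landmarks)  # first frame passes through
        else:
            self._prev = [
                (self.alpha * x + (1 - self.alpha) * px,
                 self.alpha * y + (1 - self.alpha) * py)
                for (x, y), (px, py) in zip(landmarks, self._prev)
            ]
        return self._prev
```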
The camera streams at 30 FPS, but running BlazePose Heavy on every frame is unnecessary. A frame throttle limits inference to ~15 FPS, halving GPU load with no perceptible difference in tracking quality — extending battery life during long workout sessions.
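The throttle amounts to a minimum-interval check on frame timestamps; a sketch, with the target rate as a parameter:

```python
class FrameThrottle:
    """Drops frames so inference runs at ~target_fps even though the
    camera delivers them faster (e.g. 30 FPS in, ~15 FPS processed)."""
    def __init__(self, target_fps: float = 15.0):
        self.min_interval = 1.0 / target_fps
        self._last = -float("inf")

    def should_process(self, timestamp_s: float) -> bool:
        if timestamp_s - self._last >= self.min_interval:
            self._last = timestamp_s
            return True
        return False  # skip this frame; reuse the last pose result
```

Skipped frames still render the previous skeleton overlay, so the user sees a continuous 30 FPS preview while inference runs at half that rate.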
ML Kit returns landmarks in the raw camera sensor coordinate space. We compute the correct scale factor and crop offset for the BoxFit.cover display mode, accounting for sensor rotation and front-camera mirroring, so the skeleton aligns pixel-perfectly with the user's body on screen.
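The cover-fit mapping can be sketched as a uniform scale plus a centre-crop offset. This simplified helper assumes normalised landmark coordinates, applies optional selfie mirroring, and omits the sensor-rotation handling for brevity:

```python
def map_landmark_to_screen(nx, ny, img_w, img_h, view_w, view_h, mirror=False):
    """Map a normalised landmark (nx, ny in [0, 1], image space) to view
    pixels under cover-fit: scale uniformly so the image fills the view,
    then centre-crop the overflow. Illustrative, not the app's code."""
    scale = max(view_w / img_w, view_h / img_h)  # cover: fill both axes
    offset_x = (img_w * scale - view_w) / 2.0    # overflow split evenly
    offset_y = (img_h * scale - view_h) / 2.0
    x = nx * img_w * scale - offset_x
    y = ny * img_h * scale - offset_y
    if mirror:                                   # front-camera selfie mode
        x = view_w - x
    return x, y
```

For example, with a 100x100 image shown in a 100x200 view, the image is scaled 2x and 50 px is cropped from each horizontal side, so the image centre still lands at the view centre.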
Fully Implemented
This feature is live and functional in the app