AI Vision System

Pose Detection

Real-time skeletal tracking powered by MediaPipe BlazePose. Track 33 body landmarks, validate exercise form, and count reps automatically, with per-frame processing under 16 ms.

  • 33 body landmarks: full skeletal tracking
  • 22+ exercises: supported movements
  • 30+ FPS: real-time processing
  • <16 ms latency: frame processing time

Core Capabilities

Automatic Rep Counting

State machine-based rep detection that identifies exercise phases (up/down, contracted/extended) and counts reps with high accuracy.

  • Phase detection algorithm
  • Noise filtering
  • Multi-exercise support
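A minimal sketch of such a phase state machine, with hysteresis thresholds so noise near a single boundary cannot double-count reps. The threshold values here are illustrative, not Aerovit's actual per-exercise configuration:

```python
class RepCounter:
    """Two-phase rep counter: counts a rep on each down -> up transition."""

    def __init__(self, down_threshold=90.0, up_threshold=160.0):
        # Two separate thresholds (hysteresis) filter out jitter that would
        # otherwise toggle the state repeatedly near one boundary.
        self.down_threshold = down_threshold
        self.up_threshold = up_threshold
        self.state = "up"
        self.reps = 0

    def update(self, joint_angle):
        """Feed the tracked joint angle (degrees) each frame; returns rep count."""
        if self.state == "up" and joint_angle < self.down_threshold:
            self.state = "down"
        elif self.state == "down" and joint_angle > self.up_threshold:
            self.state = "up"
            self.reps += 1
        return self.reps
```

For a push-up, the tracked angle would be the elbow angle; the same machine generalises to other exercises by swapping the joint and thresholds.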

Real-Time Form Validation

Continuous analysis of joint angles and body positioning to ensure proper exercise form and prevent injury.

  • Angle threshold checking
  • Posture analysis
  • Instant feedback
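A sketch of per-frame threshold checking under the assumption that each exercise carries a table of acceptable joint-angle ranges. The squat ranges below are purely illustrative, not the app's tuned values:

```python
# Illustrative per-exercise joint constraints, checked every frame.
SQUAT_RULES = {
    "knee": (70.0, 100.0),   # knee angle at the bottom of the squat
    "back": (150.0, 180.0),  # torso-hip line should stay close to straight
}

def validate_form(angles, rules, tolerance=10.0):
    """Return a per-joint verdict: 'good' inside the range, 'warning' within
    `tolerance` degrees of it, 'bad' otherwise."""
    verdicts = {}
    for joint, (lo, hi) in rules.items():
        a = angles.get(joint)
        if a is None:
            verdicts[joint] = "unknown"  # landmark not visible this frame
        elif lo <= a <= hi:
            verdicts[joint] = "good"
        elif lo - tolerance <= a <= hi + tolerance:
            verdicts[joint] = "warning"
        else:
            verdicts[joint] = "bad"
    return verdicts
```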

Color-Coded Angle Display

Visual feedback system that shows joint angles in real-time with color coding (green/yellow/red) for form quality.

  • Dynamic color mapping
  • Target angle ranges
  • Visual overlays
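One way to implement the dynamic colour mapping is a linear blend from green through yellow to red as the angle deviates from its target. A sketch (the 45° maximum deviation is an assumed parameter):

```python
def angle_to_color(angle, target, max_dev=45.0):
    """Blend green -> yellow -> red as `angle` deviates from `target`.

    Returns an (r, g, b) tuple: t=0 gives green, t=0.5 yellow, t=1 red.
    """
    t = min(abs(angle - target) / max_dev, 1.0)
    r = int(255 * min(2 * t, 1.0))          # ramps up over the first half
    g = int(255 * min(2 * (1 - t), 1.0))    # ramps down over the second half
    return (r, g, 0)
```

Discrete green/yellow/red buckets work too; the continuous blend just avoids abrupt colour jumps on the overlay.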

One Euro Filter Smoothing

Advanced signal processing to eliminate jitter while maintaining responsiveness for smooth landmark tracking.

  • Adaptive filtering
  • Low latency
  • High precision
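The One Euro filter (Casiez et al.) is a low-pass filter whose cutoff frequency rises with the signal's speed: heavy smoothing while a landmark is nearly still (killing jitter), light smoothing while it moves fast (avoiding lag). A self-contained single-value sketch with illustrative parameters, applied per coordinate:

```python
import math

class OneEuroFilter:
    """One Euro filter: adaptive low-pass whose cutoff grows with speed."""

    def __init__(self, freq=30.0, min_cutoff=1.0, beta=0.007, d_cutoff=1.0):
        self.freq = freq              # expected sample rate (Hz)
        self.min_cutoff = min_cutoff  # smoothing at rest
        self.beta = beta              # how fast cutoff rises with speed
        self.d_cutoff = d_cutoff      # cutoff for the derivative estimate
        self.x_prev = None
        self.dx_prev = 0.0

    def _alpha(self, cutoff):
        tau = 1.0 / (2 * math.pi * cutoff)
        return 1.0 / (1.0 + tau * self.freq)

    def __call__(self, x):
        if self.x_prev is None:
            self.x_prev = x
            return x
        # Estimate and smooth the derivative (speed).
        dx = (x - self.x_prev) * self.freq
        a_d = self._alpha(self.d_cutoff)
        dx_hat = a_d * dx + (1 - a_d) * self.dx_prev
        # Cutoff rises with speed: smooth at rest, responsive in motion.
        cutoff = self.min_cutoff + self.beta * abs(dx_hat)
        a = self._alpha(cutoff)
        x_hat = a * x + (1 - a) * self.x_prev
        self.x_prev, self.dx_prev = x_hat, dx_hat
        return x_hat
```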

Supported Exercises

Our pose detection system supports a wide range of exercises across different muscle groups and movement patterns.

Push-ups
Squats
Lunges
Planks
Burpees
Jumping Jacks
Mountain Climbers
Deadlifts
Bicep Curls
Shoulder Press
Tricep Dips
Leg Raises
Crunches
Russian Twists
High Knees
Box Jumps
Pull-ups
Rows
Lateral Raises
Calf Raises
Hip Thrusts
Glute Bridges

Technical Implementation

A deep look at the ML model, inference pipeline, and signal processing that power Aerovit's real-time exercise tracking.

MediaPipe BlazePose

BlazePose is a lightweight, on-device pose estimation model developed by Google Research. It uses a two-step detector-tracker architecture: a fast person detector localises the body in the first frame, then a landmark regression network tracks 33 keypoints across subsequent frames without re-running detection — keeping latency under 16 ms on modern mobile GPUs.

The model outputs 3D coordinates (x, y, z) plus a per-landmark visibility score, enabling depth-aware angle calculations even from a single monocular camera. BlazePose Heavy (the variant Aerovit uses) maximises landmark accuracy at the cost of slightly higher compute, which is an acceptable trade-off on modern phones.

Because inference runs entirely on-device via GPU delegates (TFLite on Android, CoreML on iOS), no camera frames ever leave the user's phone — ensuring full privacy and zero-latency operation regardless of network conditions.

Model Specifications

  • 33 body landmarks per frame

    Full-body skeleton including face, hands, and feet

  • 3D coordinates (x, y, z) + visibility

    Depth estimation from monocular camera input

  • Two-step detector → tracker pipeline

    Detect once, track continuously for speed

  • BlazePose Heavy variant

    Higher accuracy model optimised for fitness use

  • On-device GPU-accelerated inference

    TFLite GPU delegate / CoreML — no cloud needed

  • Front & back camera support

    Automatic coordinate mirroring for selfie mode
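The 33 landmarks follow MediaPipe's published ordering, which makes it easy to name the joint triplets that drive angle calculation. The indices below match that standard ordering; the triplet table is an illustrative subset, not the app's full configuration:

```python
# A subset of BlazePose's 33 landmark indices (MediaPipe's standard ordering).
POSE_LANDMARKS = {
    "left_shoulder": 11, "right_shoulder": 12,
    "left_elbow": 13,    "right_elbow": 14,
    "left_wrist": 15,    "right_wrist": 16,
    "left_hip": 23,      "right_hip": 24,
    "left_knee": 25,     "right_knee": 26,
    "left_ankle": 27,    "right_ankle": 28,
}

# Joint triplets (first, vertex, last) whose interior angle drives rep logic:
# e.g. the elbow angle for push-ups and curls, the knee angle for squats.
ANGLE_TRIPLETS = {
    "left_elbow": ("left_shoulder", "left_elbow", "left_wrist"),
    "left_knee":  ("left_hip", "left_knee", "left_ankle"),
}
```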

Processing Pipeline

1. Frame Capture: camera streams NV21/BGRA frames at 30 FPS via CameraX / AVFoundation
2. Pose Estimation: BlazePose Heavy extracts 33 landmarks with 3D coordinates per frame
3. EMA Smoothing: exponential moving average (α = 0.45) stabilises landmark positions across frames
4. Angle Calculation: joint angles computed via 3-point inverse tangent on key landmark triplets
5. State Machine: finite state machine detects exercise phases (up ↔ down) and increments reps
6. UI Feedback: skeleton overlay, colour-coded angles, audio cues, and rep counter update
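The angle-calculation step above takes three landmarks at a time and computes the interior angle at the middle one via the inverse tangent of the two rays. A 2D sketch (the 3D version extends the same idea with the z coordinate):

```python
import math

def joint_angle(a, b, c):
    """Interior angle (degrees) at vertex b, formed by points a-b-c.

    Each point is an (x, y) tuple; uses atan2 of the two rays from b.
    """
    ang = math.degrees(
        math.atan2(c[1] - b[1], c[0] - b[0])
        - math.atan2(a[1] - b[1], a[0] - b[0])
    )
    ang = abs(ang)
    return 360.0 - ang if ang > 180.0 else ang
```

For example, feeding the hip, knee, and ankle landmarks in that order yields the knee angle used by the squat state machine.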

Signal Processing

EMA Landmark Smoother

Raw ML Kit landmarks jitter frame-to-frame. We apply an exponential moving average (α = 0.45) independently to each landmark's x and y coordinates, producing a visually smooth skeleton while keeping responsiveness high enough for fast exercises.
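The per-landmark EMA described above reduces to s = α·x + (1 − α)·s_prev applied independently to each coordinate. A sketch with the stated α = 0.45:

```python
class EmaSmoother:
    """Per-landmark exponential moving average over (x, y) coordinates."""

    def __init__(self, alpha=0.45):
        self.alpha = alpha
        self.prev = None

    def smooth(self, landmarks):
        """landmarks: list of (x, y) tuples, one per keypoint."""
        if self.prev is None:
            self.prev = list(landmarks)  # first frame passes through unchanged
            return self.prev
        self.prev = [
            (self.alpha * x + (1 - self.alpha) * px,
             self.alpha * y + (1 - self.alpha) * py)
            for (x, y), (px, py) in zip(landmarks, self.prev)
        ]
        return self.prev
```

Higher α tracks fast movement more closely; lower α smooths harder at the cost of lag, so α = 0.45 is a middle ground.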

Frame Throttling

The camera streams at 30 FPS, but running BlazePose Heavy on every frame is unnecessary. A frame throttle limits inference to ~15 FPS, halving GPU load with no perceptible difference in tracking quality — extending battery life during long workout sessions.
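A minimal sketch of such a throttle: frames arriving sooner than the minimum interval since the last processed frame are dropped before inference. The monotonic-clock approach is an assumption about the implementation, not a description of it:

```python
import time

class FrameThrottle:
    """Drops frames so inference runs at most `target_fps` times per second."""

    def __init__(self, target_fps=15.0):
        self.min_interval = 1.0 / target_fps
        self.last = float("-inf")  # so the very first frame is processed

    def should_process(self, now=None):
        """Call once per camera frame; True means run inference on it."""
        now = time.monotonic() if now is None else now
        if now - self.last >= self.min_interval:
            self.last = now
            return True
        return False
```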

BoxFit.cover Coordinate Mapping

ML Kit returns landmarks in the raw camera sensor coordinate space. We compute the correct scale factor and crop offset for the BoxFit.cover display mode, accounting for sensor rotation and front-camera mirroring, so the skeleton aligns pixel-perfectly with the user's body on screen.
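A simplified sketch of the cover-mode mapping, assuming the frame has already been rotated upright (the real pipeline also handles sensor rotation). BoxFit.cover scales the image to fill the view and crops the overflow equally on both sides, so the same scale and crop offset must be applied to every landmark:

```python
def map_landmark(lx, ly, img_w, img_h, view_w, view_h, mirror=False):
    """Map a sensor-space landmark (lx, ly) into BoxFit.cover screen space."""
    scale = max(view_w / img_w, view_h / img_h)  # cover: fill, then crop
    dx = (img_w * scale - view_w) / 2.0          # horizontal crop offset
    dy = (img_h * scale - view_h) / 2.0          # vertical crop offset
    sx = lx * scale - dx
    sy = ly * scale - dy
    if mirror:                                   # front camera: selfie flip
        sx = view_w - sx
    return sx, sy
```

For a 100×100 frame shown in a 200×100 view, scale is 2 and the top and bottom 50 px of the scaled frame are cropped, which is exactly what the offsets subtract.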

Fully Implemented

This feature is live and functional in the app
