Technology II — Perception

Vision AI & Perception

Multi-modal perception pipeline that fuses depth, RGB, and proprioception into structured scene understanding — the eyes, ears, and spatial awareness of embodied intelligence.

The Challenge

Seeing Is Not Understanding

A camera captures pixels. A depth sensor measures distances. But embodied intelligence needs more than raw data — it needs to understand what it sees: where objects are in 3D space, what they are, how they relate to each other, and what they mean for the task at hand.

The SYNAPEX perception system fuses multiple sensor modalities into a unified scene representation that the brain modules can reason about, plan with, and act on. This is not just computer vision — it is the perceptual foundation of autonomous existence.

Our approach is fundamentally different from single-model vision systems. Perception is decomposed into specialised stages, each producing structured output that feeds the next — creating a pipeline that is interpretable, modular, and debuggable.


From Raw Sensors to Scene Understanding

Seven layers of processing, each adding semantic richness to the raw input.

01
Sensor Fusion
RGB cameras + depth sensors + proprioceptive data aligned into a unified spatio-temporal frame. Time-synchronised multi-modal input.
02
Feature Extraction
Backbone network produces dense feature maps. Multi-scale representation capturing both fine detail and global context. Pre-trained on massive datasets, fine-tuned for embodied tasks.
03
Object Detection & Segmentation
Instance-level detection and pixel-precise segmentation. Every object identified, located in 3D, and tracked across frames. Real-time panoptic segmentation.
04
Depth Estimation & 3D Reconstruction
Dense depth maps refined with monocular depth estimation. Point cloud generation. Local 3D mesh reconstruction for manipulation targets.
05
Spatial Relationship Mapping
Scene graph construction: which objects are near which, what supports what, what occludes what. Semantic spatial reasoning for navigation and manipulation.
06
Context & Intent Estimation
What is happening in this scene? Activity recognition, human pose estimation, gesture detection. Predicting where things are going, not just where they are.
07
Structured Scene Representation
Final output: a machine-readable scene graph with 3D positions, object identities, relationships, and context. Ready for the Reasoning and Planning modules of the AI brain.
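The final stage's machine-readable scene graph can be pictured as a small data structure. The sketch below is illustrative only: the class names, fields, and predicates (`SceneObject`, `Relation`, `"on"`, `"near"`) are assumptions, not the actual SYNAPEX schema.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class SceneObject:
    """One detected instance with its identity and 3D pose."""
    obj_id: int
    label: str                             # e.g. "cup", "table"
    position: tuple[float, float, float]   # metres, in the robot's frame
    confidence: float

@dataclass
class Relation:
    """A spatial relationship between two objects (stage 05's output)."""
    subject_id: int
    predicate: str                         # e.g. "on", "near", "occludes"
    object_id: int

@dataclass
class SceneGraph:
    """Structured scene representation handed to reasoning and planning."""
    timestamp: float
    objects: list[SceneObject] = field(default_factory=list)
    relations: list[Relation] = field(default_factory=list)

    def neighbours(self, obj_id: int, predicate: str) -> list[int]:
        """IDs of objects related to obj_id by the given predicate."""
        return [r.object_id for r in self.relations
                if r.subject_id == obj_id and r.predicate == predicate]

# Example: a cup resting on a table.
graph = SceneGraph(timestamp=12.5)
graph.objects += [
    SceneObject(1, "cup", (0.42, -0.10, 0.80), 0.97),
    SceneObject(2, "table", (0.40, 0.00, 0.75), 0.99),
]
graph.relations.append(Relation(1, "on", 2))
print(graph.neighbours(1, "on"))  # [2]
```

A planner can then query the graph ("what is the cup resting on?") without ever touching raw pixels, which is what makes the pipeline interpretable and debuggable.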

Multi-Modal Awareness

📷
RGB Vision
High-resolution stereo cameras for colour, texture, and pattern recognition. The primary source of semantic information about the external world.
🛰
Depth Sensing
Structured-light and time-of-flight sensors for precise distance measurement. Creates a 3D point cloud of the environment at every frame.
📡
Proprioception
Internal body state from the muscle system: joint positions, forces, velocities. Fused with external sensors for a complete understanding of the body in its environment.
🎙
Audio Processing
Directional microphone array for sound localisation, speech recognition, and environmental audio classification. Hearing adds context that vision alone cannot provide.
🌡
Thermal Sensing
Infrared thermal imaging for temperature awareness, human detection in low light, and material identification. Critical for safe interaction with the natural world.
🛰
IMU & Vestibular
Inertial measurement for balance, acceleration, and orientation. The vestibular analog for embodied systems — essential for locomotion and dynamic stability.
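Because these sensors run at different rates, the fusion stage must first align their readings on a common timeline. One common approach, sketched here as a minimal assumption (the text does not specify SYNAPEX's actual synchronisation method), is nearest-timestamp matching against the camera clock:

```python
import bisect

def nearest(timestamps: list[int], t: int) -> int:
    """Index of the reading whose timestamp is closest to t (sorted input)."""
    i = bisect.bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    # Pick whichever neighbour is closer in time.
    return i if timestamps[i] - t < t - timestamps[i - 1] else i - 1

def fuse(rgb_ts, depth_ts, proprio_ts):
    """For each RGB frame, pair it with the closest depth and proprioceptive
    readings, yielding time-aligned (rgb, depth, proprio) index triples."""
    return [(i, nearest(depth_ts, t), nearest(proprio_ts, t))
            for i, t in enumerate(rgb_ts)]

# Toy timestamps in milliseconds: RGB at ~30 Hz, depth at ~15 Hz,
# proprioception at 100 Hz.
rgb = [0, 33, 66]
depth = [0, 66]
proprio = [0, 10, 20, 30, 40, 50, 60]
print(fuse(rgb, depth, proprio))  # [(0, 0, 0), (1, 0, 3), (2, 1, 6)]
```

Each aligned triple then forms one unified spatio-temporal frame for the feature-extraction stage.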

Vision Modules in the Laboratory

Each stage of the perception pipeline is published as a separate module in the SYNAPEX Lab. Researchers can use the full pipeline or pick individual modules — face recognition, object detection, depth estimation — for their own projects. Every module earns $SYNX for its creator.
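The pick-and-compose idea can be sketched as modules threaded through a shared scene structure. Everything here is a hypothetical stand-in: the module names, the dict-based interface, and `run_pipeline` are assumptions for illustration, not the SYNAPEX Lab API.

```python
from typing import Callable

# Each lab module is modelled as a callable that takes the scene built so
# far and returns it enriched; real modules would wrap trained networks.
Stage = Callable[[dict], dict]

def detect_objects(scene: dict) -> dict:
    """Stand-in for an object-detection module."""
    return {**scene, "objects": ["cup", "table"]}

def estimate_depth(scene: dict) -> dict:
    """Stand-in for a depth-estimation module."""
    return {**scene, "has_depth": True}

def run_pipeline(frame: dict, stages: list[Stage]) -> dict:
    """Thread a frame through whichever modules the researcher picked."""
    for stage in stages:
        frame = stage(frame)
    return frame

# Full pipeline versus a single module picked from the Lab:
full = run_pipeline({"rgb": "frame-0"}, [detect_objects, estimate_depth])
solo = run_pipeline({"rgb": "frame-0"}, [detect_objects])
print(sorted(full))  # ['has_depth', 'objects', 'rgb']
```

The design choice this illustrates is that every stage consumes and produces the same structured representation, so dropping or swapping a module never breaks the ones downstream.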

🕵 Face Detection & Recognition · Phase 1
Real-time multi-face detection and identification. Extracted from the perception pipeline as a standalone deployable module.
CV · Biometric · Edge

🧹 3D Face Scan · In Dev
RGB-D face scanning with mesh generation. Landmark extraction, expression tracking. Publishable for avatar, health, and security applications.
3D · Depth · Mesh

👁 Scene Segmentation · In Dev
Panoptic segmentation: every pixel classified, every instance segmented. The foundation for spatial reasoning in embodied AI.
Panoptic · Semantic · Instance