12:21, 02 April 2026

Vision Without Lidar: How a Russian Breakthrough Is Redefining Robotics Perception

An international research team that includes MIPT has introduced Un-ViTAStereo, a stereo vision technology that estimates distances without expensive lidar sensors or manual depth labeling. It uses the Depth Anything V2 model as a supervisory signal during training.

The approach aims to make performance in complex scenes more robust, including low-texture environments, repetitive patterns, occlusions and object boundaries where conventional methods often struggle.

The global market for 3D machine vision is undergoing a structural shift driven by technological progress and rising demand across industries. Adoption is expanding in manufacturing, agriculture, automotive and healthcare, where 3D vision supports quality control, process automation and operational efficiency.

This transition reflects a growing need for high precision on production lines, where traditional 2D systems are no longer sufficient. As organizations seek to optimize workflows, 3D vision is becoming more widely integrated, suggesting sustained market growth.


Replacing Hardware With Algorithms

Un-ViTAStereo, developed with the participation of MIPT researchers, estimates distances without relying on costly lidar hardware or manually labeled datasets. The system remains effective in challenging conditions where traditional algorithms fail, such as smooth surfaces, dense vegetation or fog. The technology is expected to support autonomous vehicles and robots by improving navigation accuracy and safety.

Every second, the human brain combines two slightly different images from the left and right eyes to construct a 3D map of the world. Stereo systems in robots and self-driving vehicles follow the same principle, using cameras instead of eyes and algorithms instead of the brain. However, this process breaks down in certain cases, for example when facing a uniform white wall or repetitive textures, where matching points between the two images becomes unreliable and leads to distance errors.
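The link between a matching error and a distance error can be made concrete with the standard triangulation formula for a rectified stereo pair, Z = f * B / d, where d is the disparity (the horizontal shift of a point between the left and right images). The sketch below is illustrative only: the focal length and baseline are hypothetical values, not parameters of Un-ViTAStereo or of any particular camera rig.

```python
def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Standard rectified-stereo triangulation: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive to give a finite depth")
    return focal_px * baseline_m / disparity_px


# Hypothetical rig (assumed values, chosen only to keep the arithmetic simple).
FOCAL_PX = 700.0    # focal length in pixels
BASELINE_M = 0.5    # distance between the two cameras in meters

near = depth_from_disparity(35.0, FOCAL_PX, BASELINE_M)
far = depth_from_disparity(7.0, FOCAL_PX, BASELINE_M)

# The same one-pixel matching mistake costs far more at long range,
# because depth is inversely proportional to disparity.
near_off_by_one = depth_from_disparity(34.0, FOCAL_PX, BASELINE_M)
far_off_by_one = depth_from_disparity(6.0, FOCAL_PX, BASELINE_M)

near_error_m = near_off_by_one - near
far_error_m = far_off_by_one - far

print(f"near: {near:.2f} m, error from 1 px mismatch: {near_error_m:.2f} m")
print(f"far:  {far:.2f} m, error from 1 px mismatch: {far_error_m:.2f} m")
```

With these assumed numbers, a one-pixel mismatch shifts the nearby estimate by well under a meter but the distant one by several meters, which is why unreliable matching on blank or repetitive surfaces matters so much.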

The new training method addresses these limitations. Researchers introduced the Depth Anything V2 model as a supervisory component during training. This model estimates relative depth from a single image. While it does not measure distance in meters, it identifies cues such as shadows, perspective and occlusion, allowing it to determine which objects are closer or farther away. The stereo system then learns by selecting only those predictions consistent with this guidance, improving overall accuracy.
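The idea of keeping only stereo predictions that agree with a relative-depth guide can be sketched in a few lines. This is a toy illustration, not the published Un-ViTAStereo training procedure: the function names, the pairwise ordering test, the threshold and the convention that a smaller monocular value means "closer" are all assumptions made for the example.

```python
from itertools import combinations


def ordering_agreement(disparity, mono_depth, margin=0.0):
    """Fraction of pixel pairs whose near/far ordering matches in both maps.

    disparity  : stereo disparities (larger value = closer).
    mono_depth : relative depth from a monocular model
                 (assumed convention: smaller value = closer).
    margin     : pairs the monocular guide ranks within this gap are skipped,
                 since the guide has no clear opinion about them.
    """
    agree = total = 0
    for i, j in combinations(range(len(disparity)), 2):
        if abs(mono_depth[i] - mono_depth[j]) <= margin:
            continue
        total += 1
        mono_says_i_closer = mono_depth[i] < mono_depth[j]
        stereo_says_i_closer = disparity[i] > disparity[j]
        if mono_says_i_closer == stereo_says_i_closer:
            agree += 1
    return agree / total if total else 1.0


def select_consistent(candidates, mono_depth, threshold=0.9):
    """Keep only candidate disparity maps that agree with the guide."""
    return [c for c in candidates if ordering_agreement(c, mono_depth) >= threshold]


# Three pixels ranked by the monocular guide from closest to farthest.
mono = [0.1, 0.4, 0.9]
good = [40.0, 20.0, 5.0]   # disparities fall with distance: consistent
bad = [40.0, 5.0, 20.0]    # middle pixel placed behind the far one

good_score = ordering_agreement(good, mono)
bad_score = ordering_agreement(bad, mono)
kept = select_consistent([good, bad], mono)
```

The guide never supplies distances in meters here. It only vetoes stereo estimates that contradict its ordering of near and far, which matches the role the article describes for Depth Anything V2.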

At its core, the technology strengthens a fundamental capability in machine perception for robots, drones, industrial automation and ADAS/AD systems. It reduces dependence on lidar and labor-intensive data labeling, lowering both development costs and deployment time across transport, logistics, agriculture and industrial robotics. It also improves competitiveness in the AI and computer vision space, where total cost increasingly depends not only on hardware but also on training pipelines and dataset creation.

Future Outlook

From an export perspective, Un-ViTAStereo is valuable primarily as an algorithmic solution integrated into the global research ecosystem through its use of Depth Anything V2. It fits into the expanding stereo and depth perception segment for robotics and autonomous systems, offering an energy-efficient and cost-effective alternative to sensor-heavy architectures.

The most viable paths to international adoption include licensing modules, integration into open-source and industrial computer vision stacks, and participation in research consortia. Validation on the widely recognized KITTI benchmark further supports its credibility.

Domestically, the outlook is even clearer. The technology can be deployed across four key segments: autonomous transport and ADAS, industrial and warehouse robotics, agricultural machinery and robotics, and drones operating in complex environments. A critical advantage in the Russian context is reduced reliance on lidar and manual labeling, which lowers the cost of pilot projects and reduces infrastructure barriers. In this way, the development complements the emerging market for camera-based autonomy, already advanced by companies such as Cognitive Pilot.

A Sign of Success

In the near term, such technologies are unlikely to fully replace lidar. Instead, they will be used in hybrid architectures where reducing system cost or improving robustness in specific scenarios is critical. However, as foundation models and stereo pipelines continue to improve, camera-based solutions are expected to play a larger role.

For Russia, this creates an opportunity to export not hardware but intelligent perception modules, particularly in sectors such as specialized machinery, agtech and industrial robotics, where cost and reliability often outweigh the pursuit of maximum precision.

Un-ViTAStereo is more than another neural network. It signals that Russian research can operate at the forefront of a field that will shape the next generation of autonomous systems.

The system has already been tested on standard datasets, where Un-ViTAStereo clearly outperformed competing approaches. For example, on the KITTI 2015 autonomous driving benchmark, the share of large errors was reduced to 5%. That translates into 23% fewer critical errors in estimating distances to objects such as curbs or pedestrians during motion.

Alexander Dvorkovich

Head of Project, Scientific and Technical Center for Telecommunications, MIPT
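For readers unfamiliar with how a "share of large errors" is computed, the sketch below follows the outlier criterion commonly used for KITTI 2015 stereo evaluation: a pixel counts as a large error when its disparity is off by more than 3 pixels and by more than 5% of the true value. That the quoted 5% figure uses exactly this criterion is an assumption, and the sample values are invented for illustration, not taken from the benchmark.

```python
def large_error_share(predicted, ground_truth, abs_thresh_px=3.0, rel_thresh=0.05):
    """Share of valid pixels whose disparity error is 'large'.

    A pixel is an outlier when the error exceeds abs_thresh_px AND
    exceeds rel_thresh times the true disparity. Pixels without ground
    truth (None or non-positive values) are skipped.
    """
    outliers = valid = 0
    for pred, gt in zip(predicted, ground_truth):
        if gt is None or gt <= 0:
            continue
        valid += 1
        error = abs(pred - gt)
        if error > abs_thresh_px and error > rel_thresh * gt:
            outliers += 1
    if valid == 0:
        raise ValueError("no valid ground-truth pixels")
    return outliers / valid


# Invented sample: four valid pixels and one without ground truth.
gt_disp = [10.0, 100.0, 50.0, 20.0, None]
pred_disp = [10.5, 104.0, 46.0, 27.0, 33.0]

share = large_error_share(pred_disp, gt_disp)
print(f"share of large errors: {share:.0%}")
```

In this sample, the second pixel is off by 4 pixels but still counts as correct because that is less than 5% of its true disparity, while the third pixel, also off by 4 pixels, is an outlier.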
