16:15, 10 May 2026

New Russian Benchmark Puts AI Protein Design Models Under the Microscope

Researchers from AIRI, HSE University and Constructor University have introduced GeomMotif, a benchmark designed to evaluate how accurately AI models handle the geometry of protein structures. The framework includes 57 tasks and helps identify exactly where models begin making errors. The work was presented at the ICLR conference in Rio de Janeiro.

As strange as it may sound, artificial intelligence has learned to improvise in the language of biology. Neural networks are already proposing entirely new protein structures, raising expectations for breakthroughs in drug discovery and enzyme engineering. But the imaginative output of an algorithm often collides with the constraints of physical reality. Researchers from AIRI, HSE University and Constructor University have now introduced a tool designed to hold generative models accountable for those “hallucinations.” Their benchmark, GeomMotif, suggests that many systems that scored well on earlier tests still struggle when held to strict geometric precision.

The benchmark is designed to test whether a model can reconstruct a protein around a predefined structural fragment while preserving its exact 3D geometry. Until now, no dedicated tool existed for that type of evaluation. In GeomMotif, fragments are selected according to geometric and physicochemical properties rather than biological function alone. That distinction matters because even a deviation of one angstrom (one ten-billionth of a meter, the unit used for distances between atoms) can sharply reduce the likelihood of successful protein design.

A Test for Geometric Precision

Until recently, evaluating AI systems for protein design often resembled judging an architect by polished renderings rather than by engineering blueprints. Most existing benchmarks focused primarily on biological function: if a model generated something roughly resembling the required fragment, the test was considered successful. Structural biology, however, operates at the scale of angstroms, where small geometric discrepancies can determine whether a molecule works at all.

The developers of GeomMotif shifted the emphasis toward strict geometry and physicochemical fidelity. At its core, the benchmark measures whether an AI system can extend a protein structure around a target fragment without distorting its precise three-dimensional shape. Experts interviewed by CNews note that even a one-angstrom deviation can critically reduce the probability of experimental success. In laboratory conditions, a visually convincing protein with flawed geometry may fail to bind to its target molecule entirely, rendering months of research useless.
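To make the idea of a geometric check concrete, here is a minimal sketch in Python. It is not GeomMotif's actual metric, which the article does not specify. It assumes a simple distance-based comparison (often called dRMSD): every atom-to-atom distance inside the reference fragment is compared with the same distance in the generated one. The function names, the toy coordinates and the 1.0 angstrom tolerance are illustrative assumptions; the tolerance simply borrows the one-angstrom figure quoted above.

```python
import math


def pairwise_distances(coords):
    """All atom-to-atom distances for a list of (x, y, z) tuples, in angstroms."""
    return [
        math.dist(coords[i], coords[j])
        for i in range(len(coords))
        for j in range(i + 1, len(coords))
    ]


def distance_rmsd(reference, generated):
    """Root-mean-square difference between the fragments' internal distances.

    Internal distances do not change when a structure is rotated or shifted,
    so no superposition step is needed before comparing.
    """
    if len(reference) != len(generated):
        raise ValueError("both fragments need the same number of atoms")
    if len(reference) < 2:
        raise ValueError("need at least two atoms to compare distances")
    ref_d = pairwise_distances(reference)
    gen_d = pairwise_distances(generated)
    squared = sum((a - b) ** 2 for a, b in zip(ref_d, gen_d))
    return math.sqrt(squared / len(ref_d))


def motif_preserved(reference, generated, tolerance=1.0):
    """True if the fragment's geometry stayed within the assumed tolerance."""
    return distance_rmsd(reference, generated) <= tolerance


# Toy fragment: four atoms at the corners of a 3.8 angstrom square (made-up data).
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]

# Same shape, rotated 90 degrees about the z axis and shifted in space.
moved = [(-y + 10.0, x + 5.0, z + 2.0) for x, y, z in reference]

# Same fragment with the last atom pushed 3 angstroms out of place.
distorted = reference[:3] + [(0.0, 6.8, 0.0)]

print(motif_preserved(reference, moved))      # rigid motion only
print(motif_preserved(reference, distorted))  # one atom displaced
```

The first check passes because rotating or shifting a structure leaves its internal distances unchanged, while the second fails because one displaced atom stretches three of the six distances. A distance-based score of this kind cannot tell a structure from its mirror image, which is one reason structure comparisons are usually done by superimposing coordinates instead.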

The Illusion of Perfection

The benchmark results proved sobering. Researchers evaluated ten systems across two categories of models. On conventional, less demanding benchmarks, leading systems had previously delivered near-perfect scores. GeomMotif exposed a different picture. The strongest model tested achieved only 40 points out of 100.

That result is less a setback than a clear signal of where current models still fall short. It suggests that today’s AI systems for protein design still face a major limitation: they are effective at generating plausible structures but far less reliable at controlling exact spatial constraints. GeomMotif functions not as a generative system, but as a rigorous auditing mechanism capable of exposing the blind spots of current algorithms. Without tools of that kind, progress in AI for Science risks stalling at the stage of elegant but experimentally unusable prototypes.

From the Nobel Prize to AIRI

The emergence of GeomMotif fits naturally into the broader trajectory of recent years. AI-driven biology entered the scientific mainstream after the 2024 Nobel Prize in Chemistry was awarded for computational protein design and protein structure prediction. Following AlphaFold 3, which demonstrated the ability to model complex interactions involving DNA and small molecules, the field began moving toward even greater structural precision.

Russia’s role in that race is increasingly defined not as a consumer of external platforms, but as a developer of scientific infrastructure. Domestic researchers have been steadily expanding their capabilities in the field. AIRI scientists previously modified AlphaFold 2 to predict the effects of mutations, a capability considered critical for disease research. In 2025, AIRI and Constructor University introduced DiMA, a model capable of generating proteins with predefined characteristics. GeomMotif represents the logical next step in that progression: building or adapting a model is no longer enough without having independent tools to validate the results.

Sovereignty Through Precision

GeomMotif is not a consumer-facing product, but a piece of scientific and technological infrastructure. Over the medium term, however, tools of this kind could accelerate the development of viable drugs by filtering out weak candidates before they reach expensive experimental stages, reducing costs for pharmaceutical companies.

For Russia, the project carries significance beyond basic research. It also touches on technological sovereignty in pharmaceuticals and biotechnology. The more accurate domestic AI models and evaluation systems become, the less dependent the country becomes on foreign platforms. The export potential here lies primarily in scientific methodologies, datasets and technical expertise. Competition, however, remains intense. The United States, Europe and China are all developing their own standards and validation ecosystems. For GeomMotif to become a globally recognized benchmark, the project will likely require open-source implementation and validation by independent scientific groups.

The Future Will Be Hybrid

The trajectory for the coming years is becoming increasingly clear: the era of unquestioned trust in neural networks is ending. More specialized benchmarks are likely to emerge, designed to evaluate not only biological function but also structural robustness and reproducibility. AI-driven protein design is expected to become standard infrastructure in research and development, but it will not replace laboratory science. Even the most advanced model still requires experimental validation and regulatory review.

GeomMotif demonstrates that Russia’s scientific sector is capable of building sophisticated tools for evaluating AI systems at an international level. That marks an important step toward turning artificial intelligence in biology from a generator of ideas into a dependable engineering instrument.

“In the future, we plan to accelerate the result verification system and expand the benchmark to support new classes of generative protein models. Our primary focus is on all-atom models that account for every atom in a protein structure rather than only its backbone.”

Pavel Strashnov

Lead Research Scientist in the Protein Design Group at AIRI’s AI Center for Drug Discovery