10:51, 10 April 2026

How a New Russian Model Is Redefining the Rules of Bioinformatics

Researchers at the Institute of Artificial Intelligence and Digital Sciences, part of the Faculty of Computer Science at HSE University (Higher School of Economics; in Russian, Vysshaya shkola ekonomiki), have developed a model that predicts protein–protein interactions with about 95% accuracy. The model, GSMFormer-PPI, integrates three types of protein data, including surface characteristics, and analyzes the relationships between them rather than simply combining features, as earlier models did. The approach could accelerate the discovery of disease mechanisms, biomarkers, and drug targets. The findings were published in Scientific Reports.

Proteins are the cell’s primary operators. They transmit signals, trigger reactions, and assemble into complexes. When those interactions break down, cellular processes fail. That is how many diseases begin. Identifying which proteins interact – and which do not – remains a central problem in modern biology.

Testing all possible protein pairs in the lab is both time-consuming and expensive. Even a set of 100 proteins yields 4,950 possible pairs, and the count grows roughly with the square of the set size. This is where artificial intelligence becomes essential: a model learns to predict interactions from molecular data, so experiments can focus on the most promising pairs.
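The scale of the pairing problem can be checked with the standard count of unordered pairs, n(n-1)/2. The short sketch below is only an illustration: the function name `pair_count` and the larger set sizes are assumptions chosen for the example, not figures from the study.

```python
from math import comb


def pair_count(n_proteins: int) -> int:
    """Number of distinct unordered protein pairs, excluding self-pairs."""
    return comb(n_proteins, 2)


# 100 is the set size mentioned in the text; the other sizes are
# illustrative assumptions showing how quickly the count grows.
for n in (100, 1_000, 20_000):
    print(f"{n} proteins -> {pair_count(n):,} candidate pairs")
```

For 100 proteins this gives 4,950 pairs. If self-interactions were also counted, the total would be larger by n.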

A Model That Sees the Whole Protein

The GSMFormer-PPI system developed at HSE examines each protein from three complementary perspectives. First, it analyzes the amino acid sequence – the “text” that defines the protein – using a language model. Second, it processes the protein’s 3D structure, capturing how that sequence folds in space, via a graph neural network. Third, it evaluates the molecular surface, where proteins recognize and bind to one another. Traditionally, these data types were simply merged into a single feature set. GSMFormer-PPI takes a different approach: instead of combining inputs mechanically, it identifies relationships between them.
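The difference between merging features and relating them can be shown with a toy sketch. The code below is not the GSMFormer-PPI architecture, which this article does not describe in detail. It assumes three equal-length embedding vectors with hypothetical names (`seq_emb`, `struct_emb`, `surf_emb`) filled with random numbers, and it uses plain scaled dot-product attention without learned weights as one common way to let each data type weigh the others.

```python
import math
import random


def concat_fusion(seq, struct, surf):
    """Baseline: join the three modality vectors end to end."""
    return seq + struct + surf


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_fusion(seq, struct, surf):
    """Let each modality vector attend to all three.

    Returns the mixed vectors and the attention weights. No learned
    projections are used; this is a sketch, not a trained model.
    """
    tokens = [seq, struct, surf]
    dim = len(seq)
    fused, weight_rows = [], []
    for query in tokens:
        weights = softmax([dot(query, key) / math.sqrt(dim) for key in tokens])
        mixed = [sum(w * tok[i] for w, tok in zip(weights, tokens))
                 for i in range(dim)]
        fused.append(mixed)
        weight_rows.append(weights)
    return fused, weight_rows


# Random stand-ins for real embeddings (assumption for illustration only).
rng = random.Random(0)
dim = 4
seq_emb = [rng.gauss(0, 1) for _ in range(dim)]
struct_emb = [rng.gauss(0, 1) for _ in range(dim)]
surf_emb = [rng.gauss(0, 1) for _ in range(dim)]

flat = concat_fusion(seq_emb, struct_emb, surf_emb)
fused, weights = attention_fusion(seq_emb, struct_emb, surf_emb)
print(len(flat), "concatenated features")
for name, row in zip(("sequence", "structure", "surface"), weights):
    print(name, "attends with weights", [round(w, 3) for w in row])
```

Concatenation leaves it to later layers to discover any link between the data types, while the attention weights state explicitly how much each view draws on the others.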

In effect, the model learns to represent proteins as integrated systems and predicts their interactions with 95.7% accuracy.


Why It Matters

This is not a “drug on the shelf tomorrow” story. But advances like this reshape how biomedical research progresses. They accelerate the identification of biomarkers, clarify disease mechanisms, and help prioritize targets for future therapies. For patients, this points to more precise diagnostics and more personalized treatment. For researchers, it reduces costly experimental cycles. AI filters hypotheses, allowing scientists to focus on the most promising ones.

Russia in the Global AI-for-Biology Race

The HSE model fits into a broader global trajectory. In 2024, the Nobel Prize in Chemistry recognized advances in computational protein design and protein structure prediction, and Google DeepMind introduced AlphaFold 3 the same year. The Russian system does not replicate these approaches. Instead, it proposes a distinct architecture, an important factor for technological sovereignty. The project was carried out within a state-supported AI research center.

From Publication to Practice

The next phase unfolds along two tracks. Internationally, publication in Scientific Reports opens the door to collaboration and citation. Domestically, the focus is on integrating the model into pharmaceutical R&D pipelines and academic workflows. Russia is already building comprehensive AI tools in this field. For example, ChemCoScientist (ITMO, 2026) automates the full computational molecule discovery cycle. GSMFormer-PPI could become a key component within this emerging ecosystem.

What Comes Next

The model is not a finished medical product. It is a research tool. In the near term, further architectural refinement, benchmarking, and pilot deployments are expected. Over time, GSMFormer-PPI may be incorporated into domestic drug discovery platforms.

Russian AI research is steadily advancing toward complex, high-stakes problems at a global level. Here, the cost of error is high, and the payoff is measured in faster biomedical discovery.

Protein surface properties are critical for interactions, as they determine how molecules recognize one another and concentrate the physicochemical features that govern binding. In our model, we sought to incorporate this information alongside sequence and three-dimensional structure, and then move beyond simple feature aggregation by enabling the algorithm to analyze relationships between them. This is what allowed us to improve the accuracy of protein–protein interaction prediction.

Maria Poptsova

Director of the Center for Biomedical Research and Technologies, Institute of Artificial Intelligence and Digital Sciences, HSE University
