How a New Russian Model Is Redefining the Rules of Bioinformatics
Researchers at the Institute of Artificial Intelligence and Digital Sciences at the Faculty of Computer Science, HSE University (Higher School of Economics), have developed a model that predicts protein–protein interactions with up to 95.7% accuracy. GSMFormer-PPI integrates three types of protein data (sequence, 3D structure, and surface characteristics) and analyzes the relationships between them rather than simply combining features, as earlier models did. The approach could accelerate the discovery of disease mechanisms, biomarkers, and drug targets. The findings were published in Scientific Reports.

Proteins are the cell’s primary operators. They transmit signals, trigger reactions, and assemble into complexes. When those interactions break down, cellular processes fail. That is how many diseases begin. Identifying which proteins interact – and which do not – remains a central problem in modern biology.
Testing all possible protein pairs in the lab is both time-consuming and expensive. Even a set of 100 proteins yields 4,950 distinct pairs, and the number grows quadratically as more proteins are added. This is where artificial intelligence becomes essential. A model learns to predict interactions from molecular data, so that only the most likely pairs need to be tested experimentally.
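To make the scale concrete, the short Python sketch below counts unordered protein pairs for a few set sizes. The function name candidate_pairs and the include_self option (for counting a protein paired with itself) are illustrative assumptions, not part of the published work.

```python
from math import comb


def candidate_pairs(n_proteins: int, include_self: bool = False) -> int:
    """Count unordered protein pairs among n_proteins.

    include_self (hypothetical option) also counts each protein
    paired with itself, which adds n_proteins more candidates.
    """
    pairs = comb(n_proteins, 2)  # n * (n - 1) / 2
    return pairs + n_proteins if include_self else pairs


if __name__ == "__main__":
    for n in (100, 1_000, 10_000):
        print(f"{n:>6} proteins -> {candidate_pairs(n):>12,} candidate pairs")
```

Each tenfold increase in the number of proteins raises the number of candidate pairs roughly a hundredfold, which is why exhaustive lab screening quickly becomes impractical.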
A Model That Sees the Whole Protein
The GSMFormer-PPI system developed at HSE examines each protein from three complementary perspectives. First, it analyzes the amino acid sequence – the “text” that defines the protein – using a language model. Second, it processes the protein’s 3D structure, capturing how that sequence folds in space, via a graph neural network. Third, it evaluates the molecular surface, where proteins recognize and bind to one another. Traditionally, these data types were simply merged into a single feature set. GSMFormer-PPI takes a different approach: instead of combining inputs mechanically, it identifies relationships between them.
In effect, the model learns to represent proteins as integrated systems and predicts their interactions with 95.7% accuracy.
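The difference between merging features and relating them can be illustrated with a toy Python sketch. It is not the GSMFormer-PPI architecture, whose fusion details are not given here. It assumes that each protein has already been encoded as three equal-length vectors (sequence, structure, surface) and uses a plain dot-product attention step with no learned weights. All function names and the toy vectors are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def concat_fusion(seq, struct, surf):
    """Baseline: place the three vectors end to end.

    Nothing here relates one modality to another.
    """
    return seq + struct + surf


def attention_fusion(seq, struct, surf):
    """Toy cross-modal attention (assumption, no learned projections).

    Each modality vector scores its similarity to all three vectors
    and is re-expressed as a weighted mix of them, so every output
    row depends on how the modalities relate to each other.
    """
    tokens = [seq, struct, surf]
    scale = math.sqrt(len(seq))
    fused = []
    for query in tokens:
        weights = softmax([dot(query, key) / scale for key in tokens])
        mixed = [
            sum(w * tok[i] for w, tok in zip(weights, tokens))
            for i in range(len(query))
        ]
        fused.append(mixed)
    return fused


if __name__ == "__main__":
    # Made-up 4-dimensional embeddings for one protein.
    seq = [1.0, 0.0, 0.0, 0.0]
    struct = [0.0, 1.0, 0.0, 0.0]
    surf = [0.0, 0.0, 1.0, 0.0]
    print(len(concat_fusion(seq, struct, surf)))
    for row in attention_fusion(seq, struct, surf):
        print([round(v, 3) for v in row])
```

In a real system the three vectors would come from the language model, the graph neural network, and a surface encoder, and the attention step would use trained parameters. The sketch only shows why an attention-style step captures cross-modal relationships that plain concatenation ignores.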

Why It Matters
This is not a “drug on the shelf tomorrow” story. But advances like this reshape how biomedical research progresses. They accelerate the identification of biomarkers, clarify disease mechanisms, and help prioritize targets for future therapies. For patients, this points to more precise diagnostics and more personalized treatment. For researchers, it reduces costly experimental cycles. AI filters hypotheses, allowing scientists to focus on the most promising ones.
Russia in the Global AI-for-Biology Race
The HSE model fits into a broader global trajectory. In 2024, the Nobel Prize in Chemistry recognized advances in protein structure prediction, while Google DeepMind introduced AlphaFold 3. The Russian system does not replicate these approaches. Instead, it proposes a distinct architecture, an important factor for technological sovereignty. The project was carried out within a state-supported AI research center.

From Publication to Practice
The next phase unfolds along two tracks. Internationally, publication in Scientific Reports opens the door to collaboration and citation. Domestically, the focus is on integrating the model into pharmaceutical R&D pipelines and academic workflows. Russia is already building comprehensive AI tools in this field. For example, ChemCoScientist (ITMO, 2026) automates the full computational molecule discovery cycle. GSMFormer-PPI could become a key component within this emerging ecosystem.
What Comes Next
The model is not a finished medical product. It is a research tool. In the near term, further architectural refinement, benchmarking, and pilot deployments are expected. Over time, GSMFormer-PPI may be incorporated into domestic drug discovery platforms.

Russian AI research is steadily advancing toward complex, high-stakes problems at a global level. Here, the cost of error is high, and the payoff is measured in faster biomedical discovery.