Russian Scientists Create a Benchmark to Test AI’s Chemistry Skills
The new tool measures how well neural networks can handle real-world chemical problems—from reaction analysis to drug design.

Researchers from AIRI (Artificial Intelligence Research Institute) in Russia have developed the country’s first benchmark for evaluating chemical language models: AI systems trained to reason through chemistry problems the way professional chemists do.
The new framework simulates the multi-step logic of real chemical research, such as catalyst development or drug discovery. Unlike standard tests that only check factual recall, this benchmark challenges AI models to build logical chains—from predicting a reaction to analyzing a compound’s biological activity.
Bridging Universal and Specialized AI
To try out the benchmark, the team compared general-purpose large language models with models trained specifically on chemistry. Domain-specific models performed better on technical accuracy, while general models showed stronger reasoning and adaptability. The researchers say this could lead to the development of hybrid systems that combine both strengths.
According to Kuzma Khrabrov, a research scientist at AIRI, “the benchmark will serve as a tool for advancing AI’s ability to automate chemical reasoning and accelerate drug discovery.”
The project underscores how artificial intelligence is increasingly becoming a collaborator in fundamental science, merging human expertise with computational precision to push the boundaries of chemical innovation.