Artificial Intelligence in Russia Passes a Reading Exam
Russian researchers have introduced LIBRA, a new benchmark designed to test how well large language models can read, analyze, and interpret long texts — a crucial step toward building AI systems capable of genuinely deep understanding.

Russia’s AI community has unveiled LIBRA (Long Input Benchmark for Russian Analysis), a unified testing framework that measures how effectively large language models (LLMs) process long-form content. The benchmark comprises 18 scenarios that evaluate a model’s ability to understand materials ranging from 4,000 to 128,000 tokens — roughly the span from a long article to a short book.
A Joint Development Effort
LIBRA is the result of collaboration between SberAI, the Higher School of Economics, AIRI, and MIPT. The teams sought to create a standardized way for Russian researchers to compare models on tasks involving “long-context understanding,” an area where previous methodologies were fragmented and inconsistent.
Escalating Difficulty
The tasks in LIBRA are arranged by increasing complexity: from the classic “needle-in-a-haystack” search to logical reasoning, answering detailed questions, reconciling facts scattered across a document, and even solving mathematical or logic problems embedded in the text.
This structure makes it possible to evaluate whether a model can retain long-range context, extract critical details, and draw meaningful conclusions.
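The simplest of these tasks, the needle-in-a-haystack search, can be illustrated with a minimal sketch. Everything below is a hypothetical harness for explanation only — the function names, the dummy model, and the exact-match scoring are assumptions, not LIBRA’s actual task format:

```python
# Illustrative sketch of a "needle-in-a-haystack" long-context test.
# All names and the scoring scheme here are hypothetical; LIBRA's real
# task definitions may differ.
import random


def build_haystack(needle: str, filler: str, n_sentences: int, seed: int = 0) -> str:
    """Bury a single 'needle' sentence among many filler sentences."""
    random.seed(seed)
    sentences = [filler] * n_sentences
    sentences.insert(random.randrange(n_sentences), needle)
    return " ".join(sentences)


def score(model_answer: str, expected: str) -> bool:
    """Exact-match scoring: did the answer contain the needle's fact?"""
    return expected.lower() in model_answer.lower()


# Stand-in "model" that simply scans the text; a real harness would
# instead prompt an LLM with the full haystack plus the question.
def dummy_model(text: str, question: str) -> str:
    for sentence in text.split(". "):
        if "secret code" in sentence:
            return sentence
    return "I don't know."


haystack = build_haystack(
    needle="The secret code is 7421.",
    filler="The sky was grey over the quiet city.",
    n_sentences=2000,
)
answer = dummy_model(haystack, "What is the secret code?")
print(score(answer, "7421"))  # → True
```

A real evaluation would repeat this across many needle positions and context lengths (4K up to 128K tokens), which is exactly where the performance drop-off described below becomes visible.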
Initial tests of 17 widely used language models revealed that performance drops sharply as text length increases. Even the strongest systems struggle with marathon-length reading, and open-source models degrade faster still. GPT-4o ranked first overall, with GLM4-9B-Chat leading among open-source alternatives.
More Than Just a Test
The creators emphasize that LIBRA is not merely a benchmark but a transparent evaluation platform. Any developer can test a model, publish results, and help advance the field. Upcoming expansions include additional task types, more diverse text formats, and new analytical scenarios.
This progress is expected to help LLMs better process large volumes of complex information — a capability essential across many domains.
Long-context reading is critical not only for scientific or technical literature but also for analytics, legal documents, medical reports, and large-scale creative works.
LIBRA’s introduction marks an important step toward making Russian-language AI systems more reliable, adaptable, and competitive — with the potential to surpass foreign counterparts in long-text comprehension.