13:16, 24 December 2025

Russian Generative Models Break Into Global Text-to-Video Elite

Russia’s Kandinsky 5.0 Video models have delivered one of the strongest performances in the international LMArena rankings, marking the first time a Russian-developed system has reached such a high position in a global assessment of text-to-video quality.

In the LMArena Text-to-Video rankings, Kandinsky 5.0 Video Pro placed first among all open-source models. This means it outperformed other open models available to developers worldwide in terms of video generation quality. In the overall standings, only closed, proprietary systems from major global players ranked higher.

Head-to-Head Comparison

The LMArena ranking is based on direct comparisons of videos generated by different models using the same text prompts. Users vote for the version that looks better. This format evaluates real-world output rather than technical specifications on paper, focusing on visual quality, motion, level of detail, and alignment with the text prompt. This approach reflects a broader shift in how such technologies are assessed.
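The article does not say how individual votes are turned into a leaderboard. One common way to aggregate head-to-head votes is an Elo-style rating, sketched below purely as an illustration. The function name `elo_ratings`, the model names, the starting rating of 1000, and the step size `k` are all assumptions made for this example and are not taken from LMArena.

```python
from collections import defaultdict


def elo_ratings(votes, k=32.0, base=1000.0):
    """Aggregate pairwise votes into ratings (illustrative Elo-style update).

    votes: iterable of (winner, loser) pairs, one pair per user vote.
    k and base are hypothetical defaults chosen for this sketch.
    """
    ratings = defaultdict(lambda: base)
    for winner, loser in votes:
        # Probability the winner was expected to win, given current ratings.
        expected_win = 1.0 / (1.0 + 10 ** ((ratings[loser] - ratings[winner]) / 400.0))
        # An unexpected win moves the ratings more than an expected one.
        delta = k * (1.0 - expected_win)
        ratings[winner] += delta
        ratings[loser] -= delta
    return dict(ratings)


# Hypothetical votes: each tuple means "the first model's video was preferred".
votes = [
    ("model_a", "model_b"),
    ("model_a", "model_c"),
    ("model_b", "model_c"),
    ("model_a", "model_b"),
]
ratings = elo_ratings(votes)
leaderboard = sorted(ratings, key=ratings.get, reverse=True)
```

Because each vote adds to the winner exactly what it takes from the loser, the total number of rating points stays constant, and a model climbs only by winning comparisons against the models it is shown alongside.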

Global Trends

Text-to-video generation is increasingly judged by the actual visual result rather than by architectural descriptions or parameter counts. Users care about how motion looks, how scenes are composed, and how accurately the output matches the prompt. At the same time, interest in open-source models is growing, particularly those that can be customized and integrated into proprietary products.

Another trend is the reduction of model size without a sharp loss in quality. Compact versions now deliver results that previously required far greater computing resources. Support for multiple languages and cultural contexts also plays a growing role: models that can reliably generate video from prompts in languages other than English gain a clear advantage. Together, these shifts are turning video generation from a showcase technology into a practical tool for creative and applied use cases.

Ahead of Sora

A second model in the Kandinsky 5.0 lineup, Kandinsky 5.0 Video Lite, also entered the global ranking. With around two billion parameters, it placed above the first version of Sora, the model that helped spark widespread interest in video generation when it was unveiled in early 2024.

Video and Text

Kandinsky 5.0 Video is a family of generative models that create short videos from text descriptions. Users describe a scene in words, and the model builds the video step by step, starting with overall composition and motion, then adding details, lighting, and textures. Generation is handled as a single unified process rather than frame by frame, which produces coherent motion and visually consistent scenes. The models understand prompts in Russian and English, render text within the frame correctly, and generate videos up to ten seconds long in HD resolution at 24 frames per second. The Pro version targets maximum quality and is openly available to developers, while the Lite version is smaller, faster, and better suited to testing and applied tasks.

Kandinsky 5.0’s entry into the global top tier is the first such case for a Russian generative model, and it is unlikely to be the last. Domestic development teams are demonstrating that they can compete on equal terms in one of the most complex and fast-moving areas of artificial intelligence and, in some cases, outperform rivals while using significantly fewer resources.
