Sber Is Teaching GigaChat to Speak Tatar – and 20 Other Indigenous Languages
Sber is training its neural network GigaChat to understand and generate text in Tatar, with support from the Academy of Sciences of the Republic of Tatarstan. The initiative is building a large-scale digital corpus drawn from literature and media, positioning AI as a tool for preserving Russia’s linguistic and cultural heritage.

A Scientific Partnership With Cultural Stakes
Alexander Vedyakhin, First Deputy Chairman of the Executive Board of Sberbank, announced the launch of the project during a visit to Kazan. Sberbank and the Academy of Sciences of the Republic of Tatarstan signed an agreement aimed at expanding the linguistic capabilities of GigaChat.
Researchers from the Academy will provide linguistic expertise and archival materials. They will also review training datasets and help the neural network master the specifics of Tatar grammar. The goal is for GigaChat to generate original text in Tatar rather than simply translate content from Russian.
Building a Literary-Grade Training Corpus
To train GigaChat, developers assembled a large-scale corpus of Tatar-language texts. It includes classical literature, contemporary press, textbooks and academic publications. This approach exposes the model to the full range of linguistic expression – from everyday speech to poetry, from humor to formal administrative language.

The corpus is continuously updated. Specialists from Sber and the Academy curate materials that reflect living, contemporary language. Without sufficient high-quality data, a neural network cannot produce coherent, contextually relevant responses. The project addresses what developers describe as a digital data gap affecting minority languages.
Expanding AI to Reflect Russia’s Linguistic Diversity
Tatar is the second most widely spoken language in Russia after Russian. More than 5.3 million people identified it as their native language in the most recent census. Yet many digital services offer limited or no support for it. GigaChat is positioned to become one of the first AI assistants capable of holding full conversations in Tatar.

Residents of Tatarstan will be able to interact with the chatbot in their native language. This is particularly significant for older generations and rural communities, where Tatar remains the primary language of communication. In this context, AI functions as a bridge between traditional culture and modern digital infrastructure.
From Tatar to Twenty Languages
GigaChat is currently being adapted to support twenty languages spoken across Russia. In addition to Tatar, these include Udmurt, Altai, Bashkir, Buryat, Veps, Hill Mari, Ingush and Komi, among others. The Russian online encyclopedia RUVIKI has contributed more than 1.4 million texts in these languages to support model training.
The initiative spans languages from the Caucasus to the Russian Far East. It represents one of the most ambitious efforts to digitize the country’s linguistic heritage. Many of these languages face the risk of decline, and AI could become an infrastructure layer for their long-term preservation.
AI Sovereignty and Global Ambitions
The project signals that Russian developers are building and adapting neural network tools to serve the country’s own linguistic ecosystem.

If successful, Russia could become one of the few countries to deploy a domestically developed neural network supporting some twenty of its national languages. Countries with multilingual populations, including India and nations across Africa and Latin America, may find such a solution relevant to their own digital inclusion strategies.