19:29, 06 August 2025

AmbiK: How a Russian Dataset Is Shaping the Future of Smart Assistants

Developed by AIRI and MIPT with support from Sber’s Robotics Center, AmbiK is the world’s largest open dataset for testing AI systems’ ability to understand everyday human commands. Even the most advanced models fail about 80% of the time.

When AI Doesn’t Know What ‘Add a Pinch of Salt’ Means

Russian researchers have released AmbiK—the world’s most extensive open-source dataset specifically designed to evaluate how robotic systems interpret ambiguous and incomplete instructions. It includes 2,000 real-world tasks, each annotated by the type of ambiguity it represents—from user preferences to safety-related risks.

What makes AmbiK stand out is its focus on uncertainty. Unlike traditional datasets that contain clearly defined tasks, AmbiK simulates the fuzzy and often vague nature of real household instructions: “heat it up until it tastes good,” “make it stronger,” or “don’t oversalt.” Each scenario is labeled across three core categories: general knowledge (What does ‘bring to a boil’ actually mean?), user preference (Sweet or not?), and safety (Is it okay to leave the stove unattended?).

The results are striking. Even the most sophisticated language models only succeeded in about 20% of the test cases. This isn’t just a flaw—it’s a wake-up call: current AI systems are still unprepared for the uncertainty and nuance that come with natural human interactions.
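To make the three-way labeling and the headline figure concrete, here is a minimal Python sketch of how per-category and overall success rates could be tallied. The `Task` structure, the field names, and the sample outcomes are illustrative assumptions, not AmbiK's actual schema or results; "success" stands for whatever pass criterion the benchmark applies.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class AmbiguityType(Enum):
    # The three categories named in the article; identifiers are hypothetical.
    GENERAL_KNOWLEDGE = "general_knowledge"
    USER_PREFERENCE = "user_preference"
    SAFETY = "safety"


@dataclass(frozen=True)
class Task:
    # Hypothetical record layout, not the dataset's real format.
    instruction: str
    ambiguity: AmbiguityType


def success_rate_by_type(results):
    """Map each ambiguity type to the share of its tasks that passed.

    `results` is an iterable of (Task, bool) pairs.
    """
    passed = defaultdict(int)
    total = defaultdict(int)
    for task, ok in results:
        total[task.ambiguity] += 1
        passed[task.ambiguity] += int(ok)
    return {kind: passed[kind] / total[kind] for kind in total}


def overall_success_rate(results):
    """Share of all tasks that passed; 0.0 for an empty input."""
    results = list(results)
    if not results:
        return 0.0
    return sum(int(ok) for _, ok in results) / len(results)


# Made-up outcomes for illustration only: one success out of five tasks.
sample_results = [
    (Task("Bring the water to a boil", AmbiguityType.GENERAL_KNOWLEDGE), True),
    (Task("Make it stronger", AmbiguityType.USER_PREFERENCE), False),
    (Task("Don't oversalt", AmbiguityType.USER_PREFERENCE), False),
    (Task("Heat it up until it tastes good", AmbiguityType.USER_PREFERENCE), False),
    (Task("Leave the stove on while I step out", AmbiguityType.SAFETY), False),
]

print(overall_success_rate(sample_results))
print(success_rate_by_type(sample_results))
```

Breaking the score down by category shows whether a model struggles mainly with preferences, with common knowledge, or with safety, which a single overall percentage hides.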


The World’s Largest Dataset of Its Kind

Most similar datasets include only 500 to 600 examples. With 2,000 tasks, AmbiK is roughly three to four times larger, making it the most comprehensive resource of its kind. Its open-source status is a breakthrough for the global research community: anyone can use it for training and benchmarking AI models.

This is especially significant for Russia. Until now, there were virtually no large-scale text-based datasets tailored to everyday interaction. AmbiK changes that, becoming the country’s first major entry into the field of linguistic understanding for robots.

This dataset is not just about interpreting vague instructions; it’s also invaluable for behavior planning systems. Perfect mechanics are impressive, but without autonomy and learning capabilities, they lose their real-world value.

Alexey Kovalev

Head of the 'Embodied Agents' group at AIRI’s Cognitive Systems Lab

From Kitchen Tasks to Digital Sovereignty

For everyday users in Russia, AmbiK is a leap toward more intuitive and safer AI assistants. Imagine a smart fridge or cooking robot that truly understands what you mean when you say, “Make it like last time.”

For Russia as a nation, this project is a step toward technological sovereignty—building localized datasets in the critical human-robot interaction domain, reducing dependency on Western platforms, and strengthening Russia’s position in the global race for AI leadership.

On a global scale, AmbiK could set a new benchmark. Because it is open, it can serve as a common platform for evaluating language models and planning systems. Think of it as a gym for AI: a place not just to learn syntax, but to develop common sense.

From Scientific Tool to Global Export

AmbiK is more than just a dataset—it’s a foundation for the future. It opens the door for exporting Russian AI technologies. This benchmark set could become an international standard for testing how robots interpret human commands.

It also helps adapt AI to real-world environments—our kitchens, habits, and cultural nuances. No need to imitate Western templates anymore. Now, Russia has its own data, reflecting its own way of life.

For scientists, AmbiK is a unique tool. It allows researchers to study language models without relying on sensors or visual input. Its step-by-step task breakdown reveals how planning evolves within AI and where reasoning logic still falls short.

The Beginning of a Smarter Future

AmbiK isn’t just another dataset. It’s a strategic asset for developing intelligent home robots and digital assistants. Its release reinforces a core principle of AI: language understanding is about more than grammar—it’s about context, common sense, and safety.

With current models succeeding at just 20% of tasks, AmbiK marks a starting point, not a limitation. In the coming years, we can expect deep integration of this dataset into Russian AI education and research programs, as well as international use. The collection will expand to include new scenarios, rooms (bathroom, living room), and even social contexts.

AmbiK is more than a scientific milestone. It’s a global challenge: train your AI to understand not just words—but meaning.
