The Institute of Oriental Studies of the Russian Academy of Sciences, in collaboration with Yandex, has created an AI assistant capable of processing thousands of primary sources in Oriental languages every day. The assistant analyzes scholarly texts, extracts key facts, and generates concise analytical summaries, including summaries of Chinese media materials.
The system already works with more than 1.5 million documents and covers four variants of the Chinese language — mainland, Taiwanese, Hong Kong, and Singaporean.
According to the Director of the Institute, Alikber Alikberov, the project is part of a large-scale digital transformation of the institution. "The partnership with Yandex allows us to significantly expand the volume and depth of material analysis, while maintaining the fundamental nature of the academic approach," he noted.
Senior Researcher Alexander Kostyrkin said that the AI assistant cuts processing time from several hours to 10–15 minutes per research task, which makes it possible to analyze hundreds of sources a day.
The AI assistant is built on the Yandex AI Studio platform using generative models and Yandex Cloud technologies. The system uses a retrieval-augmented generation (RAG) architecture: the language model generates answers based on data retrieved from the system's own database and from additional sources. Vectorization and semantic search make it possible to find information by meaning rather than by exact wording, and the FRED-T5-Summarizer model condenses texts into short summaries, producing clear Russian-language answers.
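The retrieval step described above can be illustrated with a minimal sketch. It is not the Institute's or Yandex's implementation: the mini-corpus, the function names `embed`, `retrieve`, and `build_prompt`, and the prompt wording are all hypothetical, and a simple bag-of-words vector stands in for the neural embeddings a real semantic search would use. The sketch only shows the general shape of a RAG pipeline: vectorize the query, rank documents by similarity, and pass the best matches to a language model as context.

```python
import math
import re
from collections import Counter

# Hypothetical mini-corpus for illustration only.
DOCUMENTS = [
    "Trade between China and Russia grew in the first quarter.",
    "A new museum of Tibetan manuscripts opened in Lhasa.",
    "Singapore announced new rules for port logistics.",
]


def embed(text):
    """Toy stand-in for a neural embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, documents, top_k=2):
    """Return the top_k documents most similar to the query."""
    query_vector = embed(query)
    ranked = sorted(
        documents,
        key=lambda doc: cosine(query_vector, embed(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(query, passages):
    """Assemble the text that would be sent to a generative model (assumed format)."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only these sources:\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    question = "How is trade between China and Russia developing?"
    print(build_prompt(question, retrieve(question, DOCUMENTS, top_k=1)))
```

In a production system the count vectors would be replaced by dense embeddings stored in a vector index, and the assembled prompt would be passed to the generative model and then to a summarizer.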
Special attention is paid to monitoring Chinese media. Previously, the system translated texts via English, but the arrival of large Qwen models has made it possible to work with Chinese sources directly. The 235-billion-parameter Qwen3 model is currently being connected, which allows researchers to obtain more accurate data without relying on English as an intermediary.