Sber researchers have developed GigaEmbeddings, an embedding model for working with Russian-language texts. It is based on GigaChat-3B and is trained in three stages: pre-training, fine-tuning, and multi-task learning. The architecture was also optimized, which cut the number of neural network parameters by 25% without reducing quality.
Until now, businesses have lacked effective tools for analyzing Russian-language text. Existing solutions either required substantial computing resources or performed poorly at search and classification. GigaEmbeddings solves both problems. The model is suited to smart search in e-commerce, building chatbots with advanced functions, analyzing customer requests, and generating recommendations.
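To make the "smart search" use case concrete, here is a minimal sketch of how embedding-based search generally works: texts are turned into vectors, and catalog entries are ranked by cosine similarity to the query. The function `toy_embed` is a hypothetical stand-in built from hashed character trigrams; it is not the GigaEmbeddings API. In a real system it would presumably be replaced by a call to the actual model. The catalog entries and the vector size are likewise illustrative assumptions.

```python
import math
import zlib

DIM = 256  # size of the toy vectors (assumption for this sketch)


def toy_embed(text: str) -> list[float]:
    """Hypothetical stand-in for an embedding model.

    Counts hashed character trigrams and L2-normalizes the result.
    A real embedding model would capture meaning, not just spelling.
    """
    vec = [0.0] * DIM
    padded = f"  {text.lower()}  "
    for i in range(len(padded) - 2):
        trigram = padded[i:i + 3]
        # crc32 is deterministic across runs, unlike Python's built-in hash()
        vec[zlib.crc32(trigram.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cosine(a: list[float], b: list[float]) -> float:
    """Dot product of two unit-length vectors, which equals their cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


def search(query: str, documents: list[str], top_k: int = 3) -> list[tuple[float, str]]:
    """Return the top_k documents ranked by similarity to the query."""
    q = toy_embed(query)
    scored = [(cosine(q, toy_embed(doc)), doc) for doc in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]


# Illustrative e-commerce catalog (made-up entries)
catalog = [
    "беспроводные наушники с шумоподавлением",
    "зимняя куртка с капюшоном",
    "чехол для беспроводных наушников",
]

results = search("наушники беспроводные", catalog, top_k=2)
for score, doc in results:
    print(f"{score:.2f}  {doc}")
```

The ranking logic stays the same whichever model produces the vectors, so swapping the toy function for a stronger embedding model changes the quality of the matches rather than the structure of the search.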
"Today, we are closing a critically important market need for high-quality NLP solutions for the Russian language. Our comprehensive platform allows businesses to radically optimize all text-based processes, from basic search and recommendation algorithms to advanced RAG systems in chatbots. [...] Companies finally get a single solution: they no longer need to assemble functionality piece by piece from foreign products."
The model is available on GitVerse and Hugging Face. Its developers expect it to become the standard for the financial sector, retail, and public services.