HSE and AIRI Researchers Present a Method for Faster Fine-Tuning of Neural Networks

New "Group and Shuffle" Approach Reduces Time and Costs for Adapting AI Models

Researchers from HSE University (the Higher School of Economics) and the AIRI Artificial Intelligence Research Institute have developed a method for fine-tuning neural networks that speeds up the adaptation of models to new tasks. The method, called GSOFT, is based on grouping and shuffling, which, according to the authors, reduces computational costs without sacrificing quality.

Comparison of generation results by different methods after 3000 training steps / © Gorbunov, M., Yudin, N., Soboleva, V., Alanov, A., Naumov, A., Rakhuba, M.

Established approaches to fine-tuning neural networks, such as LoRA or BOFT, require significant resources, especially with large models. The Russian scientists propose an alternative: Group-and-Shuffle (GS) matrices, which split the values being transformed into groups, process each group separately, and then shuffle and recombine the results.

We figured out how to form orthogonal matrices using only two matrices of a special type, instead of five or six as in previous approaches. This saves resources and training time.
Nikolay Yudin, Research Intern at the Research and Training Laboratory of Matrix and Tensor Methods in Machine Learning at HSE University
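
The construction described above can be illustrated with a small sketch. It assumes a product of the form L · P · R, where L and R are block-diagonal matrices with orthogonal blocks (the "group" step) and P is a permutation (the "shuffle" step). This article does not give the exact parametrization used in GSOFT, so the form of the product, the use of 2x2 rotations as blocks, and all function names here are illustrative assumptions.

```python
import math

def rotation(theta):
    """2x2 rotation matrix: the simplest orthogonal block (illustrative choice)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def block_diag(blocks):
    """Place square blocks along the diagonal of a zero matrix (the 'group' step)."""
    n = sum(len(b) for b in blocks)
    out = [[0.0] * n for _ in range(n)]
    offset = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, v in enumerate(row):
                out[offset + i][offset + j] = v
        offset += len(b)
    return out

def permutation_matrix(perm):
    """Matrix P with (P x)[i] = x[perm[i]] (the 'shuffle' step)."""
    n = len(perm)
    return [[1.0 if perm[i] == j else 0.0 for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def gs_matrix(left_angles, right_angles, perm):
    """Hypothetical GS-style product: block-diagonal, permutation, block-diagonal."""
    left = block_diag([rotation(t) for t in left_angles])
    right = block_diag([rotation(t) for t in right_angles])
    return matmul(left, matmul(permutation_matrix(perm), right))

def is_orthogonal(q, tol=1e-9):
    """Check that Q^T Q equals the identity up to a tolerance."""
    n = len(q)
    qtq = matmul(transpose(q), q)
    return all(abs(qtq[i][j] - (1.0 if i == j else 0.0)) < tol
               for i in range(n) for j in range(n))

# 4x4 example: two 2x2 groups per factor, with a shuffle that interleaves the groups.
Q = gs_matrix([0.3, -1.1], [0.7, 2.0], perm=[0, 2, 1, 3])
```

In this example each block-diagonal factor has only 8 nonzero entries, yet the 4x4 product `Q` is dense and `is_orthogonal(Q)` holds, because a product of orthogonal matrices is itself orthogonal.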

The GSOFT method was tested on various tasks, including fine-tuning the RoBERTa language model and image generation. Compared with alternative methods, it showed higher accuracy at lower memory and time costs. An additional variant, Double GSOFT, adjusts the model's parameters from two sides, which makes the adaptation more flexible.
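
The two-sided idea can be sketched as follows, under the assumption that "from two sides" means multiplying a frozen pretrained weight matrix by one trainable orthogonal matrix on the left and another on the right. The article does not spell this out, so the update rule, the 2x2 rotations standing in for GS matrices, and the function names are illustrative assumptions.

```python
import math

def rotation(theta):
    """2x2 rotation matrix, used here as a stand-in for a trainable orthogonal factor."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def frobenius_norm(m):
    return math.sqrt(sum(v * v for row in m for v in row))

def two_sided_update(w, theta_left, theta_right):
    """Return Q_left @ w @ Q_right for a frozen 2x2 weight matrix w (assumed form)."""
    return matmul(rotation(theta_left), matmul(w, rotation(theta_right)))

# Frozen "pretrained" weights (made-up numbers for illustration).
W = [[2.0, -1.0], [0.5, 3.0]]

# With both angles at zero the factors are identity matrices, so W is unchanged.
W_init = two_sided_update(W, 0.0, 0.0)

# Nonzero angles change the weights but keep their Frobenius norm.
W_tuned = two_sided_update(W, 0.4, -0.9)
```

Starting from identity factors means fine-tuning begins exactly at the pretrained model, and because both factors are orthogonal, the adapted weights keep the same norm and singular values as the original ones.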

We tested the method in various scenarios — from language and generative models to robust convolutional networks. In each of them, it worked reliably and with lower resource costs. This confirms that we can use the method for different purposes.
Aybek Alanov, Senior Researcher at the Center for Deep Learning and Bayesian Methods of the Institute of Artificial Intelligence and Digital Sciences of the Faculty of Computer Science at HSE University, Head of the "Controlled Generative AI" group at the FusionBrain Laboratory of the AIRI Institute

The researchers also tested their method on convolutional neural networks, which are commonly used for image and video analysis, for example, in face recognition systems. They developed GS matrices that can be used even in situations where the model needs to be resistant to noise and distortion.

The versatility of the approach allows it to be applied in various fields — from improving language models to creating robust image recognition systems. This opens up new opportunities for developers who need to quickly adapt AI solutions to changing tasks.
