Accelerated Fine-Tuning Method for Neural Networks Presented by Researchers from HSE and AIRI

Researchers from HSE University (Higher School of Economics) and the AIRI Artificial Intelligence Research Institute have developed a method for fine-tuning neural networks that speeds up the adaptation of models to new tasks. The method, called GSOFT, is built on structured matrices that group a transformation into blocks and shuffle the results between them, which reduces computational costs without sacrificing quality.

Comparison of generation results by different methods after 3000 training steps / © Gorbunov, M., Yudin, N., Soboleva, V., Alanov, A., Naumov, A., Rakhuba, M.

Image source: arXiv

Established approaches to fine-tuning neural networks, such as LoRA or BOFT, can require significant resources, especially when working with large models. The Russian scientists have proposed an alternative: Group-and-Shuffle (GS) matrices, which split a transformation into groups, process each group separately, and then shuffle and recombine the results.

We figured out how to form orthogonal matrices using only two matrices of a special type, instead of five or six as in previous approaches. This saves resources and training time.

Nikolay Yudin, Research Intern at the Research and Training Laboratory of Matrix and Tensor Methods in Machine Learning, HSE University
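The idea in the quote can be illustrated with a minimal sketch. It assumes that the "two matrices of a special type" are block-diagonal matrices with orthogonal blocks, joined by a fixed shuffle (a permutation); the article does not spell out these details, so the function names, the 2x2 rotation blocks, and the particular interleaving permutation below are illustrative choices, not the authors' implementation.

```python
import math
import random

def rotation(theta):
    """2x2 rotation matrix: the simplest orthogonal block (illustrative choice)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def block_diag(blocks):
    """Place square blocks along the diagonal of a zero matrix (the 'group' step)."""
    n = sum(len(b) for b in blocks)
    out = [[0.0] * n for _ in range(n)]
    offset = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, v in enumerate(row):
                out[offset + i][offset + j] = v
        offset += len(b)
    return out

def permutation_matrix(perm):
    """Matrix P such that row i of (P @ A) is row perm[i] of A (the 'shuffle' step)."""
    n = len(perm)
    return [[1.0 if perm[i] == j else 0.0 for j in range(n)] for i in range(n)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def gs_orthogonal(n_blocks, rng):
    """Hypothetical GS-style orthogonal matrix: Q = L @ P @ R."""
    n = 2 * n_blocks
    left = block_diag([rotation(rng.uniform(0, 2 * math.pi)) for _ in range(n_blocks)])
    right = block_diag([rotation(rng.uniform(0, 2 * math.pi)) for _ in range(n_blocks)])
    # Interleave the two halves so each left block mixes outputs of different right blocks.
    perm = [(i % 2) * n_blocks + i // 2 for i in range(n)]
    return matmul(left, matmul(permutation_matrix(perm), right))

q = gs_orthogonal(4, random.Random(0))
qqt = matmul(q, transpose(q))
# Largest deviation of Q @ Q^T from the identity; should be at floating-point noise level.
err = max(abs(qqt[i][j] - (1.0 if i == j else 0.0)) for i in range(8) for j in range(8))
```

Because each factor is orthogonal, the product is orthogonal as well, and the shuffle lets information cross block boundaries even though each factor only stores small blocks.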

The GSOFT method was tested on a range of tasks, including fine-tuning the RoBERTa language model and image generation. Compared with alternative methods, it showed higher accuracy at lower memory and time costs. An additional variant, Double GSOFT, adjusts the weights from both sides, which makes the adaptation more flexible.
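As a rough illustration of what adjusting "from two sides" can mean, the sketch below assumes that a frozen weight matrix W is multiplied by one orthogonal matrix on the left and another on the right, W' = Q_U W Q_V. This reading, the helper names, and the use of plain 2x2 rotation blocks are assumptions made for the example and are not taken from the article.

```python
import math
import random

def block_rotation(angles):
    """Block-diagonal orthogonal matrix built from 2x2 rotations (illustrative)."""
    n = 2 * len(angles)
    q = [[0.0] * n for _ in range(n)]
    for k, t in enumerate(angles):
        c, s = math.cos(t), math.sin(t)
        i = 2 * k
        q[i][i], q[i][i + 1] = c, -s
        q[i + 1][i], q[i + 1][i + 1] = s, c
    return q

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def frobenius(a):
    """Frobenius norm: square root of the sum of squared entries."""
    return math.sqrt(sum(v * v for row in a for v in row))

def two_sided(w, q_left, q_right):
    """Assumed two-sided update: W' = Q_left @ W @ Q_right."""
    return matmul(matmul(q_left, w), q_right)

rng = random.Random(0)
w = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(4)]  # frozen 4x6 weight

q_u = block_rotation([0.3, -0.7])       # 4x4, acts on the output side
q_v = block_rotation([0.1, 0.5, -0.2])  # 6x6, acts on the input side
w_new = two_sided(w, q_u, q_v)

# Orthogonal factors rotate the weights without rescaling them.
norm_gap = abs(frobenius(w_new) - frobenius(w))

# With all angles at zero both factors are identities, so training can start from W itself.
w_start = two_sided(w, block_rotation([0.0, 0.0]), block_rotation([0.0, 0.0, 0.0]))
```

In this sketch the two-sided form has trainable angles on both the input and the output side of the layer, while the norm of the pretrained weights stays unchanged.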

We tested the method in various scenarios, from language and generative models to robust convolutional networks. In each of them, it worked reliably and at a lower resource cost. This confirms that the method can be used for different purposes.

Aybek Alanov, Senior Researcher at the Center for Deep Learning and Bayesian Methods of the Faculty of Computer Science at HSE University, Head of the "Controlled Generative AI" group at the FusionBrain Laboratory of the AIRI Institute

The researchers also tested their method on convolutional neural networks, which are commonly used for image and video analysis, for example in face recognition systems. They adapted GS matrices so that they can be used even where the model must be robust to noise and distortion.

The versatility of the approach allows it to be applied in various fields, from improving language models to building robust image recognition systems. This opens up new opportunities for developers who need to quickly adapt AI solutions to changing tasks.

