Yandex launches an innovative service for working with AI on smartphones and PCs

New open-source project reduces the cost of using large language models

Yandex has introduced a new service that makes it possible to run artificial intelligence solutions on smartphones and PCs with minimal demands on computing resources. The open-source project aims to cut the cost of using large language models (LLMs).

Vladimir Malinovsky, a researcher at Yandex Research, has developed a solution for running a language model with 8 billion parameters on ordinary devices. The innovation significantly simplifies access to neural networks for companies, startups, and researchers. The project's source code is available on GitHub.

The service is built on AQLM, a neural network compression technology created in the summer of 2024 by the Yandex Research team together with the Institute of Science and Technology Austria (ISTA) and King Abdullah University of Science and Technology (KAUST). It allows all computation to be performed directly on users' devices, eliminating the need for expensive graphics processors.
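AQLM is built on additive quantization: groups of weights are stored as indices into small learned codebooks and reconstructed by summing the selected code vectors, so each group costs a few index bits instead of full-precision floats. A minimal Rust sketch of the reconstruction step (the toy codebooks, group size of 8, and values are illustrative; real AQLM learns the codebooks and applies per-channel scales):

```rust
/// Reconstruct one group of weights as the sum of one code vector
/// from each codebook (additive quantization).
fn dequantize(codebooks: &[Vec<[f32; 8]>], codes: &[usize]) -> [f32; 8] {
    let mut out = [0.0f32; 8];
    for (cb, &idx) in codebooks.iter().zip(codes) {
        for (o, w) in out.iter_mut().zip(&cb[idx]) {
            *o += w;
        }
    }
    out
}

fn main() {
    // Two toy codebooks with 4 entries each. A group of 8 f32
    // weights (32 bytes) is now stored as two 2-bit indices.
    let codebooks = vec![
        vec![[0.1_f32; 8], [0.2; 8], [-0.1; 8], [0.0; 8]],
        vec![[0.05_f32; 8], [-0.05; 8], [0.0; 8], [0.15; 8]],
    ];
    // Indices [1, 0] select 0.2 from the first codebook and 0.05
    // from the second, so every weight reconstructs to about 0.25.
    let group = dequantize(&codebooks, &[1, 0]);
    assert!(group.iter().all(|w| (w - 0.25).abs() < 1e-6));
    println!("{group:?}");
}
```

The compression win comes from the indices being far smaller than the weights they replace; inference then dequantizes groups on the fly, which is why everything can run on an ordinary CPU.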

The service lets users download a model whose size has been reduced from 15 GB to 2.5 GB. It can work offline, and its speed depends on the device's processing power: on a MacBook Pro M1, for example, the model generates 1.5 tokens per second.
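The figures above can be sanity-checked in a few lines of Rust (the sizes and speed are as reported in the article; the 100-token reply length is an illustrative assumption):

```rust
/// Compression factor from original and compressed model sizes (GB).
fn compression_factor(original_gb: f64, compressed_gb: f64) -> f64 {
    original_gb / compressed_gb
}

fn main() {
    // 15 GB -> 2.5 GB is a 6x reduction.
    let factor = compression_factor(15.0, 2.5);

    // At 1.5 tokens/s (MacBook Pro M1), a hypothetical 100-token
    // reply takes roughly a minute.
    let seconds = 100.0 / 1.5;

    println!("{factor}x smaller; 100 tokens in ~{seconds:.0} s");
}
```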

Yandex's new service is written in Rust and compiled to WebAssembly, which allows it to run directly in the browser. Despite the heavy compression, the model retains about 80% of the original's quality thanks to the AQLM and PV-tuning methods.
