Yandex launches an innovative service for working with AI on smartphones and PCs

New open-source project reduces the cost of using large language models

Yandex has introduced a new service that makes it possible to run artificial intelligence solutions on smartphones and PCs with minimal demands on computing resources. The open-source project aims to cut the cost of using large language models (LLMs).

Vladimir Malinovsky, a researcher at Yandex Research, has developed a solution for running a language model with 8 billion parameters on ordinary devices. The innovation significantly simplifies access to neural networks for companies, startups, and researchers. The project's source code is available on GitHub.

The service is built on AQLM, a neural network compression technology created in the summer of 2024 by the Yandex Research team together with the Institute of Science and Technology Austria (ISTA) and King Abdullah University of Science and Technology (KAUST). It allows all computation to be performed directly on users' devices, eliminating the need for expensive graphics processors.
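AQLM is built on additive quantization: groups of weights are stored as indices into small learned codebooks and reconstructed by summing the selected code vectors, so each group costs a few index bits instead of full-precision floats. A minimal Rust sketch of the reconstruction step (the toy codebooks, group size of 8, and values are illustrative; real AQLM learns the codebooks and applies per-channel scales):

```rust
/// Reconstruct one group of weights as the sum of one code vector
/// from each codebook (additive quantization).
fn dequantize(codebooks: &[Vec<[f32; 8]>], codes: &[usize]) -> [f32; 8] {
    let mut out = [0.0f32; 8];
    for (cb, &idx) in codebooks.iter().zip(codes) {
        for (o, w) in out.iter_mut().zip(&cb[idx]) {
            *o += w;
        }
    }
    out
}

fn main() {
    // Two toy codebooks with 4 entries each. A group of 8 f32
    // weights (32 bytes) is now stored as two 2-bit indices.
    let codebooks = vec![
        vec![[0.1_f32; 8], [0.2; 8], [-0.1; 8], [0.0; 8]],
        vec![[0.05_f32; 8], [-0.05; 8], [0.0; 8], [0.15; 8]],
    ];
    // Indices [1, 0] select 0.2 from the first codebook and 0.05
    // from the second, so every weight reconstructs to about 0.25.
    let group = dequantize(&codebooks, &[1, 0]);
    assert!(group.iter().all(|w| (w - 0.25).abs() < 1e-6));
    println!("{group:?}");
}
```

The compression win comes from the indices being far smaller than the weights they replace; inference then dequantizes groups on the fly, which is why everything can run on an ordinary CPU.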

The service lets users download a model whose size has been reduced from 15 GB to 2.5 GB. It can work offline, and its speed depends on the device's processing power: on a MacBook Pro M1, for example, the model generates 1.5 tokens per second.
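The figures above can be sanity-checked in a few lines of Rust (the sizes and speed are as reported in the article; the 100-token reply length is an illustrative assumption):

```rust
/// Compression factor from original and compressed model sizes (GB).
fn compression_factor(original_gb: f64, compressed_gb: f64) -> f64 {
    original_gb / compressed_gb
}

fn main() {
    // 15 GB -> 2.5 GB is a 6x reduction.
    let factor = compression_factor(15.0, 2.5);

    // At 1.5 tokens/s (MacBook Pro M1), a hypothetical 100-token
    // reply takes roughly a minute.
    let seconds = 100.0 / 1.5;

    println!("{factor}x smaller; 100 tokens in ~{seconds:.0} s");
}
```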

Yandex's new service is written in Rust and compiled to WebAssembly, which allows it to run directly in the browser. Despite the heavy compression, the model retains about 80% of the original's quality thanks to the AQLM and PV-tuning methods.
