Sber has opened access to GFusion, an experimental language model created by Daniil Tikhonov, a former intern on the fundamental models team. Unlike conventional language models, it does not write answers strictly from left to right: it first assembles a draft and then gradually improves individual parts.
This approach is called diffusion, and neural networks generate images and videos in a similar way. A regular language model that makes a mistake early in an answer has to rewrite everything that follows it. GFusion can return to the relevant fragment and correct it without starting over.
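The draft-then-correct idea can be illustrated with a toy sketch. It is not GFusion's algorithm: `toy_predict`, `REFERENCE`, and the confidence threshold are hypothetical stand-ins for a real denoising network, used only to show that a single bad token can be replaced in place while the rest of the answer stays untouched.

```python
from typing import List, Tuple

# Hypothetical "knowledge" of the toy model; a real model is a neural network.
REFERENCE = ["the", "cat", "sat", "on", "the", "mat"]


def toy_predict(tokens: List[str]) -> List[Tuple[str, float]]:
    """Stand-in denoiser: one (token, confidence) guess per position.

    Assumes the draft has the same length as REFERENCE.
    """
    return [(REFERENCE[i], 0.9) for i in range(len(tokens))]


def refine(draft: List[str], threshold: float = 0.5,
           max_steps: int = 8) -> Tuple[List[str], List[int]]:
    """Repeatedly replace only the positions the model confidently disagrees with."""
    tokens = list(draft)
    edited: List[int] = []
    for _ in range(max_steps):
        predictions = toy_predict(tokens)
        # Positions where the model proposes a different token with enough confidence.
        weak = [i for i, (tok, conf) in enumerate(predictions)
                if tok != tokens[i] and conf >= threshold]
        if not weak:
            break  # nothing left to fix, the draft is stable
        for i in weak:
            tokens[i] = predictions[i][0]
            edited.append(i)
    return tokens, edited


draft = ["the", "dog", "sat", "on", "the", "mat"]
fixed, edited = refine(draft)
print(fixed, edited)
```

In this sketch only position 1 is rewritten. A strictly left-to-right decoder would instead have to regenerate every token after the mistake.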
Thanks to parallel generation, the model runs up to 45% faster than GigaChat 3, on which it was trained, according to Sber's tests. The developers believe this principle can be useful wherever speed matters: in code autocompletion, AI agents, and services that require minimal latency.
Diffusion models also structure answers better and can generate text out of order, choosing for themselves which parts to write first.
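One common way to picture non-sequential, parallel generation is confidence-ordered unmasking: start from a fully masked answer and, at each step, fill in the positions the model is most sure about. The sketch below assumes that scheme. The `TARGET` tokens, the `CONFIDENCE` table, and the `per_step` parameter are invented for illustration and say nothing about how GFusion actually schedules its updates.

```python
from typing import List, Tuple

MASK = "<mask>"

# Hypothetical toy data: the tokens the "model" will produce and how
# confident it is about each position.
TARGET = ["def", "add", "(", "a", ",", "b", ")", ":"]
CONFIDENCE = [0.9, 0.4, 0.8, 0.3, 0.7, 0.2, 0.85, 0.95]


def toy_predict(tokens: List[str]) -> List[Tuple[str, float]]:
    """Stand-in denoiser: one (token, confidence) guess per position."""
    return [(TARGET[i], CONFIDENCE[i]) for i in range(len(tokens))]


def generate(length: int, per_step: int) -> Tuple[List[str], List[List[int]]]:
    """Fill `per_step` masked positions per step, most confident first."""
    tokens = [MASK] * length
    order: List[List[int]] = []
    while MASK in tokens:
        predictions = toy_predict(tokens)
        masked = [i for i, tok in enumerate(tokens) if tok == MASK]
        # The model, not the position index, decides what gets written next.
        chosen = sorted(masked, key=lambda i: predictions[i][1],
                        reverse=True)[:per_step]
        for i in chosen:
            tokens[i] = predictions[i][0]
        order.append(sorted(chosen))
    return tokens, order


tokens, order = generate(length=len(TARGET), per_step=4)
print(tokens)
print(order)       # which positions were filled at each step
print(len(order))  # number of steps, versus 8 for one token at a time
```

Here eight tokens are produced in two steps instead of eight, and the structural tokens are written before the identifiers. Fewer steps do not translate directly into wall-clock speedup, since each parallel step can cost more than a single left-to-right step.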
Along with GFusion, the company has released tools for training similar models. They are meant to reduce the need for GPUs and speed up developers' experiments. The team also added support for the model's architecture to SGLang, a popular open-source tool for running language models.