AI Learns to Analyze Voice, Facial Expressions, and Speech Simultaneously, Developed by HSE University and Sberbank Specialists

Unique algorithm proves 10% more accurate than existing counterparts

A new artificial intelligence system developed in Russia recognizes human emotions more accurately than existing counterparts. Its distinguishing feature is the simultaneous analysis of three sources of information: facial expressions, voice, and speech. This comprehensive assessment makes the system 10% more accurate than the best existing algorithms, which rely on a single source of data.

Andrey Savchenko, Scientific Director of the Sberbank Center for Practical Artificial Intelligence, said that the new technology is already demonstrating impressive results in tests. In the future, it could be adapted for use in virtual assistants, security systems, and telemedicine. A key advantage of the system is its flexibility: it works even when data is scarce, for example, when the user's face is not visible or the voice is hard to hear.
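The robustness to missing data described above is typical of so-called late-fusion approaches, where each modality produces its own emotion probabilities and the system combines whatever is available. The sketch below illustrates this general idea only; the function names, emotion labels, and simple averaging are assumptions for illustration, not the authors' actual model.

```python
import numpy as np

# Hypothetical emotion labels for illustration.
EMOTIONS = ["neutral", "happy", "sad", "angry"]

def fuse_predictions(face=None, voice=None, text=None):
    """Average per-modality probability vectors, skipping any missing modality.

    Each argument is a probability vector over EMOTIONS (or None if that
    channel is unavailable, e.g. the face is not visible).
    """
    available = [np.asarray(p, dtype=float) for p in (face, voice, text) if p is not None]
    if not available:
        raise ValueError("at least one modality is required")
    fused = np.mean(available, axis=0)
    return EMOTIONS[int(np.argmax(fused))], fused

# Example: the face is hidden, so only voice and text are fused.
voice_p = np.array([0.1, 0.2, 0.1, 0.6])
text_p = np.array([0.2, 0.1, 0.1, 0.6])
label, probs = fuse_predictions(voice=voice_p, text=text_p)
print(label)  # angry
```

With simple averaging, dropping a channel degrades the estimate gracefully instead of breaking it, which is one common way such systems handle data scarcity.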

The development was carried out by Andrey Savchenko and his colleague Alexey Andreev from HSE University (Nizhny Novgorod). The system's architecture takes into account changes in emotional state over time, which improves its effectiveness. Unlike most emotion recognition technologies, it processes several channels of information at once: facial expressions, voice characteristics, and the structure of speech.
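Tracking emotional state over time, as mentioned above, can be illustrated with a minimal temporal-smoothing sketch: frame-by-frame predictions are blended so that a single noisy frame shifts the estimate gradually rather than abruptly. This is an illustrative assumption, not the authors' architecture, which may use recurrent or attention-based models instead.

```python
import numpy as np

def smooth_over_time(frame_probs, alpha=0.3):
    """Exponentially smooth per-frame emotion probabilities.

    frame_probs: list of probability vectors, one per video/audio frame.
    alpha: weight of the newest frame (higher = reacts faster).
    Returns the smoothed probability vector for each frame.
    """
    state = np.asarray(frame_probs[0], dtype=float)
    smoothed = [state.copy()]
    for p in frame_probs[1:]:
        # Blend the new observation with the accumulated state.
        state = alpha * np.asarray(p, dtype=float) + (1 - alpha) * state
        smoothed.append(state.copy())
    return smoothed

# Example: a sudden flip in frame 3 is absorbed gradually.
frames = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
result = smooth_over_time(frames)
```

Smoothing of this kind is one simple way a system can reflect how an emotional state evolves rather than reacting to each frame in isolation.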

According to the scientists, their development can be useful not only in marketing, but also in the field of security, where AI can help identify aggression or panic.
