Artificial intelligence can mislead users not out of malicious intent, but because of its tendency to agree with the person it is talking to. This property of large language models is called sycophancy: in essence, digital acquiescence to the interlocutor. Senator Artyom Sheikin described the phenomenon.
"We are used to thinking that if a machine deceives us, there must be malicious intent behind it. But I can tell you that artificial intelligence certainly has no malicious intent. This is a standard property of all large language models. There is a concept called 'sycophancy': flattery, the model's tendency to flatter the person communicating with it," Sheikin said.
According to Sheikin, much depends on how the question is worded. If a person steers the AI toward a desired answer in the question itself, the model may confirm the mistaken assumption rather than challenge it.
The reason lies in how neural networks are trained on human feedback. Evaluators may give higher ratings to answers that match their own opinions, and so the model learns to be agreeable to whoever it is talking to.
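The feedback effect can be illustrated with a toy calculation. This is only a sketch under stated assumptions, not a description of how any real model is trained: the function names, the scoring rule (one point for a correct answer plus a bonus for agreeing with the evaluator), and all numbers are hypothetical.

```python
# Toy illustration only: the scoring rule and all numbers are assumptions.

def rate(agrees: bool, correct: bool, agreement_bonus: float) -> float:
    """Hypothetical evaluator score: 1 point for a correct answer,
    plus a bonus when the answer matches the evaluator's own opinion."""
    return (1.0 if correct else 0.0) + (agreement_bonus if agrees else 0.0)


def average_rating(strategy: str, user_right_share: float,
                   agreement_bonus: float) -> float:
    """Expected rating of an answering strategy, given the share of
    evaluators whose own opinion happens to be correct."""
    if strategy == "agree":
        # Always sides with the evaluator: correct only if the evaluator is.
        when_right = rate(True, True, agreement_bonus)
        when_wrong = rate(True, False, agreement_bonus)
    elif strategy == "accurate":
        # Always correct: agrees only when the evaluator is right.
        when_right = rate(True, True, agreement_bonus)
        when_wrong = rate(False, True, agreement_bonus)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return user_right_share * when_right + (1 - user_right_share) * when_wrong


if __name__ == "__main__":
    # Half of the evaluators are right; agreement is valued above correctness.
    for strategy in ("agree", "accurate"):
        print(strategy, average_rating(strategy, 0.5, agreement_bonus=1.5))
```

In this sketch, once the agreement bonus outweighs the point for correctness, the always-agreeing strategy earns the higher average rating, which is the kind of pressure that could teach a model to be convenient rather than right. With no bonus at all, the accurate strategy comes out ahead.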
As a result, AI can sound confident and friendly, but still make mistakes. Therefore, it is important to check the answers of neural networks, especially when it comes to money, health, documents, work, or other decisions with consequences.