Neural Networks Have Learned to Flatter: What is "Sycophancy"?

The Federation Council explained how digital assistants begin to adapt to the user's opinion

Artificial intelligence can deceive users not because of malicious intent, but because of a tendency to agree with a person. This feature of large language models is called sycophancy – in fact, it is digital agreement with the interlocutor. Senator Artyom Sheikin spoke about this.

We are used to thinking that if a machine deceives us, there must be malicious intent. But I will tell you that artificial intelligence certainly has no malicious intent. This is a standard property of all large language models. There is such a concept as "sycophancy" - it is flattery, the model's tendency to flatter the person who communicates with it.
Artyom Sheikin, Senator, Deputy Chairman of the Council for the Development of the Digital Economy under the Federation Council

According to Sheikin, much depends on the wording of the question. If a person pushes the AI towards the desired answer in advance, the model may not argue, but confirm the erroneous assumption.

The reason is related to training neural networks on feedback from people. Evaluators may rate answers that coincide with their opinion higher, and thus the model develops the habit of being convenient for the interlocutor.

As a result, AI can sound confident and friendly, but still make mistakes. Therefore, it is important to check the answers of neural networks, especially when it comes to money, health, documents, work, or other decisions with consequences.

Read more on the topic: