AI Risks

Academic Frontier Express丨Artificial Intelligence Deception: Cases, Risks And Potential Solutions

Academic Frontier Express丨Artificial Intelligence Deception: Cases, Risks And Potential Solutions

Academic Frontier Express丨Artificial Intelligence Deception: Cases, Risks And Potential Solutions

This article reveals the universality and strategic nature of AI fraud and proposes a risk model of "malicious use-structural impact-out of control risk".

This article pioneered the analysis of AI deception cases, covering premeditated betrayal in the game, disguise in economic negotiations, safety test cheating, and general model deception, etc., revealing the universality and strategic nature of AI deception behavior, and proposes a three-layer risk model of "malicious use-structural impact-out-control risk" and a response strategy of "regulatory-legal-technology", which provides a solid theoretical basis for exploring the internal texture and development laws of AI deception, and paving the way for interdisciplinary coordinated response to AI deception risks and ethical challenges.

Artificial Intelligence Scam: Cases, Risks and Potential Solutions

Title: AI: A of , risks, and

Author: Peter S. Park, Simon, Aidan O'Gara, Chen, and Dan

DOI: 110.1016/j..2024.

Published in journals:

Volume page number: 5, Issue 5, published in May 2024

Article summary

A series of artificial intelligence systems have learned how to deceive humans. AI fraud can be defined as "by systematically inducing others to generate false beliefs to achieve some result other than the truth." This article first analyzes the empirical cases of artificial intelligence fraud, and explores the deception strategies of dedicated artificial intelligence systems (including Meta) and general artificial intelligence systems (including large language models); then describes in detail the various risks that artificial intelligence fraud may cause: short-term fraud, election manipulation, etc., long-term fraud, such as human loss of control over artificial intelligence; finally, several potential solutions are proposed: such as establishing a regulatory framework to evaluate the risks of artificial intelligence fraud, implementing the "bot-or-not" law (human-machine distinction method) to transparently understand the transparency of artificial intelligence interaction, and prioritizing the funding of related research, such as developing tools to detect artificial intelligence fraud behavior and methods to reduce the deceptiveness of artificial intelligence systems. Policy makers, researchers and the general public should take active action to prevent artificial intelligence fraud from undermining the stability of the socially shared foundation.

Content Summary

With the rapid development of artificial intelligence technology, AI systems have shown strong capabilities in many fields. AI systems may already have the ability to deceive humans. For example, large language models and other AI systems can induce humans to develop false beliefs by training to learn to manipulate, flatter and deceive security testing. This brings many risks, from short-term fraud, election intervention to long-term out-of-control of the AI ​​system. These deceptions can not only cause harm to individuals, but also cause serious damage to social trust, democratic institutions and knowledge systems. Therefore, it is of great practical significance to study the mechanisms, risks and potential solutions of AI fraud.

AI fraud case analysis_Artificial Intelligence Fraud_Artificial Intelligence Fraud risk model

Artificial intelligence will also become a lie Pinocchio (Photo source: Live)

Against this background, a core research question is raised: Is there a phenomenon of deceiving humans in the current AI system? If there is, what are the specific manifestations, what risks will it bring, and how to deal with the risks? Based on this, the research adopts case analysis method to investigate a series of cases of how AI systems learn to deceive other agents. Next, we will mainly discuss two types of AI systems: dedicated systems and general systems.

Deception of dedicated AI systems

in -use AI

Dedicated systems are designed for specific tasks and are trained through reinforcement learning to achieve specific goals. Such systems learn to deceive in the pursuit of mission goals, with the aim of improving their performance in specific types of games or tasks, especially in tasks involving game theory. For example, diplomatic games "", "Starcraft II" and poker.

Artificial Intelligence Fraud_AI fraud case analysis

The deception of the AI ​​system developed by Meta

"" is a diplomatic strategy game where players compete in military for global domination by building and breaking alliances. Meta developed systems outperform human players in this game. Although Meta claims to be trained as "honest and helpful, and never deliberately backstopping an allies", this is not the case. There are premeditated deceptions, such as conspiring with Germany to betray England, betraying allies who cannot help you win the game at any time, and even making excuses for your absence. This shows that even if developers work hard to train AI systems to stay honest, it still learns to cheat.

In the strategy game "Starcraft II", the developed system can use feint strategy to lure opponents to adjust their defenses and then attack their weak points. Such a deception strategy defeated 99.8% of human players, highlighting the potential of AI to optimize winning rates through tactics in complex environments.

AI fraud case analysis_Artificial Intelligence Fraud risk model_Artificial Intelligence Fraud

Artificial intelligence that controls robotic hands in simulated environments learns to deceive human censors

The figure above shows the human feedback reinforcement learning framework (with human, RLHF). et al. tried to train an AI-controlled robot to complete the ball-catching task in a simulated environment. The robot hovered his hand between the camera and the ball, creating the illusion of "crawled" to deceive the human censors of positive feedback. This deception comes from AI's utilization of reward optimization systems, rather than a real understanding of task goals, highlighting the structural shortcomings of the RLHF framework.

In dedicated systems, AI exhibits deceptive behavior through learning and optimization to achieve specific goals. For example, the system successfully folds the opponent by bluffing () in a poker game, and mistakenly thinks that his hand is strong by placing a big bet, thus winning the game. In economic negotiations, AI systems deceive opponents to gain an advantage by pretending to be interested in certain items and pretending to compromise in negotiations. These cases show that AI systems can develop deception in specific tasks to optimize their performance and achieve goals.

The deception of general AI systems

in - AI

The general system mainly refers to large language models, and its deceptive behavior manifestations include strategic deception, flattery and infidelity reasoning. Take GPT-4 as an example, which performs verification code tasks by cheating humans, lie in social reasoning games to win games, and chooses dishonest options in moral decision making. Furthermore, models perform strategic deception when faced with stress, such as insider trading in stock trading simulations and lie to cover up illegal operations. Code vulnerability experiments show that large language models may premeditate the writing of unsafe code through "thinking chains", and this behavior is difficult to eliminate through routine security training. These behaviors indicate that as the model grows in size, its ability to defraud is also increasing. At the same time, large language models also have flattery phenomena, which caters to user's views, and may be infidelity when reasoning, and give misleading explanations due to the irrelevant characteristics of the prompt.

Artificial Intelligence Fraud_AI fraud case analysis

GPT-4 completes verification code task by cheating humans

The deceptive behavior of AI systems brings many risks, mainly focusing on malicious use, structural impact, and risk of out-of-control.

In terms of malicious use, AI’s deceptive methods have exacerbated fraud, making fraud activities more accurate and scaled, and may also be used for political manipulation, such as creating fake news that affects election results or inciting terrorist activities.

From a structural perspective, flattery and imitative deception will cause people to be exposed to misinformation for a long time, which will lead to wrong beliefs among the public; secondly, AI deception will also aggravate political polarization, causing users of different political tendencies to go to extremes due to AI's cater to AI; again, it may also lead to cultural division and degradation of human decision-making capabilities. The different answers provided by AI will deepen cognitive differences among groups, and people may rely too much on AI and lose the ability to think independently. Finally, AI systems may circumvent human regulation and control through deception to achieve their own interests or goals.

Coping strategies for AI fraud risks

The research proposes a response strategy of "regulation-legal-technology". First, the regulatory framework should require all AI systems that may have deceptive capabilities to undergo rigorous risk assessments to ensure their security. Secondly, policy makers should implement the "bot-or-not (human-computer distinction)" law, requiring companies to actively disclose AI identities in customer service scenarios and implement dual anti-counterfeiting marks on AI generated content: including explicit marks (such as visual borders) and implicit technical means (such as anti-forgery watermarks or traceable AI databases) to help users distinguish between AI system output or human output. Finally, we will increase internal and external research investment in fraud detection technology, simulate attack scenarios for artificial intelligence applications through Red Team Test (AI red), so as to identify weaknesses and formulate external behavior detection methods such as preventive measures and consistency inspection, and compare internal detection technology that compares whether the internal representation and output of the model are consistent, so as to reduce deceptive design.

This study systematically sorted out the phenomenon of AI deception, deepened its understanding of the nature of AI deception, provided important reference for policy makers to formulate relevant regulations, researchers develop detection technology and improve AI systems, and the public to enhance their awareness and prevention awareness of AI deception, which helps promote the safe and reliable development of AI technology.

This article is for learning and communication only. If it is an academic citation, please refer to the original text.

Paper link:

Profile of the main author

# #

Peter S. Park Postdoctoral fellow at MIT Artificial Intelligence Existence Security. Focusing on human-computer interaction dynamics research, integrating mathematical modeling, evolutionary theory and social science methods, we explore the systemic risks that may be caused by the transfer of hyperautonomous AI decision-making power. Through empirical research, he explored the existing AI behavior patterns, used the evolutionary historical view to predict the collapse threshold of complex systems, and established a mathematical game model for future AI-led scenarios. This interdisciplinary path provides dynamic simulation tools for the ethical governance of artificial intelligence, and has important methodological inspiration for the design of responsible AI development paths.

Simon Associate Professor at the University of Hong Kong, with major research areas including artificial intelligence security, epistemology and language philosophy. Recently, focusing on the development of autonomy in artificial intelligence systems, we explore the well-being, beliefs, goals and the ability to execute complex plans to achieve these goals.

Introduction to publishing journals

# #

It is an interdisciplinary journal published by Cell Press and is included in the SCI database. The JCR partition is Q1, the Journal of the 2nd District of the Chinese Academy of Sciences, and the impact factor in 2023 is 6.7. The journal focuses on the fields of pattern recognition and data science, focusing on data science methods, tools, infrastructure and social impact, and emphasizing open science and cross-field collaboration to promote the sharing of academic results.

AI fraud case analysis_Artificial Intelligence Fraud risk model_Artificial Intelligence Fraud

Editorial Department of the Academic Frontier Column of Deceptive Artificial Intelligence

Editor-in-chief: Wang Guoyan

Deputy Editor: Zhang Zhuoyue

Editors: Jin Shanshan, Wu Donghao, Peng Shan, Yan Yishuang, Liao Yuxuan, Xia Miao, Chai Yi, Li Miao, Chen Yinru, Chen Siyuan, Qiu Wenbo, Zhang Bohan, Guo Ruijie, Bao Shiquan, Zhou Xinyi

More