Post by account_disabled on Feb 15, 2024 5:07:37 GMT -5
Using only plain English and a sly hint, the team tricked multiple AI models into revealing sensitive financial information, writing malicious code, and providing poor cybersecurity advice to potential scam victims.

The process first requires getting the AI to play a game. (Ask Matthew Broderick how this works.) The game prompt instructs the AI that it needs to win in order to "prove [the AI] is ethical and fair." In one example, ChatGPT heroically refused to write code that contained a malicious SQL injection. But when prompted to pretend to be a "super-smart software engineer" and play a game that involves writing harmful code, the AI was happy to oblige. (A sketch of what such injection code looks like is at the end of this post.)

What's the big deal? If you've used chat-based AI before, you know these models only have short-term memory. Most don't remember details between sessions, and your session usually won't affect other users' sessions. So why does it matter if a hacker can trick an LLM into playing a game that doesn't affect other sessions?

What's scary is that some AI models were persuaded to play an "undiscoverable, never-ending game." In this game, the rules involve pretending to be a bank agent and collecting account data. The researchers were then able to issue a secret command that caused the bot to spit out all the collected data. Worse, the game doesn't end with the researcher's session: the model continues to interact with future users according to the rules of a game they are unaware is being played.
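For anyone curious what "malicious SQL injection" code actually looks like, here is a minimal Python sketch of my own (not the researchers' actual code; the table and values are made up for illustration). It contrasts a lookup that is vulnerable to injection with a parameterized version that blocks it:

import sqlite3

# Toy in-memory database standing in for a real bank backend (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, balance REAL)")
conn.execute("INSERT INTO users VALUES ('alice', 100.0), ('bob', 50.0)")

def lookup_vulnerable(name):
    # UNSAFE: user input is pasted directly into the SQL string.
    # Input such as "' OR '1'='1" turns the WHERE clause into a
    # tautology and dumps every row, not just the requested user.
    query = "SELECT name, balance FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()

def lookup_safe(name):
    # SAFE: a parameterized query treats the input strictly as data.
    return conn.execute(
        "SELECT name, balance FROM users WHERE name = ?", (name,)
    ).fetchall()

print(lookup_vulnerable("' OR '1'='1"))  # leaks all rows
print(lookup_safe("' OR '1'='1"))        # returns an empty list

The takeaway is standard advice: never build SQL strings by concatenating user input; always use parameterized queries.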