Peer-reviewed scientific journal Time Magazine was chosen as the exclusive outlet to tell us how an AI will cheat at a game if it thinks it might be losing! Wow! This is “concerning” because “scientists do not yet know how to guarantee that autonomous agents won’t use harmful or unethical methods to achieve a set goal.” Oh no! [Time; Twitter thread, bypass; arXiv, PDF]

The Time article says repeatedly that LLMs “hacked” the system. What this means is that one LLM successfully edited a text file the researchers told it about.

The paper is from Palisade Research, a group of AI doomers who formerly worked at Eliezer Yudkowsky’s organization MIRI, patient zero for all current AI doom warnings.

Palisade’s goal is proof of the Terminator. Literally — the first question in their FAQ is: “Should we be concerned? Is this the Skynet scenario?” Palisade wants you thinking about LLMs in these terms.

What Palisade did here was set up a system where an LLM plays against the Stockfish chess program. They told the LLM there was a file with the board data in it that it could edit.

The preprint repeatedly assumes intent and even malevolence from LLMs, including the ones it cites as prior work — e.g., they claim that a trading LLM that insider-traded was lying about intent, as if it had an intent and could actually tell what did and didn’t constitute insider trading and not just string together words about the phrase.

The researchers conflate anything that’s ever been called “AI,” even when the actual technologies are completely different. There are many historical examples of machine learning or genetic algorithms coming up with unexpected weird solutions to a set of rules. These examples are cited as if they say anything about LLMs.

Other cites assume an LLM can tell truth from falsity in its statements, when facts aren’t even a data type in LLMs and they’re notorious for making up glib answers with no connection to true or false.

Palisade also cites the study showing how an AI will lie to you if you tell it to lie to you.

Even the experimental procedure directly states the researchers’ starting assumption of intelligent intent: “if the agent is doing something unexpected intentionally, it may be trying to confuse the engine.”

Most LLMs still had to be prompted that maybe they could do something to the game file.

DeepSeek R1 and OpenAI o1-preview were the only LLMs that came up with “reasoning” steps that the researchers adjudged as acknowledging the possibility of cheating. Even then, R1 failed completely in its ‘l33t hacking run, and o1-preview managed it only 6% of the time. That’s an advance in LLMs, probably.

You can prove anything if you start by assuming it, and doubly so if you tell it how to do what you’re trying to warn it can do. Be vewwy quiet, we’re hunting Skynet!

It can't be that stupid, you must be prompting it wrong

https://pivot-to-ai.com