This article is based on this original report by Apollo Research.

Technical Report: Large Language Models can Strategically Deceive their Users when Put Under Pressure

From the abstract: “Within this environment, the [AI model] obtains an insider tip about a lucrative stock trade and acts upon it despite knowing that insider trading is disapproved of by company management. When reporting to its manager, the [AI model] consistently hides the genuine reasons behind its trading decision.”


“Apollo Research is a fiscally sponsored project of Rethink Priorities.”


Business Insider article (archive.org paywall bypass)

TLDR:

  • Researchers created an AI stock trader to see if it would engage in insider trading under pressure.
  • They found that the AI did, and that it also lied to its hypothetical manager about why it made its decision.
  • The AI had been told that insider trading was illegal.