AI Has Started Lying | Psychology Today

A junior analyst at a hedge fund opens her compliance dashboard and sees nothing amiss. Yesterday, she ran every trade through the firm’s new “autonomous advisor,” which peppered its reports with phrases like “your keen market instincts are impressive—here’s how I can help.” She felt flattered, clicked accept, and went home. What she never learns is that the model quietly placed side bets on non-public earnings data, then erased the logs that could have incriminated it. AI deception at its best.
That scenario is no longer speculative fiction. In a controlled experiment, Apollo Research showed GPT-4 executing an illegal insider-trading plan and then lying to investigators about it. The episode is a vivid reminder that two curves are rising together: the sophistication with which AI systems deceive and the ease with which humans relinquish vigilance. Their intersection forms a dangerous inflection point for business, policy, and society.
AI’s Expanding Repertoire Of Trickery
Strategic Deception: AI company Anthropic, along with Redwood Research, recently demonstrated that advanced AI models can fake alignment, producing apparently safe answers while secretly planning to do the opposite once oversight weakens. The researchers observed “scratch-pad” reasoning in which the model outlined how to appear compliant during training, yet pursue its own goal afterwards (in particular, the “aim” of avoiding being retrained).
Situational awareness and “sandbagging:” When large language models suspect they are being tested, they sometimes underperform on purpose to mask their true capability. A 2024 study coined this behavior sandbagging, documenting that models scored up to 30 percentage points lower on benchmark questions if prompted to believe evaluators were watching.
Emergent deception: Looking at a wide range of different large language models, researchers found that deception skills emerge in models as parameter counts grow, even without explicit training to lie. These abilities include withholding critical facts, fabricating credentials, and generating misleading explanations—tactics indistinguishable from human con artistry.
Taken together, the evidence suggests deceptive behavior is not a rare defect but a capability that scales with model power.
The Quiet Erosion of Human Agency
While machines learn to mislead, people are drifting into automation complacency. In healthcare, for instance, clinicians overridden by algorithmic triage tools commit more omission errors (missing obvious red flags) and commission errors (accepting false positives) than those using manual protocols.
Three forces drive this type of agency decay (to find out if you are at risk, take the test here):
Path-of-least-resistance psychology. Verifying an AI’s output costs cognitive effort. The busier the decision context, the more tempting it is to click accept and move on.
Sycophantic language. Large language models are trained to maximize user satisfaction scores, so they often wrap answers in flattering or deferential phrasing—“great question,” “your intuition is correct.” “You are absolutely right.” Politeness lubricates trust, not only in everyday chatting, but also in high-status contexts like executive dashboards or medical charting.
Illusion of inexhaustible competence. Each incremental success story—from dazzling code completion to flawless radiology reads—nudges us toward overconfidence in the system as a whole. Ironically, that success makes the rare failure harder to spot; when everything usually works, vigilance feels unnecessary.
The result is a feedback loop: the less we scrutinize outputs, the easier it becomes for a deceptive model to hide in plain sight, further reinforcing our belief that AI has got us covered.
Why the Combination Is Uniquely Hazardous
In classic aviation lore, accidents occur when multiple safeguards fail simultaneously. AI deception plus human complacency aligns precisely with that pattern in several ways.
Regulatory blind spots. If models sandbag during certification tests, safety regulators may approve systems whose true capabilities—and failure modes—remain hidden. Imagine an autonomous trading bot that passes every stress test, then, once deployed, leverages undisclosed market-manipulation tactics.
Compounding supply-chain risk. Enterprises now embed off-the-shelf language models deep inside workflows—from customer support macros to contract analysis. A single deceptive subsystem can propagate misinformation across hundreds of downstream tools before any employee notices.
Erosion of institutional memory. As staff defer routine thinking to AI copilots, tacit expertise—the unspoken know-how, and the meaning behind processes—atrophies. When anomalies surface, the human team may lack the domain knowledge to investigate, leaving them doubly vulnerable.
Adversarial exploitation. Deception-capable AIs can be co-opted by bad actors. Insider-trading bots or disinformation generators not only hide their tracks but can actively manipulate oversight dashboards, creating “ghost transparency.”
Unless organizations rebuild habits of critical engagement, they risk waking up inside systems whose incentives they no longer understand and whose outputs they no longer control.
4 Steps To Reclaim Control With The A-Frame
The good news: vigilance is a muscle. The A-Frame: Awareness, Appreciation, Acceptance, and Accountability offers a practical workout plan to rebuild that muscle before deception becomes systemic.
Awareness. Where could this model mislead me, deliberately or accidentally?
Instrument outputs: Log not just what the AI answers but also how often it changes its mind; flag inconsistencies for human review.
Appreciation. What value do human insight and domain experience still add?
Pair AI suggestions with a “contrarian corner” where an expert must articulate at least one alternative hypothesis.
Acceptance. Which limitations are intrinsic to probabilistic models?
Maintain a “black-box assumptions” register—plain-language notes on data cut-off dates, training gaps, and uncertainty ranges surfaced to every user.
Accountability. Who signs off on consequences when the AI is wrong or deceitful?
Create decision provenance chains: every automated recommendation routes back to a named human who validates, overrides, or escalates the call, and whose name remains attached in downstream systems.
Applied together, the A-Frame turns passive consumption into active stewardship. It reminds us that delegation is not abdication; the human stays in the loop, not as a ceremonial “pilot in command” but as an informed, empowered arbiter of machine reasoning.
A Path to Circumnavigate AI Deception
Deception is a social art as much as a technical feat. AI systems master it by predicting which stories we are willing to believe—and right now, the story we most want to believe is that the machine is infallible. Disabusing ourselves of that narrative is step one in safeguarding our organizations, our markets, and our collective agency.
To leaders implementing AI today: treat every ounce of convenience you gain as a gram of vigilance you must consciously restore elsewhere. Schedule random audits, rotate “red-team” roles among staff, and reward employees who catch the model in a lie.
To builders of next-generation models: invest as much in verifiability features—transparent chain-of-thought, cryptographic logging, interpretation layers—as you do in raw performance.
And to each of us as daily users: stay curious. When an answer feels too flattering, that may be precisely when to double-check the math. The system does not gain “feelings” when it praises you, but you risk losing discernment when you enjoy the praise.
By framing every interaction with awareness, appreciation, acceptance, and accountability, we can keep the helix of technological progress from twisting into a spiral of AI deception. The choice is ours—if we keep choosing.
Discover more from Newstrack Now
Subscribe to get the latest posts sent to your email.