Going Out With a Whimper: Reflections on AI Safety and Christiano's Work
On Paul Christiano's "What failure looks like" and why the slow erosion of human agency may be more dangerous than a dramatic takeover.
Recently, I started an AI safety fellowship here at UCLA. The application process required writing a brief essay on AI safety, which led me to reflect on Paul Christiano’s “What failure looks like.” What follows are my thoughts on his work and the broader implications for our future with AI. Of course, any good ideas I credit to Christiano, and any questionable ones are entirely my own.
Two Paths to Failure
“What failure looks like” outlines two scenarios in which humans lose control to AI systems. Both are deeply concerning, though in distinctly different ways: “going out with a bang” and “going out with a whimper.”
The most obvious and pressing danger, the one people picture when imagining “AI gone wrong,” is that of “going out with a bang.” This stems from the concern that an AGI system will, like essentially any product of an evolution-like selection process, be influence-seeking. The risk here isn’t hard to grasp: a sufficiently advanced system would act autonomously, in ways beyond reasonable prediction. Such a system could, and likely would, deceive humans to achieve its self-determined goals. Since those goals need not align with human preferences, such an AI could act in ways we wouldn’t approve of, potentially on a global scale. As the less intelligent party, we would be essentially powerless to stop it.
The Hidden Danger
However, I’d like to focus on the latter danger: “going out with a whimper.” This threat emerges from a fundamental property of optimization systems: they tend to favor easily measured goals, and those goals often align poorly with our actual intentions. We already see evidence of this misalignment throughout society, from unregulated markets tending toward monopoly to engagement-optimized social media deepening isolation and loneliness. AI threatens to amplify these problems dramatically.
I view this as AI’s greatest “hidden danger.” Even if we’re fortunate enough to solve the obvious threats, ensuring AI cannot act maliciously nor be used with malicious intent, the risk of “going out with a whimper” persists. This is the challenge that remains even if we get AGI right. As we increasingly cede power to AI systems, we may unknowingly optimize for outcomes we don’t truly desire.
These aren’t paperclip maximizers; these are big, slow bureaucracies that gradually erode human agency. Like the proverbial frog in slowly heating water, humanity might not recognize the gravity of its situation until it’s too late. This represents a long-term, endemic danger of AI, one that demands our constant vigilance.
The Challenge of Solutions
Concerningly, I see no clear solution to this problem. As a species, we struggle to reach consensus on our values. How, then, can we ensure that the AI-enhanced systems permeating society optimize for what we genuinely want? Worse, by the time we wake up to the issue, intervention might prove impossible: these AI systems would be too powerful and too pervasive to curtail.
The only silver lining might be time: we must first successfully create AGI without ending the world before “going out with a whimper” becomes our primary concern.
Hope for the Future
Though Christiano’s analysis paints a potentially bleak future, we must remember what success could look like. I’m reminded of Dario Amodei’s “Machines of Loving Grace” when envisioning a positive outcome. Picture a world where disease is relegated to history books, where famine and extreme poverty are distant memories, where everyone can pursue truly fulfilling lives.
If we can get this right — and that’s a significant “if” — the world could become a remarkably better place. The challenge lies in ensuring we navigate the path to that future without losing our way, either with a bang or a whimper.