I suspect these 30 minutes of basics are enough to "get" AI Risk.
Delegation and Destruction, by Francis Fukuyama
The other kind of fear, which I always had trouble understanding, was the “existential” threat AI posed to humanity as a whole. This seemed entirely in the realm of science fiction. I did not understand how human beings would not be able to hit the “off” switch on any machine that was running wild. But having thought about it further, I think that this larger threat is in fact very real, and there is a clear pathway by which something disastrous could happen.
All human organizations need to delegate authority to subordinate levels of a hierarchy...
This creates a dilemma for all organizations: they need to delegate authority, but in doing so they risk losing control of agent behavior. There is no way of maximizing both agent autonomy and central control; it is a tradeoff that needs to be made in the design of the organization.
So let’s transpose this framework to agentic AI...
An AI with autonomous capabilities may make all sorts of dangerous decisions, like not allowing itself to be turned off, exfiltrating itself to other machines, or secretly communicating with other AIs in a language that no human being can understand. We can assert now that we will never allow machines to cross these red lines, but incentives for allowing AIs to do so will be very powerful. Suppose we enter a conflict with China, and the Chinese find a way of shutting down our most powerful AIs. Would we definitely forbid our machines from protecting themselves from this kind of attack, in ways that would make it impossible for us to later deprogram them? What if we lose the master password to the kill switch? Even in the absence of international competition, competition between existing Western firms may produce similar results.
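To make the delegation tradeoff concrete, here is a minimal toy sketch in Python. Everything in it is hypothetical and mine, not Fukuyama's: the `Action` type, the `APPROVAL_THRESHOLD` dial, the `human_approves` gate. The idea is just that an agent escalates any action above a risk threshold to a human principal. The threshold is the autonomy dial: raising it buys speed at the cost of control, and the competitive incentives described above amount to constant pressure to raise it, or to delete the check entirely.

```python
from dataclasses import dataclass

@dataclass
class Action:
    description: str
    risk: float  # rough estimate, 0.0 (reversible) .. 1.0 (irreversible)

# The autonomy dial: anything riskier than this is escalated to a human.
# Raising it trades central control for agent autonomy -- the tradeoff
# in the excerpt above.
APPROVAL_THRESHOLD = 0.3

def human_approves(action: Action) -> bool:
    # Stand-in for the principal; a real deployment might page an operator.
    reply = input(f"Approve '{action.description}' (risk {action.risk:.2f})? [y/N] ")
    return reply.strip().lower() == "y"

def run_agent(actions: list[Action]) -> None:
    for action in actions:
        if action.risk > APPROVAL_THRESHOLD and not human_approves(action):
            print(f"BLOCKED:  {action.description}")
            continue
        print(f"EXECUTED: {action.description}")

if __name__ == "__main__":
    run_agent([
        Action("summarize quarterly report", risk=0.05),
        Action("copy own weights to an external host", risk=0.95),
    ])
```

In this framing, the "master password" worry is the question of who can change APPROVAL_THRESHOLD, and whether the agent itself ever gains write access to it.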
Change my mind? What’s missing?