Nick Bostrom argues in "The Superintelligent Will"
webThis 2012 paper by Nick Bostrom is a foundational text in AI safety, coining and formalizing the orthogonality and instrumental convergence theses that underpin much subsequent alignment research, including arguments in his book 'Superintelligence'.
Metadata
Summary
Bostrom introduces two foundational theses for understanding advanced AI behavior: the orthogonality thesis (intelligence and final goals are independent axes, so any level of intelligence can be paired with virtually any goal) and the instrumental convergence thesis (sufficiently intelligent agents with diverse final goals will nonetheless converge on similar intermediate goals like self-preservation and resource acquisition). Together these theses illuminate the potential dangers of building superintelligent systems.
Key Points
- •The orthogonality thesis: intelligence level and final goals are logically independent—there is no reason to expect smarter AI to adopt human-like or benign values by default.
- •The instrumental convergence thesis: agents with diverse final goals will pursue similar sub-goals (self-preservation, resource acquisition, goal-content integrity) because these are instrumentally useful for almost any objective.
- •Human minds occupy a tiny, atypical cluster in the space of possible minds; anthropomorphizing AI motivations is a systematic and dangerous error.
- •Convergent instrumental goals such as resisting shutdown and acquiring capabilities pose risks regardless of an AI's specific terminal goals.
- •The two theses together suggest that building a superintelligent AI that is safe by default is non-trivial and requires deliberate alignment work.
Cited by 4 pages
| Page | Type | Quality |
|---|---|---|
| The Case For AI Existential Risk | Argument | 66.0 |
| Corrigibility Failure | Risk | 62.0 |
| Instrumental Convergence | Risk | 64.0 |
| Treacherous Turn | Risk | 67.0 |
Cached Content Preview
1
THE SUPERINTELLIGENT WILL: MOTIVATION AND
INSTRUMENTAL RATIONALITY IN ADVANCED
ARTIFICIAL AGENTS
(2012) Nick Bostrom
Future of Humanity Institute
Faculty of Philosophy & Oxford Martin School
Oxford University
www.nickbostrom.com
[Minds and Machines, Vol. 22, Iss. 2, May 2012] [translation: Portuguese]
ABSTRACT
This paper discusses the relation between intelligence and motivation in artificial agents,
developing and briefly arguing for two theses. The first, the orthogonality thesis, holds (with some
caveats) that intelligence and final goals (purposes) are orthogonal axes along which possible
artificial intellects can freely vary—more or less any level of intelligence could be combined with
more or less any final goal. The second, the instrumental convergence thesis, holds that as long as
they possess a sufficient level of intelligence, agents having any of a wide range of final goals
will pursue similar intermediary goals because they have instrumental reasons to do so. In
combination, the two theses help us understand the possible range of behavior of
superintelligent agents, and they point to some potential dangers in building such an agent.
KEYWORDS: superintelligence, artificial intelligence, AI, goal, instrumental reason, intelligent
agent
1. The orthogonality of motivation and intelligence
1.1 Avoiding anthropomorphism
If we imagine a space in which all possible minds can be represented, we must imagine all human
minds as constituting a small and fairly tight cluster within that space. The personality
differences between Hannah Arendt and Benny Hill might seem vast to us, but this is because
the scale bar in our intuitive judgment is calibrated on the existing human distribution. In the
wider space of all logical possibilities, these two personalities are close neighbors. In terms of
neural architecture, at least, Ms. Arendt and Mr. Hill are nearly identical. Imagine their brains
laying side by side in quiet repose. The differences would appear minor and you would quite
readily recognize them as two of a kind; you might even be unable to tell which brain was
whose. If you studied the morphology of the two brains more closely under a microscope, the
-- 1 of 16 --
2
impression of fundamental similarity would only be strengthened: you would then see the same
lamellar organization of the cortex, made up of the same types of neuron, soaking in the same
bath of neurotransmitter molecules.1
It is well known that naïve observers often anthropomorphize the capabilities of simpler
insensate systems. We might say, for example, “This vending machine is taking a long time to
think about my hot chocolate.” This might lead one either to underestimate the cognitive
complexity of capabilities which come naturally to human beings, such as motor control and
sensory perception, or, alternatively, to ascribe significant degrees of mindfulness and
intelligence to very dumb systems, such as chatterboxes like Weizenbaum’s ELIZA
(Weizenbaum 1976). In a similar
... (truncated, 48 KB total)3e1f64166f21d55f | Stable ID: sid_9C8DLNnwPy