Longterm Wiki

AI Alignment Forum

blog

Credibility Rating

Good (3/5)

Good quality. Reputable source with community review or editorial standards, but less rigorous than peer-reviewed venues.

Rating inherited from publication venue: Alignment Forum

Data Status

Not fetched

Cited by 13 pages

Cached Content Preview

HTTP 200 · Fetched Feb 26, 2026 · 34 KB

[An Orthodox Case Against Utility Functions](https://www.alignmentforum.org/posts/A8iGaZ3uHNNGgJeaD/an-orthodox-case-against-utility-functions)

[Best of LessWrong 2020](https://www.alignmentforum.org/bestoflesswrong?year=2020&category=ai%20safety)

Abram argues against assuming that rational agents have utility functions over worlds (which he calls the "reductive utility" view). Instead, he points out that you can have a perfectly valid decision theory where agents just have preferences over events, without having to assume there's some underlying utility function over worlds.


Vanessa Kosoy

In this post, the author presents a case for replacing expected utility theory with some other structure which has no explicit utility function, but only quantities that correspond to conditional expectations of utility.

To motivate this, the author starts from what he calls the "reductive utility" view, the thesis he sets out to overthrow, and identifies two problems with it.

The first problem is about the ontology in which preferences are defined. In the reductive utility view, the domain of the utility function is the set of possible universes, according to the best available understanding of physics. This is objectionable, because then the agent needs to somehow change the domain as its understanding of physics grows (the ontological crisis problem). It seems more natural to allow the agent's preferences to be specified in terms of the high-level concepts it cares about (e.g. human welfare or paperclips), not in terms of the microscopic degrees of freedom (e.g. quantum fields or strings). There are also additional complications related to the unobservability of rewards, and to "moral uncertainty".

The second problem is that the reductive utility view requires the utility function to be computable. The author considers this an overly restrictive requirement, since it rules out utility functions such as the one in the procrastination paradox (1 if the button is ever pushed, 0 if it is never pushed). More generally, computable utility functions have to be continuous (in the sense of the topology on the space of infinite histories obtained by regarding it as an infinite Cartesian product over time).
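The intuition behind the continuity requirement can be made concrete: a terminating evaluator can only ever inspect a finite prefix of an infinite history, so it cannot distinguish "the button is never pushed" from "the button is pushed after the prefix ends". A minimal illustrative sketch (the function name and history encoding are hypothetical, not from the post):

```python
def prefix_utility(history_prefix):
    """A computable evaluator can only inspect finitely many steps
    of the history before it must halt and return a value."""
    return 1.0 if "push" in history_prefix else 0.0

# Two histories that agree on their first 10 steps but differ in
# the limit: one never pushes the button, the other eventually does.
never = ["wait"] * 10
eventually = ["wait"] * 10 + ["push"]

# Any evaluator that halts after reading 10 steps must assign both
# the same value, even though the procrastination-paradox utility
# assigns them 0 and 1 respectively. Hence that utility function is
# discontinuous and not computable in this sense.
assert prefix_utility(never[:10]) == prefix_utility(eventually[:10])
```

The true utilities of the two full histories differ, which is exactly what a finite-prefix computation can never detect.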

The alternative suggested by the author is using the Jeffrey-Bolker framework. Alas, the author does not write down the precise mathematical definition of the framework, which I find frustrating. The linked article in the Stanford Encyclopedia of Philo

... (truncated, 34 KB total)
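Since the review notes that the post never states the framework precisely, the following is a hedged sketch of the core identity usually taken to characterize the Jeffrey-Bolker approach (the standard desirability averaging axiom, supplied for reference and not drawn from the truncated text): desirability $V$ attaches to events in a Boolean algebra rather than to complete worlds, and for disjoint events it averages by probability:

```latex
V(A \cup B) \;=\; \frac{V(A)\,P(A) + V(B)\,P(B)}{P(A) + P(B)},
\qquad A \cap B = \varnothing,\; P(A \cup B) > 0.
```

Because only events (and their conditional expected values) appear, no utility function over fully specified worlds is required, which is the structural point the review attributes to the post.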
Resource ID: 2e0c662574087c2a | Stable ID: M2JjNzZhZT