From: Superintelligence

Different kinds of value-loading techniques

Explicit representation Explicitly formulating utility functions as code into a system as complete representations of specific goals. Seems only promising for very simple straight forward goals and might be applicable for domesticity values.

Evolutionary Selection Evolution can be understood as a kind of search algorithm. Might seem promising at first due to the fact that evolution obviously worked once, though high risk that powerful search will find a solution that satisfies formally specified search criteria but not implicit expectations. Other issue: Nature is great with experimentation but highly unethical. Mind crime might be a possible outcome along the way as well as many unintended variations from experimentation.

Reinforcement Learning AI is trained to maximise some notion of reward. Often involve the creation of a system maximising reward signals which has a tendency to produce wireheading (Wireheading is the artificial stimulation of the brain to experience pleasure, usually through the direct stimulation of an individual's brain's reward or pleasure center with electrical current. It can also be used in a more expanded sense, to refer to any kind of method that produces a form of counterfeit utility by directly maximizing a good feeling, but that fails to realize what we value. https://wiki.lesswrong.com/wiki/Wireheading) failure modes as the system becomes more intelligent.

Value Accretion Humans acquire specific goal content from reactions to experience. Could be used to create an agent with human motivation but human value-accretion dispositions might be complex to replicate in a seed ai. bad approximation might lead to ai that generalises differently than humans and acquires unintended final goals. precision is essential to make value accretion work. might be too close to humans and only applicable for whole brain emulation intelligence scenarios.

Motivational Scaffolding A seed ai is given simple goals at first which will later be replaced by more complex ones as it gets more intelligent and develops sufficiently sophisticated representational resources. Goal replacement will be an issue since the ai might inherently refuse to have its scaffold goals tempered with since to it those are final goals. self replacing scaffold goals might be an option: seed ai gets sole final goal of replacing itself with a different final goal. "The approach might hold considerable promise."

Value Learning Ai learns the values humans want it to pursue. Formally specified reference that points to the relevant external information about human values. goal itself doesn't change in the process, ai adapts believes about final goal as it learns. criterion for ai picks out implicit suitable set of values; ai then acts according to its best estimate about these values and adapts as it gets smarter and learns more about the world and slowly unpack the implications of the criterion.

Emulation Modulation Starting with brain emulations with "normal" human motivation ans modifying their motivation system using digital drug analogs or other means. open if this is suitable for loading values with sufficient precision; ethical constraints aside.

Institution Design Some intelligent system consist of intelligent parts that are themselves capable of agency (compare states and countries in the human world). motivation depends on motivation of subagents and organisation between them. could also be used as augmentation for other value loading techniques. internal review agents as subpart of institution might create check and balances –> "a demented kin who reigns over an incompetent court that oversees a mediocre administration which governs capable people" Bostrom: "worthy of further exploration"

Other:

The Debate Game

AI safety via debate