rule-based or consequentialist; explicitly defining rules / values so that the agent acts beneficially; problem: what are those values, and how do we define such abstract concepts in code? – "Everything is vague to a degree you do not realise until you have tried to make it precise." Bertrand Russell
severely limiting an agent's ambitions and activities via its motivation system
indirect value / rule specification; specifying a process rather than the concrete normative standard
starting with a human / benevolent motivation system and enhancing its cognitive capacity until it becomes superintelligent
01 Predictability through design: the goal system is set by the programmers and the agent pursues it stably; predicting the developers' goals lets us predict the agent's goals
02 Predictability through inheritance: when created from a human template, an agent might inherit the template's motivation system
03 Predictability through convergent instrumental reasons: considering the instrumental reasons that arise for a wide range of final goals in a wide range of situations; more useful as the agent gets smarter, since a smarter agent is more likely to discover the instrumental reasons for its goals
––––––