Whether and when artificial superintelligence will be achieved is a speculative question about which one can only make estimates. Those range from "in 5 years" to "probably never" and settle on average around a timeframe of 50 years. The argument has been floating around that this variable timeframe is reason enough to neglect the urgency of developing strategies to ensure the beneficial behavior of artificial agents. Refusing to work on concepts only because the potential consequences seem far in the future is a rather weak position, since A) careful planning, preparation and bounded experimentation take a lot of time, as does regulatory norm-setting, B) ethical concepts created for ensuring beneficial behavior of highly intelligent entities can positively impact and benefit more short-term AI endeavors, and C) the potential dangers of systems that far exceed our own cognitive capabilities seem, if unmanaged, far too great to be ignored.

Arguments like those of the prominent voice of Steven Pinker, claiming that "a system that intelligent will figure out what we mean and thereby not take any existential risks because otherwise it would not be very intelligent" and that "if humans are smart enough to create a system of that level of intelligence they will be smart enough to design it in a safe way", seem more like an arrogant overestimation of human rationality and cognitive capabilities. As history shows, we are not that great at making smart decisions in the face of potentially powerful technology that might give us a decisive strategic advantage over potential adversaries, and we tend to consider consequences only in hindsight. The intelligence argument is compelling to believe, but it is hardly one we want to rely on when the potential future of humanity as a whole is at stake.

The uncertain time variable is in part due to the unpredictability of technological evolution. It is unclear whether there will be any major breakthroughs in machine learning concepts in the next couple of decades, and whether they are even required, or if improvements in computing power (small or dramatic, for example by harnessing the potential of quantum computing) and more and better-interpreted datasets will be enough to drive these increases in "intelligence". After all, there has been steady progress over the last couple of years without any major breakthroughs, but rather continuous improvement of legacy concepts reused or recombined.

The following builds on two premises: I) Active work on ensuring beneficial behavior of artificial entities is required in order to A) minimize potential risks (from short-term examples like biased or racist decision making based on biased data to long-term threats like perverse instantiations of final goals leading to issues such as broad resource acquisition) and B) maximize potential benefits, from predicting catastrophes and curing diseases to a better understanding of human existence and overall improvements in quality of life via changes in social structures. II) The development of AI will not slow down in the future, and constant progress in this field of research will likely lead to systems with the cognitive performance of human beings (Artificial General Intelligence) and – absent defeaters – far beyond (Artificial Superintelligence), since there is little reason to speculate that human-level intelligence is anything but a loose range along the trajectory of intelligence development. The concept of creating human-level intelligence seems more like a cloudy reference point, hard to quantify and questionable in its desirability – creating a type of intelligent agent that helps to tackle the issues humans are not capable of solving seems more desirable than recreating the human cognitive performance we already have.

On the road to intelligence of higher magnitudes, different approaches have been proposed: emulating human brains (whole brain emulation), enhancing the human brain via selective or gene-edited breeding (biological cognition), improving cognitive performance by extending the human brain with technological artifacts (brain-computer interfaces), linking multiple, not necessarily highly intelligent individuals into a collective system greater than the sum of its parts (networks and organisations), and the most prominent and seemingly most promising approach: artificial intelligence – systems that might be inspired by nature but are non-biological and mostly based on algorithmic approaches. No matter whether one of these or a totally different concept leads to levels of higher intelligence, the creator of the project that gets there first and the climate it is created in will be decisive factors in defining the degree of beneficiality, the existence and speed of a possible race dynamic and the overall safety of the project.

The type of actor and its internal structure of regulation and mindset will, absent strong institutional regulation, define the possible outcomes of a singularity and who is going to benefit from it. Different scenarios seem possible: free-market actors (currently the most likely development, given how far ahead projects from companies like Google are compared to the competition), state-controlled programs, likely in places where economy and state are closely intertwined, and scientific institutions, which probably won't be independent actors but at least partially funded by one of the former. Of course, mixed actor setups seem possible as well.

The degree of race dynamics will also largely depend on the takeoff speed from a roughly human level of intelligence to an entity far surpassing it. This takeoff speed is a strategic unknown that determines how much time we have to prepare for a possible intelligence explosion. Here opinions range from "probably a few hours" after reaching AGI to "multiple decades". Since it is nearly impossible to predict who is right, and this period will be crucial for ensuring benevolent outcomes, ideas and concepts have to be developed in advance.

Levels of intelligence that exceed those of humans can be classified into different categories of surpassing human-level capability. Speed superintelligence can be defined as a system that can perform the same kind of tasks as humans but at far superior speed; a collective superintelligence is a system composed of a large number of smaller intellects such that the system's overall performance across many very general domains vastly outstrips that of any current cognitive system, including humans; and quality superintelligence is a system whose cognitive speed is at least as fast as a human's but with far higher capabilities. Note that these categories are not strictly separate but can be synthesized: a system that is far smarter than a human will likely also find ways to run its computations at higher speed.

Such systems can be classified not only by the domain in which they excel but also by the way they are allowed to make decisions and impact the world around them. Nick Bostrom, in his 2014 book "Superintelligence", distinguishes four types of intelligent entities: Oracles (question-answering machines; for safety reasons answers could be limited to simple yes/no types), Genies (command-execution systems that take one high-level request, execute it and then wait for new orders), Tool AIs (systems designed not as acting agents but as tools with limited to no decision power) and Sovereigns (systems for open-ended autonomous operation that are capable of pursuing long-range objectives). Which type of approach seems feasible will depend on the unique qualifications required and the amount of impact and control desired. When developing those systems, critical design decisions need to be considered: the short- and long-term goals a highly intelligent agent is supposed to pursue and the values required to pursue them in a beneficial way (note: beneficial outcomes as well as a beneficial process of pursuing those outcomes), the amount of human ratification in the decision-making process of such a system, the kind of decision theory an agent uses to make those decisions in the first place, and the kind of epistemology used to gain knowledge about the surrounding world in order to get the right kind and amount of information required to make informed, rational decisions.
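To make the difference between the four types more concrete, the following minimal sketch encodes only their interaction patterns. All class and method names are hypothetical illustrations of the distinction, not anything taken from Bostrom's text.

```python
# Purely illustrative: the four types differ mainly in the scope of action granted.

class Oracle:
    """Question-answering machine; output could be restricted to yes/no for safety."""
    def answer(self, question: str) -> bool:
        ...

class Genie:
    """Executes one high-level request at a time, then waits for the next order."""
    def execute(self, request: str) -> None:
        ...

class ToolAI:
    """Behaves like conventional software: no agency, results are returned for human use."""
    def run(self, task_spec: dict) -> dict:
        ...

class Sovereign:
    """Operates autonomously and open-endedly in pursuit of long-range objectives."""
    def operate(self) -> None:
        ...
```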

In order to pursue goals that fulfill the requirement of being broadly beneficial, it is essential to figure out how to define those values, implement them into the system and ensure they are pursued reliably as intended. The implemented models of ethics will largely depend on the actor running the project and therefore implementing them, so it is important that awareness of ethical behavior is established in that actor's corporate culture as early as possible. Moreover, something like an independent overarching ethics council might be required to check those actors' projects against a standard of safety and accordance with broader beneficial goals.

When approaching the goal of beneficiality, a broad debate around what counts as beneficial, and transparent communication of the chosen values, seems important in order to include those impacted at least partially in the process and to establish trust. One of the major challenges in defining those values will be ensuring beneficiality from different points of view and degrees of impact – from a global set of overarching norms to individual and case-specific desires arising from ever-changing use cases. Values can be seen as multidimensional structures, specific to the individuals impacted but overlapping with broader societal ideas regulating our living together. Whether our current models of ethics prove applicable for defining values for a non-human entity remains to be seen, but they might at least provide a helpful starting point for early experiments in order to approach those "ideal morals".

When looking for ways to define those values there are different approaches, from strictly defining a set of rules to be used as the basis for decision making (translating moral norms into an input a machine can use for rational decision making might prove very difficult) to concepts like indirect normativity. Here an agent is given the goal to act according to its best estimate of an implicitly defined criterion (e.g. concepts like "right" / "good" / "bad") and to unpack the criterion's meaning as it learns more about our preferences. Another approach, called "value extrapolation", proposes giving a seed AI the goal of pursuing humanity's coherent extrapolated volition (CEV) – this being our wish if we knew more, thought faster, were more the people we wished we were, had grown further together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted – and again refine its model of humanity's CEV as it learns.
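A minimal sketch of the indirect-normativity idea, under the simplifying assumption that the implicit criterion can be treated as a set of competing candidate interpretations: the agent keeps a probability weight on each interpretation, acts on its current best estimate, and reweights as it observes human preference judgements. All names, hypotheses and numbers here are hypothetical.

```python
from typing import Callable, Dict, List

Action = str
ValueHypothesis = Callable[[Action], float]  # maps an action to an estimated "goodness"

def update_beliefs(prior: Dict[str, float],
                   hypotheses: Dict[str, ValueHypothesis],
                   preferred: Action, rejected: Action) -> Dict[str, float]:
    """Reweight hypotheses by how well they explain an observed human preference."""
    posterior = {}
    for name, h in hypotheses.items():
        # A hypothesis that agrees with the observed preference keeps more weight.
        likelihood = 0.9 if h(preferred) > h(rejected) else 0.1
        posterior[name] = prior[name] * likelihood
    total = sum(posterior.values())
    return {name: w / total for name, w in posterior.items()}

def best_action(actions: List[Action],
                beliefs: Dict[str, float],
                hypotheses: Dict[str, ValueHypothesis]) -> Action:
    """Choose the action with the highest expected value under the current beliefs."""
    def expected_value(a: Action) -> float:
        return sum(w * hypotheses[name](a) for name, w in beliefs.items())
    return max(actions, key=expected_value)

# Two toy interpretations of "good" (hypothetical): minimize harm vs. respect privacy.
hypotheses = {
    "minimize_harm": lambda a: {"warn": 0.9, "stay_silent": 0.1}[a],
    "respect_privacy": lambda a: {"warn": 0.2, "stay_silent": 0.8}[a],
}
beliefs = {"minimize_harm": 0.5, "respect_privacy": 0.5}
beliefs = update_beliefs(beliefs, hypotheses, preferred="warn", rejected="stay_silent")
print(best_action(["warn", "stay_silent"], beliefs, hypotheses))  # "warn"
```

The point of the sketch is only the shape of the loop: act on the current best estimate of the criterion, never on a fixed hard-coded rule, and let observations of our preferences keep reshaping that estimate.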

Implementing those defined moral codes into a system so that it acts according to their intended meaning is the next important step. Here, techniques like reinforcement learning might be used to teach our wishes via reward concepts, one of several ideas for creating sophisticated motivations for a system to pursue its final goals. Other concepts focus on aligning the system's goals with those of its creators, which seems especially important to get right since, as the system gets smarter, it might refuse to have its final goal content changed because doing so would endanger the achievement of said final goal(s). One idea is to start with a seed AI that is initially given a simple slice of a more complex goal, together with the final goal of learning more about this larger goal; as it gets smarter and learns more about the world, the complete picture of the final goal is slowly unpacked. To ensure beneficial behavior of artificial agents and the continued alignment of their goals, an internal system of checks and balances should be considered, in which an intelligent system consisting of intelligent parts (internal subagents) with a review function constantly monitors itself for potential security breaches and avoids decisions that might lead to them.
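As a toy illustration of teaching preferred behavior via rewards, the following sketch uses a tabular action-value update in which a stand-in for human approval plays the role of the reward signal. It is a deliberately simplified assumption-laden example, not a proposal for how a real alignment scheme would work; the actions and approval scores are hypothetical.

```python
import random
from collections import defaultdict

ACTIONS = ["help", "ignore", "harm"]
APPROVAL = {"help": 1.0, "ignore": 0.0, "harm": -1.0}  # stand-in for human feedback

q_values = defaultdict(float)   # estimated long-run reward per action
learning_rate = 0.1
exploration = 0.2

for step in range(1000):
    # Explore occasionally, otherwise pick the currently best-valued action.
    if random.random() < exploration:
        action = random.choice(ACTIONS)
    else:
        action = max(ACTIONS, key=lambda a: q_values[a])
    reward = APPROVAL[action]                          # approval as reward signal
    q_values[action] += learning_rate * (reward - q_values[action])

print(max(ACTIONS, key=lambda a: q_values[a]))         # converges to "help"
```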

Once the selected values have been defined and implemented into a system, there have to be concepts that ensure the agent acts according to those values and follows them as intended, without deviating from them in ways that lead to potentially harmful unintended consequences. Different ideas have been proposed to ensure the alignment of goals, such as A) stunting an agent's cognitive abilities and influence on the world (which might render such an agent quite useless, since it will require access to the world in some form to perform actions, and even the smallest access point might enable a highly intelligent system with malevolent goals to extend its influence exponentially), B) constant monitoring of a system by human or artificial controllers, C) tripwire reporting systems that perform checks outside the agent's awareness and shut it down automatically if breaches of security are detected, and D) incentive methods that keep a system acting according to our wishes, for example via reward functions that are themselves dynamic agents and therefore harder to fall victim to reward hacking than static rewards. Note that the ideas presented here are just a small selection of the concepts proposed by experts, and none of them is foolproof; each has weak spots that might be easy for a highly intelligent system to exploit.
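The tripwire idea can be sketched as a supervisor that runs outside the agent's own control loop, inspects its observable behavior against hard limits, and halts it when a limit is crossed. The telemetry fields and thresholds below are hypothetical placeholders; a real tripwire would also have to be hidden from and tamper-proof against the agent, which this sketch does not address.

```python
from dataclasses import dataclass

@dataclass
class AgentTelemetry:
    network_requests: int      # outbound connections observed this interval
    compute_used: float        # fraction of the allotted compute budget
    self_modification: bool    # any attempt to alter its own code or reward

TRIPWIRES = [
    lambda t: t.network_requests > 100,   # unexpected reach for the outside world
    lambda t: t.compute_used > 1.0,       # resource acquisition beyond the agreed budget
    lambda t: t.self_modification,        # tampering with its own goal or reward machinery
]

def supervise(telemetry: AgentTelemetry) -> bool:
    """Return True if the agent may keep running, False if it must be shut down."""
    return not any(trip(telemetry) for trip in TRIPWIRES)

print(supervise(AgentTelemetry(network_requests=3, compute_used=0.4, self_modification=False)))  # True
print(supervise(AgentTelemetry(network_requests=3, compute_used=0.4, self_modification=True)))   # False
```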

As mentioned above, one way to keep a system on track might be to use applicable motivation methods like social rewards (and corresponding punishments) that make a system want to pursue its goals for the sake of, for example, social appreciation and approval. For this it would be necessary for a system to either A) understand the meaning of the underlying models of morals that lead to social rewards, which consist of the totality of moral norms, emotions, feelings and hopes guiding our cohabitation, or B) at least understand the effects those abstract concepts have in our society. Since the first option requires a system to be self-aware and therefore needs a form of consciousness, the second approach seems more feasible – it is not yet understood why conscious experience is present in humans, so it will be hard to replicate this abstract state of being in digital agents.

Furthermore, whether a system is conscious or not can be considered relevant when debating its legal status and the rights it possesses, but it is non-essential for ensuring beneficial behavior, since simulating the effects of consciousness on reasoning and decision making seems sufficient. For example, it is not necessary for an agent to really "feel" emotions and have phenomenal conscious experience if the effect of those emotions can be replicated in order to understand the layout of a situation and make rational decisions based on it. It is unclear whether a system that only possesses "access consciousness" (a state is access conscious if it can be used for rational reasoning, decision making and control of actions that have to be justifiable) can be intrinsically motivated to pursue its actions, or whether any type of motivation system will be based on external ethical models implemented into the agent. Whether an artificial system without phenomenal consciousness can be considered a full moral actor is questionable, since it is unclear whether any type of rational behavior can be attributed to its intrinsic wishes and desires (rational actions can be defined as a combination of desires, intentions and an opinion about a certain situation). Whether such a system can be viewed as capable of having wishes is open for debate and depends on whether those are seen as mental states tied to subjective experiences or as functional expressions that can be realized in different ways.

Crucial to the definition, implementation and assurance of different concepts of beneficiality is that an agent is capable of detecting explicitly morally relevant information, able to process this input and make decisions based on value concepts and considerations – even in unforeseen situations. This process must be transparent and justifiable by the system, since those decisions might be the basis for accountability decisions and for detecting flawed concepts at an early stage. An entity capable of performing moral actions without being a full moral actor, and therefore not accountable, creates a vacuum of accountability that needs to be filled. The decision models used by a system need valid answers for cases of uncertainty and overlapping moral issues, and for who will be responsible for actions in those cases. There will without a doubt be moral edge cases in which a system has to choose between two bad options and pick the lesser evil. Whether and how the affected variables can be quantified remains an open issue that requires sensible concepts (for example, following the ethics of utilitarianism, the lesser evil is the one that affects fewer people; according to Kant's deontological ethics, the reduction of damage is an imperfect duty that cannot be entrusted to machines).
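To show how thin such a quantification can be, the following sketch implements only the crude utilitarian reading of the lesser evil as "fewer people affected". The options and counts are hypothetical, and a Kantian framework would reject delegating this choice to a machine in the first place.

```python
# Deliberately crude, purely illustrative quantification of the "lesser evil".
options = {
    "option_a": {"people_affected": 3},
    "option_b": {"people_affected": 5},
}

lesser_evil = min(options, key=lambda name: options[name]["people_affected"])
print(lesser_evil)  # "option_a" under this simple counting rule
```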