Avoiding bad value drift is as important as solving value alignment

August 8, 2018

Written in Tractatus style: each sublist justifies the point immediately above it.

Note: this argument might, deep down, actually be a reductio for folk notions of human value.

Avoiding bad value drift is as important as solving value alignment.
1. Bad value drift is possible.
  1. Value drift is possible.
    1. Human values are a function of the contents or structure of human minds, and human minds can be altered in a way that changes human values.
  2. Value drift could occur in several plausible ways.
    1. Value drift could occur due to persuasion, propaganda, or warfare that leads to changes in the composition of human society and its beliefs. Narrow AI will accelerate this.
    2. Value drift could occur by use of neuromodulation technology, like a lithiated water supply, nootropics, or brain interfaces. The economic advantages to using such technologies will drive their rapid adoption.
    3. Value drift could occur by genetic alteration to human minds via synthetic biology.
  3. Value drift can result in bad values.
    1. Only unintentional value drift can result in bad values. Intentional value drift cannot, inasmuch as intentional value changes are aligned with current human values.
    2. Unintentional value drift that results in bad values is possible.
2. Bad value drift will plausibly occur before strong AI is built.
  1. Technologies that lead to strong AI may also lead to any of the items listed in 1.2 occurring first.
3. Allowing bad value drift to occur before building strong AI is tantamount to failing at value alignment.
  1. If strong AI is built before value alignment is solved, then value alignment has failed by definition.
  2. If strong AI is built after value alignment is solved, but also after bad value drift has occurred, the resulting AIs won’t possess current human values. The AIs will possess bad values, which means value alignment has failed.

Definitions:

Bad value drift: value drift that changes human values into ones that conflict with our current human values.
Value drift: a change in human values from whatever they are currently.
Human values: what our species ultimately, collectively wants. This definition is so vague as to be almost useless, but I haven’t found a better one.
Current human values: human values circa 2018.
Value alignment: the task of building AIs that behave according to current human values.