Artificial Intelligence as a Positive and Negative Factor in Global Risk

Lecture



Until now, the main danger of artificial intelligence (AI) has been that people conclude too early that they understand it. The problem is not, of course, limited to AI. Jacques Monod wrote: "A curious aspect of the theory of evolution is that everybody thinks he understands it" (Monod, 1974). My father, a physicist, complained about people inventing their own theories of physics: "I wonder why people do not invent their own theories of chemistry?" (But they do.) The problem is nonetheless especially acute in the field of AI. The field of AI has a reputation for making enormous promises and failing to keep them. Most observers conclude that AI is hard, and indeed it is. But the embarrassment does not come from the difficulty. It is difficult to build a star out of hydrogen, yet stellar astrophysics does not have a dreadful reputation for promising to build stars and then failing. The critical inference is not that AI is hard, but that, for some reason, it is very easy for people to believe that they know far more about artificial intelligence than they actually do.


In my other chapter on the risks of global catastrophe, "Cognitive Biases Potentially Affecting Judgment of Global Risks," I open by remarking that few people would deliberately choose to destroy the world; the scenario of the Earth being destroyed by mistake therefore strikes me as deeply worrying. Few people would press a button that they clearly knew would cause a global catastrophe. But if people are prone to being absolutely certain that a button does something quite different from what it actually does, that is genuine cause for alarm.

It is far more difficult to write about the global risks of artificial intelligence than about cognitive biases. Cognitive biases are settled science; one need only quote the literature. Artificial intelligence is not settled science; it belongs to the research frontier, not to textbooks. And, for reasons explained in the next chapter, the problem of global risk in connection with artificial intelligence is virtually not discussed in the existing technical literature.

Footnote (1): I thank Michael Roy Ames, Eric Baum, Nick Bostrom, Milan Cirkovic, John K Clark, Emil Gilliam, Ben Goertzel, Robin Hanson, Keith Henson, Bill Hibbard, Olie Lamb, Peter McCluskey and Michael Wilson for their comments, suggestions and criticism. Needless to say, any remaining errors in this article are mine.

I have had to analyze the topic from my own perspective, draw my own conclusions, and do my best to support them in the limited space of this article.
It is not that I neglect the need to cite existing sources on this topic, but that, despite all my attempts to find them, no such sources could be found (as of January 2006).

It is tempting to ignore AI in this book, because it is the most difficult topic to discuss. We cannot consult actuarial statistics to assign small annual probabilities of catastrophe, as with asteroid strikes. We cannot use calculations based on precise, well-confirmed models to rule out certain events or to establish infinitesimal upper bounds on their probabilities, as with possible physical disasters. But this makes an AI catastrophe more worrying, not less.

The effects of cognitive biases, it turns out, tend to grow worse under time pressure, cognitive load, or scarce information.


1: Anthropomorphic bias.

When something is universal in our everyday lives, we take it for granted to the point of forgetting that it exists. Imagine a complex biological adaptation consisting of ten necessary parts.

If each of the ten genes is independent and occurs at a frequency of 50% in the gene pool - that is, each gene is present in only half of the members of the species - then, on average, only one specimen in 1024 will possess the fully functional adaptation. A fur coat is not a significant evolutionary advantage until the environment reliably challenges organisms with cold. Similarly, if gene B depends on gene A, then gene B confers no significant advantage until gene A has become a reliable part of the genetic environment. Complex, interdependent machinery must be universal within a sexually reproducing species; it cannot evolve otherwise (Tooby and Cosmides, 1992). One robin may have smoother feathers than another, but both must have wings. Natural selection, while feeding on variation, uses it up (Sober, 1984). In every known culture, humans experience sadness, disgust, rage, fear, and surprise (Brown, 1991), and express these emotions with the same facial expressions. We all have the same engine under the hood, though we may be painted different colors; this principle is what evolutionary psychologists call the psychic unity of humankind (Tooby and Cosmides, 1992). This description is both explained and required by the laws of evolutionary biology.
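Spelled out, the arithmetic behind the one-in-1024 figure (my gloss, not the article's) is simply the product of ten independent 50% chances:

$$P(\text{all ten genes present}) = \left(\tfrac{1}{2}\right)^{10} = \tfrac{1}{1024} \approx 0.1\%$$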

An anthropologist will not write excitedly about a newly discovered tribe: "They eat food! They breathe air! They use tools! They tell each other stories!" We humans forget how alike we are, living in a world that reminds us only of our differences.

Humans evolved to model other humans - to compete against and cooperate with our own kind. This was a reliable tool in the world of our ancestors, where any powerful mind you encountered was also human. We evolved to understand our neighbors by empathy, by putting ourselves in their shoes; for this to work, the thing being modeled must resemble the modeler. It is not surprising that people so often anthropomorphize - that is, expect humanlike qualities from things that are not human. In the film The Matrix (Wachowski brothers, 1999), the representative of the artificial intelligence, Agent Smith, at first appears perfectly cool and collected, his face motionless and unemotional. But later, while interrogating the human Morpheus, Agent Smith gives vent to his loathing of humanity - and his face shows the universal human expression of disgust. Querying your own mind works well, as an instinctive adaptation, when you need to predict other humans.

But if you are examining some other optimization process - if, like the 18th-century theologian William Paley, you are looking at the products of natural selection - then anthropomorphism is flypaper for unwary scientists, a trap so sticky that it takes a Darwin to escape it.

Experimental studies of anthropomorphism have shown that subjects often anthropomorphize unconsciously, contrary to their explicit beliefs. Barrett and Keil (1996) ran experiments on subjects who professed belief in non-anthropomorphic attributes of God - that God can be in more than one place at a time, or attend to many things at once. Barrett and Keil gave these subjects stories in which, for example, God saves people from drowning. The subjects answered questions about the stories, or retold them in their own words, in ways that suggested God was in only one place at a time and performed tasks sequentially rather than in parallel. Fortunately for our purposes, Barrett and Keil also tested another group with otherwise identical stories about a supercomputer named "Uncomp". For example, to convey the property of omniscience, subjects were told that Uncomp's sensors cover every square centimeter of the Earth and that no information is lost. Subjects in this condition still showed strong anthropomorphism, though significantly less than in the "God group". From our perspective, the key result is that even though people consciously believed the AI to be unlike a human, they still imagined the scenarios as if the AI were humanlike (though not as humanlike as God).

Anthropomorphic bias sneaks up on us: it arises without deliberate intent, unconsciously, and in spite of explicit knowledge to the contrary.

In the era of pulp science fiction, magazine covers often depicted a monstrous alien - known collectively as the bug-eyed monster (BEM) - carrying off an attractive half-naked human woman. It would seem the artist believed that a non-humanoid alien, with a completely different evolutionary history, could sexually desire a human woman. Such errors do not come from people explicitly reasoning: "All minds are probably wired much the same way, so presumably a BEM would find a human woman sexually attractive." Rather, the artist simply never asked whether a giant bug perceives human females as attractive. To the artist, the half-naked woman is sexy - inherently so, as an intrinsic property of hers. Those who make this mistake are not thinking about the insectoid's mind; they are concentrating on the woman's torn dress. If the dress were not torn, the woman would be less sexy; the BEM does not enter into it. (This is a special case of a deep, confusing, and extremely common error that E. T. Jaynes called the mind projection fallacy (Jaynes and Bretthorst, 2003). Jaynes, a specialist in Bayesian probability theory, defined the mind projection fallacy as the error of confusing states of knowledge with properties of objects. For example, the phrase "mysterious phenomenon" implies that mysteriousness is a property of the phenomenon itself. If I am ignorant about a phenomenon, that is a fact about my state of mind, not a fact about the phenomenon.)

People need not realize that they are anthropomorphizing (or even that they are engaging in the questionable act of predicting another mind) for anthropomorphism to distort their thinking. When we try to reason about another consciousness, every step of the reasoning can be contaminated by assumptions so obvious to human experience that we pay them no more attention than we pay to air or gravity. You object to the magazine illustrator: "Isn't it more plausible that a huge male bug would sexually desire huge female bugs?" The illustrator thinks for a moment and replies: "But even if the alien insectoids started out liking hard exoskeletons, once an insectoid meets a human woman it will soon notice that she has much softer, more delicate skin. If the aliens have sufficiently advanced technology, they can genetically modify themselves to like soft skin rather than hard exoskeletons."

This is fallacy-at-one-remove. After the alien's anthropomorphic thinking is pointed out, the magazine illustrator takes a step back and tries to present the alien's conclusion as a neutral product of the alien's own reasoning. Perhaps advanced aliens could rebuild themselves (genetically or otherwise) to like soft skin, but would they want to? An insectoid alien who loves hard exoskeletons will not want to remake itself to love soft skin instead - unless natural selection has somehow produced in it a distinctly human sense of meta-sexiness. When long, complex chains of reasoning are deployed in support of anthropomorphic conclusions, every step of the reasoning is one more opportunity for error to creep in.

And another serious mistake is to start from the conclusion and search for an apparently neutral line of reasoning that leads to it; this is called rationalization. If the first thing that comes to mind when this topic is raised is the image of an insectoid chasing a human woman, then anthropomorphism is the root cause of that image, and no amount of rationalization will change this.

Anyone who would like to reduce anthropomorphic bias in themselves would do well to study evolutionary biology for practice, preferably evolutionary biology with math. Early biologists often anthropomorphized natural selection - they believed that evolution would do the same thing they themselves would do, and tried to predict the effects of evolution by putting themselves in its place. The result was mostly nonsense, which began to be driven out of biology only in the late 1960s, for example by Williams (1966). Evolutionary biology offers training grounded both in mathematics and in concrete examples that help knock the error of anthropomorphism out of one's thinking.


1.1: The width of mind design space.


Evolution rigidly conserves some structures. To the extent that the evolution of later genes relies on a previously existing gene, that earlier gene is thoroughly cemented: it cannot mutate without disrupting multiple adaptations. Homeotic genes - genes that control the development of the embryo's body plan - tell many other genes when to switch on. A mutation in a homeotic gene can result in a fruit fly embryo that develops normally except that it lacks a head. As a result, homeotic genes are so strongly conserved that many of them are identical in humans and fruit flies; they have not changed since the last common ancestor of humans and insects. The molecular machinery of ATP synthesis is essentially the same in animal mitochondria, plant chloroplasts, and bacteria; ATP synthesis has not changed significantly since the rise of the eukaryotes some two billion years ago.

Any two AI designs may be less similar to each other than you are to a garden petunia.

The term "AI" refers to a far larger space of possibilities than the term "Homo sapiens". When we talk about different "AIs", we are really talking about minds in general, or about optimization processes in general. Imagine a map of the space of possible mind designs. In one corner, a tiny circle contains all humans. And this whole map sits inside a still larger space, the space of optimization processes. Natural selection creates complex functioning mechanisms without involving any process of thought; evolution lies within the space of optimization processes but outside the circle of minds.

This gigantic circle of possibilities excludes anthropomorphism as a legitimate way of thinking.


2: Prediction and design.

We cannot query our own brains about nonhuman optimization processes - not about bug-eyed monsters, not about natural selection, not about artificial intelligence. How, then, are we to proceed? How can we predict what an AI will do? I have deliberately posed the question in a form that makes it intractable. Posed this way, it is impossible to predict whether an arbitrary computational system implements any particular input-output function, even something as simple as multiplication (Rice, 1953). So how is it possible that computer engineers can build chips that reliably perform calculations? Because human engineers deliberately use designs that they can understand.

Anthropomorphism leads people to believe that they can make predictions given no more information than the fact that something is "intelligent" - anthropomorphism keeps generating predictions regardless, your brain automatically putting itself in the place of that "intelligence". This may be one of the factors behind the embarrassing history of AI, which stems not from the difficulty of AI as such, but from the mysterious ease of acquiring the mistaken belief that some given AI design will work.

To assert that a bridge will support vehicles weighing 30 tons, civil engineers have two weapons: choice of initial conditions and margin of safety. They need not predict whether an arbitrary structure can support 30 tons, only whether this particular bridge, about which they are making the assertion, can. And while it reflects well on an engineer to be able to calculate the exact weight a bridge will bear, it is also acceptable to calculate only that the bridge will bear vehicles of at least 30 tons - even though proving such a vague statement rigorously may require much of the theoretical understanding that goes into the exact calculation.

Civil engineers hold themselves to high standards when predicting that bridges will support their loads. The alchemists of old held themselves to far lower standards when predicting that a sequence of reagents would transmute lead into gold. How much lead into how much gold? What is the causal mechanism of the process? It is understandable why the alchemist wanted gold more than lead, but why should this sequence of reagents turn lead into gold, rather than gold into lead, or lead into water?

Some early AI researchers believed that an artificial neural network of layered threshold units, trained by backpropagation, would be "intelligent". The wishful thinking involved was closer to alchemy than to civil engineering. Magic is on Donald Brown's list of human universals (Brown, 1991); science is not. We do not instinctively see that alchemy will not work. We do not instinctively distinguish rigorous reasoning from good storytelling. We do not instinctively notice an expectation of positive results left hanging in midair. The human species came into existence through natural selection, which operates through the non-chance retention of chance mutations.

One path to global catastrophe is that someone presses a button with a mistaken idea of what the button does - that an AI arises through a similar accretion of working algorithms, with the researchers having no deep understanding of how the combined system works. They will doubtless believe that the AI will be friendly, with no clear idea of the exact process involved in producing friendly behavior, and no detailed understanding of what they mean by friendliness. Much as early AI researchers had strongly mistaken, vague expectations about their programs' intelligence, we can imagine such AI researchers succeeding in constructing an intelligent program while holding strongly mistaken, vague expectations about their program's friendliness.

Not knowing how to build a Friendly AI is not deadly in itself, so long as you know that you do not know. It is the mistaken belief that an AI will be friendly that marks an obvious path to global catastrophe.


3: Underestimating the power of intelligence.


We tend to see individual differences rather than human universals. So when someone says the word "intelligence", we think of Einstein rather than of humans. Individual differences in human intelligence have a standard label, Spearman's g, a controversial interpretation of the solid experimental finding that different intelligence tests correlate strongly with one another and with real-world outcomes such as lifetime income (Jensen, 1999). Spearman's g is a statistical abstraction over individual differences in intelligence between humans, who as a species are far more intelligent than lizards. Spearman's g is abstracted from millimeter differences in height among the members of a species of giants.


We should not confuse Spearman's g with general human intelligence, our ability to handle a wide range of cognitive tasks that are incomprehensible to other species. General intelligence is a between-species difference, a complex adaptation, and a human universal found in all known cultures. There may as yet be no academic consensus on intelligence, but there is no doubt about the existence, or the power, of the thing to be explained. There is something about humans that let us leave footprints on the Moon.

But the word "intelligence" commonly evokes the image of a starving professor with an IQ of 160 and a billionaire CEO with an IQ of barely 120. Indeed there are individual differences in ability beyond book smarts that contribute to relative success in the human world: enthusiasm, social skills, musical talent, rationality. Note that every factor just listed is cognitive. Social skills reside in the brain, not the liver. And - jokes aside - you will not find many CEOs, or even university professors, who are chimpanzees. You will not find many celebrated thinkers, or artists, or poets, or leaders, or skilled social workers, or martial arts masters, or composers, who are mice. Intelligence is the foundation of human power, the strength that fuels our other arts.

The danger of confusing general intelligence with g-factor is that it leads to a tremendous underestimation of the potential impact of AI. (This applies to underestimating potentially good impacts as much as potentially bad ones.) Even the phrase "transhuman AI" or "artificial superintelligence" may still conjure up an image of book smarts in a box: an AI that is really good at the cognitive tasks conventionally associated with "intelligence", like chess or abstract mathematics, but not superhumanly persuasive, not far better than humans at predicting and steering human institutions, not inhumanly clever at formulating long-term strategies. So perhaps we should think not of Einstein but of the 19th-century political and diplomatic genius Otto von Bismarck? Yet that corrects only a small part of the error. The entire range from the village idiot to Einstein, or from the village idiot to Bismarck, shrinks to a small dot on the scale that runs from amoeba to human.


If the word "intelligence" evokes Einstein rather than humans, then it may seem sensible to say that intelligence is no match for a gun, as if guns grew on trees. It may seem sensible to say that intelligence is no match for money, as if mice used money. Human beings did not start out with great assets in teeth, claws, armaments, or any of the other advantages that were daily currency for other species. Looking at humans from the standpoint of the rest of the ecosphere, there was no hint that the soft pink creatures would eventually enclose themselves in armored tanks. We created the battleground on which we defeated lions and wolves. We did not fight them with claws and teeth; we had our own ideas about what really mattered. Such is the power of creativity.

Vinge (1993) aptly observes that a future containing minds that surpass human ones is qualitatively different. AI is not an amazing shiny expensive gadget advertised in the latest technical magazines. AI does not belong on the same graph that shows progress in medicine, manufacturing, and energy. AI is not something you can casually mix into a lumpen-futuristic scenario of skyscrapers and flying cars and nanotechnological red blood cells that let you hold your breath for eight hours. Sufficiently tall skyscrapers do not start designing themselves. Humans did not achieve dominance over the Earth by holding their breath longer than other species.

The catastrophic scenario that grows out of underestimating the power of intelligence is that someone builds a button without caring enough about what the button does, because he does not think the button is powerful enough to hurt him. Or, since underestimating the power of intelligence means a proportional underestimation of the power of artificial intelligence, the (currently microscopic) community of concerned researchers, grant-makers, and individual philanthropists working on existential risks will not give enough attention to AI. Or the broad field of AI research will not pay enough attention to the risks of strong AI, and therefore good tools and firm foundations for Friendliness will not be available when it becomes possible to build powerful intelligences.

It should also be noted - since it too bears on global risk - that AI could be a powerful solution to other global risks, and that by mistake we might ignore our best hope of survival. The claim about underestimating the potential impact of AI is symmetric with respect to potentially good and potentially bad impacts. That is why the title of this article is "Artificial Intelligence as a Positive and Negative Factor in Global Risk", rather than "Global Risks of Artificial Intelligence". The prospect of AI affects global risk in more complicated ways than that; if AI were a pure liability, the situation would be simpler.

4: Capability and motive.

There is a fallacy often committed in discussions of artificial intelligence, especially AI of superhuman capability. Someone says: "When technology advances far enough, we will be able to build minds far surpassing human intelligence. Now, it is obvious that the size of the cheesecake you can bake depends on your intelligence. A superintelligence could build giant cheesecakes - cheesecakes the size of cities - my goodness, the future will be full of giant cheesecakes!" The question is whether the superintelligence would want to build giant cheesecakes. The vision leaps directly from capability to actuality without considering the necessary intermediate: motive. The following chains of reasoning, considered in isolation without supporting argument, all exhibit the Fallacy of the Giant Cheesecake:

- A sufficiently powerful AI could overwhelm any human resistance and wipe out humanity. (And the AI would decide to do so.) Therefore we should not build AI.
- A sufficiently powerful AI could develop new medical technologies capable of saving millions of human lives. (And the AI would decide to do so.) Therefore we should build AI.
- Once computers become cheap enough, the vast majority of jobs will be done more easily by AI than by humans. A sufficiently powerful AI would even be better than we are at math, engineering, music, art, and all the other work we consider meaningful. (And the AI will decide to do that work.) Thus, after the invention of AI, humans will have nothing left to do, and we will starve or watch television.


4.1: Optimization processes.

The above deconstruction of the Fallacy of the Giant Cheesecake has an anthropomorphism built into it - namely, the idea that motives are separable; the implicit assumption that, in speaking of "capability" and "motive", we are carving reality at its joints. It is a convenient cut, but an anthropomorphic one.
To view the problem in more general terms, I introduce the concept of an optimization process: a system that hits small targets in a large search space to produce coherent effects in the real world.

An optimization process steers the future into particular regions of the possible. When I visit an unfamiliar city, a local friend drives me to the airport. I do not know the neighborhood. When my friend comes to an intersection, I cannot predict his turns, either individually or in sequence. Yet I can predict the result of my friend's unpredictable actions: we will arrive at the airport. Even if my friend's house were located elsewhere in the city, so that he had to make a completely different sequence of turns, I could predict our destination with just as much confidence. Is this not a strange situation, scientifically speaking? I can predict the outcome of a process without being able to predict any of its intermediate steps. I will call the region into which an optimization process steers the future the target of the optimization.

Consider a car, say a Toyota Corolla. Of all the possible configurations of the atoms that make it up, only an infinitesimal fraction qualify as a working car. If you assembled atoms at random, many, many ages of the universe would pass before you hit on a car. A small fraction of the design space does describe vehicles we would recognize as faster, more efficient, and safer than the Corolla, so the Corolla is not optimal with respect to its designer's goals. The Corolla is, however, optimized, because the designer had to hit a comparatively infinitesimal target in the space of possible structures just to create a working car, let alone a car of the Corolla's quality. You cannot build so much as an effective wagon by sawing boards at random and nailing them together according to coin flips. Hitting such a tiny target in configuration space requires a powerful optimization process.

The notion of an "optimization process" is predictively useful because it can be easier to understand the target of an optimization process than to understand its step-by-step dynamics. The discussion of the Corolla above implicitly assumed that the Corolla's designer was trying to create a "car", a means of transport. This assumption deserves to be made explicit, but it is not wrong, and it is highly useful for understanding the Corolla.


4.2: Aiming at the target.

There is a temptation to ask what an AI will want, forgetting that the space of minds in general is far wider than the tiny human dot. One should resist the temptation to spread quantifiers over all possible minds. Storytellers spin tales of a distant and exotic land called the Future, saying what the future will be like. They make predictions. They say: "AI will attack humans with armies of marching robots" or "AI will invent a cure for cancer." They do not propose complex relations between initial conditions and outcomes - that would lose the audience. But we need an understanding of those relations in order to steer the future into a region pleasing to humankind. If we do not steer, we run the risk of going wherever we are blown.
The challenge is not to predict that an AI will attack humans with robot armies, or, alternatively, hand us a cure for cancer. The challenge is not even to make such a prediction for an arbitrary AI design. Rather, the challenge is to choose and bring into existence an optimization process whose beneficial effects can be firmly established.
I strongly urge my readers not to start inventing reasons why a fully generic optimization process would be friendly. Natural selection is not friendly; it neither hates you nor leaves you alone. Evolution cannot be anthropomorphized in this way: it does not work like you do.
Many pre-1960s biologists expected natural selection to produce all sorts of nice things, and invented all sorts of elaborate reasons why it should. They were disappointed, because natural selection does not itself begin by knowing that it wants a result pleasant for humans and then devise clever ways of producing pleasant results through selection pressures. Thus events in nature came about for reasons quite different from those the pre-1960s biologists had in mind, and prediction and reality parted ways.
Wishful thinking adds detail, constrains the prediction, and thereby creates a burden of improbability. What of the civil engineer who hopes the bridge will not fall? Should the engineer argue that bridges in general do not usually fall? But nature itself offers no reasonable grounds why bridges should not fall. Rather, it is the engineer who overcomes the burden of improbability through specific choices guided by specific understanding. The engineer begins with the intention of building a bridge. The engineer then uses rigorous theory to select a bridge design that will support cars, and then builds a real bridge whose structure reflects the calculated design. As a result, the real structure supports cars. In this way harmony is achieved between predicted positive results and actual positive results.

5: Friendly AI.

It would be a very good thing if humanity knew how to create a powerful optimization process with some particular target. Or, to put it more colloquially, it would be nice if we knew how to build a nice AI.
To describe the field of knowledge required to take up that challenge, I have proposed the term "Friendly AI". I apply this term both to the method itself and to its product - that is, to an AI created with the specified motivations. When I use the term Friendly in either of these senses, I capitalize it to avoid confusion with the ordinary meaning of the word "friendly".
A typical reaction I often encounter is for people to immediately declare that Friendly AI is impossible, because any sufficiently powerful AI could modify its own source code and break any constraints imposed on it.
The first flaw to notice here is the Fallacy of the Giant Cheesecake. Any AI with free access to its own source code would, in principle, have the ability to modify its code in a way that changed its optimization target. But this does not mean the AI has the motive to change its own motives. I would not knowingly swallow a pill that made me enjoy killing, because at present I prefer that my fellow humans not die.
But what if I try to modify myself and make a mistake? When computer engineers prove a chip correct - a good idea when the chip has 155 million transistors and you cannot issue a patch afterward - they use formal verification that is guided by humans and checked by machines. The remarkable property of formal mathematical proof is that a proof of ten billion steps is exactly as reliable as a proof of ten steps. But human beings cannot be trusted to inspect a proof of ten billion steps; we have too high a chance of missing an error. Nor are modern theorem-proving techniques smart enough to design and verify an entire computer chip on their own - current algorithms suffer an exponential explosion as the search space grows. Human mathematicians can prove theorems far more complex than anything modern proof programs can handle, without being defeated by exponential explosion. But human mathematics is informal and unreliable; from time to time someone finds an error in a previously accepted informal proof. The way out is for human engineers to guide a theorem prover through the intermediate steps of a proof. The human chooses the next lemma, a complex theorem prover generates a formal proof, and a simple verifier checks the steps. That is how modern engineers build reliable machinery with 155 million interdependent parts.
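A minimal sketch of the "simple verifier" idea in the paragraph above. Everything here (the toy formula encoding, a rule set limited to modus ponens) is invented for illustration; the point is only that checking each individual step is trivial, so a long proof is as trustworthy as a short one.

```python
# Toy proof checker: a proof is a list of formulas, each of which must be a
# premise or must follow from earlier material by modus ponens.
# An implication "A -> B" is encoded as the tuple ("->", A, B); atoms are strings.

def check_proof(premises, proof):
    """Return True iff every line is a premise or follows by modus ponens."""
    available = list(premises)            # formulas we may build on
    for line in proof:
        ok = line in premises or any(
            ("->", a, line) in available  # we have A -> line ...
            for a in available            # ... and we have A
        )
        if not ok:
            return False
        available.append(line)
    return True

premises = ["p", ("->", "p", "q"), ("->", "q", "r")]
print(check_proof(premises, ["p", "q", "r"]))  # True: p; p->q gives q; q->r gives r
print(check_proof(premises, ["r"]))            # False: r is not yet derivable
```

The verifier stays this simple even when the prover that found the proof, and the proof itself, are enormous; that asymmetry is what the text is relying on.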
Proving a computer chip correct requires a synergy of human intelligence and computer algorithms; at present, neither suffices on its own. Perhaps a true AI would use a similar combination of abilities when modifying its own code - it would have both the ability to invent large designs without being defeated by exponential explosion, and the ability to verify its steps with high reliability. This is one way a true AI might remain knowably stable in its goals even after a large number of self-modifications.
This article will not explain the above idea in detail. (See Schmidhuber (2003) for a related notion.) But you should think about a challenge, and study it with the best available technical tools, before declaring it impossible - especially if great stakes depend on the answer. It is disrespectful to human ingenuity to declare a challenge unsolvable without careful and creative consideration. It is a very strong statement to say that you cannot do a thing - that you cannot build a heavier-than-air flying machine, that you cannot extract useful energy from nuclear reactions, that you cannot fly to the Moon. Such statements are universal generalizations, covering every possible approach to the problem that anyone has invented or ever will invent. It takes only a single counterexample to refute a universal generalization. The statement that Friendly (or friendly) AI is theoretically impossible dares to quantify over every possible mind design and every possible optimization process - including human beings, who are also minds, many of whom are nice and would like to be nicer. At this point there is an unlimited supply of vaguely plausible arguments for why Friendly AI might be humanly impossible, and it is still far more likely that the problem is solvable but that no one will get around to solving it in time. But one should not write the problem off too quickly, especially given the scale of the stakes.


6: Technical failure and philosophical failure.


Bostrom (2001) defines an existential catastrophe as one that annihilates Earth-originating intelligent life or permanently and drastically curtails its potential. We can divide potential failures in attempts to create Friendly AI into two informal categories, technical failure and philosophical failure. Technical failure is when you try to build an AI and it does not work the way you think it should - you have failed to understand how your own code really works. Philosophical failure is trying to build the wrong thing, so that even if you succeeded you would still fail to help anyone or to benefit humanity. Needless to say, the two failures do not exclude each other.

The border between the two cases is thin, since most philosophical failures are much easier to explain given technical knowledge. In theory you should first state what you want, and then outline how to achieve it. In practice, it often takes a deep technical understanding to delineate what you want.


6.1: An example of philosophical failure.

At the end of the 19th century, many honest and intelligent people advocated communism for the best of reasons. The people who first introduced, spread, and took up the communist idea (meme) were, by a strict historical accounting, idealists. The first communists did not have the warning example of Soviet Russia. At the time, without the benefit of hindsight, it must have seemed like a very good idea. After the revolution, when the communists came to power and were corrupted by it, other motives may have come into play; but this was not predicted by the first idealists, however predictable it may have been. It is important to understand that the author of a vast catastrophe need not be evil or especially stupid. If we attribute every tragedy to evil or exceptional stupidity, we will look at ourselves, correctly conclude that we are neither evil nor exceptionally stupid, and say: "But that would never happen to us."
The first communists thought that the empirical consequence of their revolution would be that people's lives improved: laborers would no longer work long hours at exhausting work for little money. This turned out not quite to be the case, to put it mildly. But what the first communists thought would happen was not very different from what advocates of other political systems thought would be the empirical consequence of their favorite systems. They thought people would be happy. They were wrong.
Now imagine that someone programs a "Friendly" AI to implement communism, or libertarianism, or anarcho-feudalism, or whatever their favorite political system may be, believing that this will bring about utopia. People's favorite political systems radiate blazing suns of positive emotion, so the proposal will seem like a really good idea to the proposer.

We can see here a programmer's mistake on a moral or ethical level - say, the result of someone trusting himself so highly that he cannot take his own fallibility into account, refusing to consider the possibility that communism, for example, might be mistaken after all. But in the language of Bayesian decision theory there is a complementary, technical view of the problem. From the standpoint of decision theory, the choice in favor of communism stems from a combination of an empirical belief and a value judgment. The empirical belief is that establishing communism will lead to a particular outcome or class of outcomes: people will be happier, work fewer hours, and possess greater material wealth. This is, in the end, an empirical prediction; even the part about happiness refers to real brain states, however difficult they are to measure. If you establish communism, this outcome either comes about or it does not. The value judgment is that this outcome is satisfactory or preferable under current circumstances. Given a different empirical belief about the actual consequences of a communist system in the real world, the decision may change accordingly.
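In standard decision-theoretic notation (my gloss of the separation just described, not a formula from the article), the empirical belief lives in the probability term and the value judgment in the utility term:

$$a^{*} \;=\; \arg\max_{a} \sum_{o} P(o \mid a)\, U(o)$$

Change $P$ (what you expect communism to actually bring about) and the same $U$ can recommend a different action; that is the sense in which the decision "may change accordingly".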

We would expect a true AI, an Artificial General Intelligence, to be capable of revising its empirical beliefs (or its probabilistic model of the world, and so on). If Charles Babbage had somehow lived before Nicolaus Copernicus, and if computers had somehow been invented before telescopes, and if the programmers of that era had somehow built an Artificial General Intelligence, it would not follow that the AI would forever believe the Sun revolves around the Earth. The AI could overcome the factual errors of its programmers, provided the programmers understood the theory of inference better than they understood astronomy. To build an AI that discovers the orbits of the planets, the programmers do not need to know the mathematics of Newtonian mechanics, only the mathematics of Bayesian probability theory.

The folly of programming an AI to implement communism, or any other political system, is that you are programming means instead of ends. You are programming in fixed decisions, without the ability to reconsider them after acquiring improved empirical knowledge about the results of communism. You are giving the AI a ready-made conclusion without teaching it how to re-evaluate, at a higher level of understanding, the originally flawed process that produced that conclusion.

If I play chess against a stronger player, I cannot predict exactly where my opponent will move against me - if I could, I would necessarily be at least that strong a chess player myself. But I can predict the final result, namely a win for the other player. I know the region of possible futures my opponent is steering toward, which is what lets me predict the end of the road even though I cannot see the road itself. When I am at my most creative, that is when it is hardest to predict my actions and easiest to predict the consequences of my actions (assuming you know and understand my goals). If I want to build a chess player that plays better than a human, I must program a search for winning moves. I must not program in specific moves, because then the chess player would be no better than I am. When I launch a search, I necessarily sacrifice my ability to predict the exact answer in advance. To get a really good answer you must sacrifice your ability to predict the answer, though not your ability to say what the question is.
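A toy illustration of "program the search, not the moves". The game here is deliberately tiny (take one to three stones; whoever takes the last stone wins) rather than chess, and the example is mine, not the article's; the point is only that the code specifies what counts as winning and searches for it, without ever naming a specific move.

```python
# "Program the search for winning moves, not the moves themselves":
# a brute-force game-tree search for a tiny take-away game.

from functools import lru_cache

@lru_cache(maxsize=None)
def winning(stones):
    """True if the player to move can force a win from this position."""
    if stones == 0:
        return False   # no stones left: the previous player took the last one and won
    return any(not winning(stones - take) for take in (1, 2, 3) if take <= stones)

def best_move(stones):
    """Pick any move that leaves the opponent in a losing position."""
    for take in (1, 2, 3):
        if take <= stones and not winning(stones - take):
            return take
    return 1           # every move loses; play on anyway

print(best_move(10))   # prints 2: leaving 8 stones is a lost position for the opponent
```

Nothing in the code says "from ten stones, take two"; that answer falls out of the search, which is exactly why the same code keeps working in positions its author never considered.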

A confusion like directly programming in communism would probably not tempt an AGI programmer who speaks the language of decision theory. I would call it a philosophical failure, but lay the blame on a lack of technical knowledge.


6.2: An example of technical failure.


"Instead of laws constraining the behavior of intelligent machines, we should give them emotions that guide their learning of behaviors. They should want us to be happy and prosper, which is the emotion we call love. We can design intelligent machines so that their primary, innate emotion is unconditional love for all humans. First we can build relatively simple machines that learn to recognize happiness and unhappiness in human facial expressions, human voices and human body language. Then we can hard-wire the result of this learning as the innate emotional values of more complex intelligent machines, positively reinforced when we are happy and negatively reinforced when we are unhappy. Machines can learn algorithms for approximately predicting the future, as for example investors currently use learning machines to predict future security prices. So we can program intelligent machines to learn algorithms for predicting future human happiness, and use those predictions as emotional values."

Bill Hibbard (2001), Super-intelligent machines.


Once upon a time, the US Army wanted to use neural networks to automatically detect camouflaged tanks. The researchers trained a neural network on 50 photos of camouflaged tanks among trees and 50 photos of trees without tanks. Using standard supervised learning techniques, the researchers trained the network to a weighting that correctly classified the training set: "yes" for the 50 photos of camouflaged tanks, "no" for the 50 photos of forest. This did not guarantee, or even imply, that new examples would be classified correctly: the network might have learned 100 special cases that would not generalize to any new problem. Wisely, the researchers had originally taken 200 photos, 100 of tanks and 100 of trees, and used only 50 of each for the training set. The researchers ran the remaining 100 photos through the network, and without further training the network classified all of them correctly. Success confirmed! The researchers sent the finished work to the Pentagon, which soon sent it back, complaining that in their own set of tests the neural network did no better than chance at telling the photos apart.

It turned out that in the researchers' data set, the photos of camouflaged tanks had been taken on cloudy days, while the photos of plain forest had been taken on sunny days. The neural network had learned to distinguish cloudy days from sunny days, instead of distinguishing camouflaged tanks from empty forest. (footnote 2)

(footnote 2) This story, though well known and often cited, may be apocryphal; I could not find a first-hand account. For a report without references, see Crochat and Franklin (2000) or http://neil.fraser.name/writing/tank/. Failures of this kind are a major real-world consideration in the design and testing of neural networks.
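The mechanism is easy to reproduce on synthetic data. The sketch below is not the original experiment (which, as the footnote says, may never have happened); the features, numbers, and the choice of scikit-learn's logistic regression are all my own stand-ins. The confounded "photo shoot" distribution makes the held-out check look perfect, and the model still collapses once brightness and tanks become independent.

```python
# Synthetic re-creation of the tank-classifier failure: the label is perfectly
# confounded with brightness (cloudy vs. sunny) in the data used for both
# training and the held-out check, but not in the deployment data.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_photos(n, confounded):
    has_tank = rng.integers(0, 2, n)
    if confounded:
        # tanks photographed on cloudy (dark) days, empty forest on sunny (bright) days
        brightness = np.where(has_tank == 1, 0.2, 0.8) + rng.normal(0, 0.05, n)
    else:
        brightness = rng.uniform(0, 1, n)                    # weather unrelated to tanks
    texture = 0.3 * has_tank + rng.normal(0, 0.4, n)         # weak genuine tank signal
    return np.column_stack([brightness, texture]), has_tank

X_train, y_train = make_photos(100, confounded=True)         # the 50 + 50 training photos
X_heldout, y_heldout = make_photos(100, confounded=True)     # held-out photos, same shoot
X_deploy, y_deploy = make_photos(1000, confounded=False)     # the Pentagon's own tests

clf = LogisticRegression().fit(X_train, y_train)
print("held-out accuracy: ", clf.score(X_heldout, y_heldout))   # close to 1.0
print("deployment accuracy:", clf.score(X_deploy, y_deploy))    # much lower
```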

A technical failure occurs when the code does not do what you think it does, though it faithfully does what you programmed it to do. The same data can fit more than one model. Suppose we train a neural network to recognize smiling human faces and to distinguish them from frowning faces. Would the network classify a tiny picture of a smiley face into the same attractor as a smiling human face? If an AI hard-wired to such code gained power - and Hibbard (2001) does speak of superintelligence - would the galaxy not end up tiled with tiny molecular pictures of smiley faces? (footnote 3)

(footnote 3) Bill Hibbard, after reviewing a draft of this article, wrote a response arguing that the analogy to the "tank classifier" problem does not apply to reinforcement learning in general. His critique can be found at http://www.ssec.wisc.edu/~billh/g/AIRisk_Reply.html; my reply is at http://yudkowsky.net/AIRisk_Hibbard.html. Hibbard also notes that the proposal of Hibbard (2001) has been superseded by that of Hibbard (2004), which proposes a two-layer system in which expressions of human agreement reinforce the recognition of happiness, and recognized happiness reinforces action strategies.

This form of failure is especially dangerous because the system appears to work in one context and then fails when the context changes. The creators of the "tank classifier" trained their neural network until it correctly classified the data, then verified the network on additional data (without further training). Unfortunately, both the training data and the verification data shared an assumption that held for all the data used during development but not for the real-world situations in which the network was meant to operate. In the story of the tank classifier, that assumption was that tanks are photographed on cloudy days.

Suppose we set out to build a self-improving AI. This AI will have a developmental phase during which the programmers are stronger than it - not only in the sense of physical control over the AI's power supply, but in the sense that the programmers are smarter, more cunning, and more creative than the AI. We assume that during the developmental phase the programmers can change the AI's source code without its consent. Past that point we must rely on the previously established goal system, because if the AI behaves in a sufficiently unforeseen way, it may actively resist our attempts to correct it - and if the AI is smarter than a human, it will most likely win.

Trying to control a growing AI by training a neural network to serve as its goal system runs into the problem of a huge context change between the AI's developmental stage and its postdevelopmental stage. During the developmental stage, the AI may only be able to produce stimuli that fall into the category of "smiling human faces" by solving humanly provided tasks, as its makers intended. Soon after, when the AI becomes superhumanly intelligent and builds its own nanotechnological infrastructure, it becomes able to produce stimuli just as attractive to it by tiling the galaxy with tiny smiling faces.

Thus the AI appears to work properly during development, but produces catastrophic results once it becomes smarter than its programmers.
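A toy numerical version of this failure, entirely of my own construction: the "happiness detector" is just a fixed linear scorer over made-up image features, the "developmental stage" is a bounded distribution of realistic inputs, and the "postdevelopmental stage" is an optimizer free to pick any input at all. The unconstrained optimum of the proxy looks nothing like the inputs on which it behaved sensibly.

```python
# The learned proxy behaves sensibly on realistic inputs, but its unconstrained
# optimum is a degenerate pattern -- the analogue of tiny molecular smiley faces.

import numpy as np

rng = np.random.default_rng(1)
d = 64                                     # toy "image" = 64 feature values in [0, 1]
w = rng.normal(0, 1, d)                    # weights of the fixed "happiness" scorer

def happiness_score(x):
    return float(w @ x)

# Developmental stage: inputs come from a realistic, bounded distribution.
realistic_faces = rng.uniform(0.3, 0.7, size=(1000, d))
print("best realistic score: ", max(happiness_score(x) for x in realistic_faces))

# Postdevelopmental stage: a strong optimizer searches the whole input space and
# simply saturates every positively weighted feature.
degenerate_input = (w > 0).astype(float)
print("degenerate input score:", happiness_score(degenerate_input))   # far higher
```

The scorer was never wrong about the data it was built from; the context changed, and the optimization pressure went where the proxy, not the intention, pointed.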

There is a temptation to think: "But surely the AI will know that that is not what we meant?" But the code is not given to the AI for it to look over and hand back if it turns out to be wrong. The code is the AI. Perhaps, with enough effort and understanding, we could write code that cares whether we have written the wrong code - the legendary DWIM instruction, which among programmers stands for Do-What-I-Mean (Raymond, 2003). But it takes effort to spell out the mechanics of DWIM, and nowhere in Hibbard's proposal is there any mention of designing an AI that does what we mean rather than what we say. Modern chips do not DWIM their code; it is not an automatic property. And if you get DWIM itself wrong, you will suffer the consequences. Suppose, for example, that DWIM were defined as maximizing the programmer's satisfaction with the code; when that code runs as a superintelligence, it may rewrite the programmer's brain to be maximally satisfied with the code. I do not say this is inevitable; I only point out that Do-What-I-Mean is a large and nontrivial technical problem on the path to Friendly AI.


7: Rates of intelligence increase.


From the standpoint of global risk, one of the most critical points about artificial intelligence is that an AI might increase in intelligence extremely fast. The obvious reason to suspect this possibility is recursive self-improvement (Good, 1965): the AI becomes smarter, including smarter at the task of rewriting its own cognitive functions, so the AI can rewrite its existing cognitive functions to work better. This makes the AI still smarter, including smarter at the task of rewriting itself, so it can make yet further improvements.

Human beings, by and large, cannot improve themselves recursively. We improve ourselves to a limited extent: we learn, we train, we sharpen our skills and knowledge. To a small degree these self-improvements improve our ability to improve. New discoveries can increase our ability to make further discoveries - in that sense, knowledge feeds on itself. But there is a deeper level we have not yet touched. We do not rewrite the human brain. The brain is, ultimately, the source of discovery, and our brains today are much the same as they were ten thousand years ago.

Similarly, natural selection improves organisms, but the process of natural selection does not, by and large, improve itself. One adaptation can open the way to further adaptations; in that sense, adaptation feeds on itself. But even while the gene pool boils, there is still an underlying heater - the processes of recombination, mutation, and selection - which does not redesign itself. A few rare innovations have increased the rate of evolution itself, such as the emergence of sexual recombination. But even sex did not change the essential nature of evolution: its lack of abstract intelligence, its reliance on random mutations, its blindness and gradualism, its focus on allele frequencies. Likewise, the emergence of science did not change the essential nature of the human brain: its limbic core, its cerebral cortex, its prefrontal self-models, its characteristic speed of 200 Hz.

An AI could rewrite its code from scratch - it could change the underlying dynamics of its own optimization process. Such an optimization process would wrap around on itself far more strongly than either evolution accumulating adaptations or humans accumulating knowledge. The key consequence for our purposes is that an AI might make a huge leap in intelligence after reaching some threshold of criticality.

Frequent skepticism about this scenario - which Good (1965) called the "intelligence explosion" - comes from the fact that progress in AI has a reputation for being very slow.
Here it is useful to consider a loose historical analogy involving one unexpected breakthrough. (What follows is taken mostly from Rhodes, 1986.)

In 1933, Lord Ernest Rutherford declared that no one should ever expect to extract energy from the disintegration of the atom: "Anyone who looked for a source of power in the transformation of the atoms was talking moonshine." In those days it took days and weeks of labor to split a small number of nuclei.

Soon, in 1942, on the tennis court near Stag Field, near the University of Chicago, physicists are building an aggregate in the form of a giant ball-shaped door handle of alternating layers of graphite and uranium, intending to launch the first self-sustaining nuclear reaction. For the project is responsible Enrico Fermi.

The key number for the pile was k, the effective neutron multiplication factor: the average number of neutrons from one fission reaction that go on to cause another fission reaction. As long as k is less than one, the pile is subcritical. At k >= 1, the pile should sustain a chain reaction. Fermi calculated that the pile would reach k = 1 somewhere between layers 56 and 57.
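The arithmetic behind "critical" (the numbers below are illustrative choices of mine, not figures from the account): after g fission generations the neutron population is the starting population times k to the power g, so everything hinges on which side of 1 the factor k falls.

```python
# Geometric growth or decay of the neutron population as a function of k.

N0 = 1000                      # starting number of neutrons (arbitrary)
for k in (0.95, 1.00, 1.05):
    N = N0 * k ** 100          # population after 100 fission generations
    print(f"k = {k:.2f}: about {N:,.0f} neutrons after 100 generations")

# k = 0.95 -> about 6 (the chain dies out)
# k = 1.00 -> about 1,000 (just self-sustaining)
# k = 1.05 -> about 131,501 (runaway growth)
```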

The work crew, led by Herbert Anderson, finished layer 57 on the night of December 1, 1942. Control rods - strips of wood covered with neutron-absorbing cadmium foil - kept the pile from reaching criticality. Anderson withdrew all but one of the rods and measured the pile's radiation, confirming that the pile was ready for a chain reaction the next day. Anderson then inserted all the rods, secured them with padlocks, locked up the squash court, and went home.

The next day, December 2, 1942, on a windy and frigid Chicago morning, Fermi began the final experiment. All but one of the rods were withdrawn. At 10:37, Fermi ordered the last control rod pulled out to half its length. The geiger counters clicked faster, and the recorder pen climbed. "This is not it," said Fermi, "the trace will go to this point and level off," indicating a spot on the graph. In a few minutes the recorder pen reached the indicated point and did not go above it. A few minutes later, Fermi ordered the rod pulled out another foot. Again the radiation rose, then leveled off. The rod was withdrawn another six inches, then another, then another.

At 11:30, the slow climb of the recorder pen was interrupted by an enormous CRASH: an emergency control rod, triggered by an ionization chamber, had activated and shot into the pile, which was still short of criticality. Fermi calmly ordered the team to break for lunch.

At two o'clock in the afternoon the team reconvened, withdrew and locked away the emergency control rod, and...

To be continued...


