Computational complexity theory, also called algorithmic complexity theory, seeks to answer two different kinds of questions: "How hard is this problem?", and "How effective is this algorithm at solving this problem?". A number of difficult issues are involved here, and it is not possible to delve into them deeply without sophisticated mathematics. Here we shall only scratch the surface.

Questions of computational complexity are only meaningful in the context of a general theory of computation. Otherwise one can only ask "How hard is this problem for this computer?", or "How hard is this problem for this particular person?". What lets us ask "How hard is this problem?", without any reference to who is actually solving the problem, is a theory which tells us that problems are basically just as hard for one computer as for another. Here as in so many other cases, it is theory which tells us what questions to ask.

According to the theory of Turing machines, any sufficiently powerful computer can simulate any other computer. And this is not merely a theoretical idealization. In practice, computers such as PCs, mainframes and supercomputers are highly flexible. An IBM PC could be programmed to act just like a Macintosh; in fact, there are software packages which do something very close to this. Similarly, a Macintosh could be programmed to act just like an IBM PC. Turing proved that there is a program which tells a computer, given appropriate information, how to simulate any other computer. Therefore, any computer which is powerful enough to run this program can act as a universal Turing machine. If it is equipped with enough memory capacity — e.g. enough disk drives — it can impersonate any computer whatsoever.

True, this universal simulation program is very complex. But if a problem is sufficiently difficult, this doesn't matter. Consider the problem of sorting a list of numbers into increasing order. Suppose computer A is capable of solving this problem very fast. Then computer B, if it is sufficiently powerful, can solve the problem by simulating computer A. If the problem is sorting the list {2,1,3}, then this would be a tremendous waste of effort, because simulating A is vastly more difficult than sorting the list {2,1,3}. But if the list in question is a billion numbers long, it's a different story. The point is that lists of numbers can get as long as you like, but the complexity of simulating another computer remains the same.

Let us make this example more precise. Assume that both A and B have an unlimited supply of disk drives — an infinite memory tape — at their disposal. Suppose that the program for simulating computer A is so slow that it takes computer B 10 time steps to simulate one of computer A's time steps. Suppose also that computer A is capable of sorting a list of n numbers in n² time steps. That is, it can sort 10 numbers in 100 time steps, 100 numbers in 10,000 time steps, and so on. Assume that computer B is not quite so bright, and it has a sorting program built into its hardware which takes n³ time steps to sort a list of n numbers.

Then, if B were given a list of 3 numbers, its hardware could sort it in 3³ = 27 time steps. If it tried to sort it by simulating A, it would take 10(3²) = 90 time steps. Clearly, it should rely on its built-in hardware. But if B were given a list of 10 numbers, it would take 10³ = 1000 steps to sort it. If it tried to sort the list by simulating A, it would take 10(10²) = 1000 time steps — exactly the same amount of time. And if B were given a list of 1000 numbers, it would take 1000³ = 1,000,000,000 steps to sort it using its hardware, and only 10(1000²) = 10,000,000 steps to sort it by simulating A. The longer the list is, the more useful is the capacity for simulation, and the less useful is the built-in hardware.
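The arithmetic above can be checked in a few lines of code. The step counts below simply restate the text's hypothetical cost model (n² steps for A, n³ steps for B's hardware, and a tenfold simulation overhead); the function names are invented for illustration:

```python
def hardware_steps(n):
    # computer B's built-in sorting hardware: n^3 time steps for n numbers
    return n ** 3

def simulation_steps(n):
    # B simulating A: 10 of B's steps per each of A's steps, and A sorts in n^2 steps
    return 10 * n ** 2

for n in (3, 10, 1000):
    print(n, hardware_steps(n), simulation_steps(n))
```

For n = 10 the two strategies tie at 1000 steps; below that, the hardware wins, and above that, simulation wins by an ever-growing margin.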

The point is that as the size of the problem, n, gets bigger and bigger, the differences between computers become irrelevant. It is worth being a little more rigorous about this. Take any type of problem, and assign to each instance of it a "size" n. For example, if the problem is sorting lists of numbers, then each instance is a list of numbers, and its size is its length. Let A(n) denote the longest amount of time which computer A requires to solve any problem instance of size n, and let B(n) denote the corresponding quantity for computer B. Assume that the time required to solve an instance of the problem increases as n increases (just as the time required to sort a list of n numbers increases as n increases). Then it follows that the bigger n gets, the less significant is the difference between A(n) and B(n). Mathematically, we say that A(n) and B(n) differ by at most a constant factor: since each machine can always fall back on simulating the other, the ratio A(n)/B(n) remains bounded no matter how large n gets.

All this follows from the assumption that any sufficiently powerful computer can simulate any other one, by running a certain "universal Turing machine" program of large but fixed size.

AVERAGE-CASE ANALYSIS

Note that the quantity A(n) is defined in terms of "worst-case" computation. It is the longest that computer A takes to solve any problem instance of size n. Any computer worth its salt can sort the list {1,2,3,4,5,6,7,8,9,10} faster than the list {5,7,6,4,10,3,8,9,2,1}. But A(n) ignores the easy cases. Out of all the possible instances, it only asks: how hard is the hardest?
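The worst-case idea can be made concrete by counting the comparisons a simple sorting routine performs. Insertion sort is used here purely as an illustration (it is not an algorithm the text names), and counting comparisons is a crude stand-in for "time steps":

```python
def insertion_sort_comparisons(lst):
    # count element comparisons made by insertion sort: a crude "time" measure
    a, steps = list(lst), 0
    for i in range(1, len(a)):
        j = i
        while j > 0:
            steps += 1  # one comparison
            if a[j - 1] > a[j]:
                a[j - 1], a[j] = a[j], a[j - 1]
                j -= 1
            else:
                break
    return steps

print(insertion_sort_comparisons([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))  # easy instance: 9
print(insertion_sort_comparisons([5, 7, 6, 4, 10, 3, 8, 9, 2, 1]))  # scrambled: more
```

Under this measure, A(10) would be the maximum of this count over all 10-element lists, which is 10·9/2 = 45, reached by a reverse-sorted list; the easy, nearly-sorted cases are simply ignored.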

For some applications, this is a useful way to look at computation. But not always. To see why, consider the following well-known problem. A salesman, driving a jeep, must visit a number of cities in the desert. There are no mountains, rivers or other obstructions in the region. He wants to know the shortest route that passes through all the different cities. This is known as the Traveling Salesman Problem. Each specific instance of the problem is a particular collection of cities or, mathematically speaking, a set of points in the plane. The size of an instance of the problem, n, is simply the number of cities involved.

How hard is this problem? When the data is presented pictorially, human beings can solve it pretty well. However, we must remember that even if Maria is exceptionally good at solving the problem, what Maria(n) measures is the longest it takes Maria to arrive at the correct solution for any collection of n cities. No human being does well according to this strict criterion. We do not always see the absolute shortest path between the n cities; we often identify a route which is close to correct, but not quite there. And we sometimes miss the mark entirely. So we are not very good at solving the Traveling Salesman Problem, in the sense that there are instances of the problem for which we get the answer wrong or take a long time to get to the answer. But we are good at it in the sense that most of the time we get reasonably close to the right answer, pretty fast. There are two different notions of proficiency involved here.

The simplest way to solve the Traveling Salesman Problem is to list all the possible paths between the cities, then compare all the lengths to see which one is the shortest. The problem is that there are just too many paths. For instance, if there are 5 cities, then there are (4×3×2×1)/2 = 12 paths. If there are 10 cities, then there are (9×8×7×6×5×4×3×2×1)/2 = 181,440 paths. If there are, say, 80 cities, then there are more paths than there are electrons in the universe. Using this method, the number of steps required to solve the Traveling Salesman Problem increases very fast as the size of the problem increases. So, given a large Traveling Salesman Problem, it might be better to apply erratic human intuition than to use a computer to investigate every possible path.
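The counts quoted above come from the formula (n−1)!/2: fix a starting city, order the remaining n−1 cities, and halve because a route and its reverse have the same length. A brute-force solver based on this enumeration might look like the following sketch (the square of example cities is invented for illustration):

```python
from itertools import permutations
from math import dist, factorial

def num_tours(n):
    # distinct round trips through n cities: fix the start,
    # order the rest, halve for direction of travel
    return factorial(n - 1) // 2

def shortest_tour(cities):
    # brute force: try every ordering of the remaining cities
    start, rest = cities[0], cities[1:]
    best = None
    for perm in permutations(rest):
        tour = (start,) + perm + (start,)
        length = sum(dist(a, b) for a, b in zip(tour, tour[1:]))
        if best is None or length < best[0]:
            best = (length, tour)
    return best

print(num_tours(5))    # 12
print(num_tours(10))   # 181440

length, tour = shortest_tour([(0, 0), (0, 1), (1, 1), (1, 0)])
print(length)  # 4.0 -- the perimeter of the unit square
```

The factorial in `num_tours` is exactly why this method collapses: each added city multiplies the work, so the running time explodes long before n reaches 80.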

Let’s consider a simple analogy. Suppose you run a bank, and you have two loan officers working for you. Officer A is very methodical and meticulous. He investigates every case with the precision of a master detective, and he never makes a mistake. He never loans anyone more than they can afford. Everyone he approves pays back their loans, and everyone he turns down for a loan would not have paid it back anyway. The only problem is that he often takes a long time to determine his answer. Officer B, on the other hand, works entirely by intuition. He simply looks a person over, talks to them about golf or music or the weather, and then makes his decision on the spot. He rejects some people who deserve loans, and he gives some people more or less money than they can afford to pay back. He gives loans to a few questionable characters who have neither the ability nor the inclination to pay the bank back.

Suppose that, although you really need both, you have been ordered to cut back expenses by firing one of your loan officers. Which one should go? At first you might think "Officer B, of course." But what if you have a lot of money to lend, and a great many people demanding loans? Then A might be a poor choice — after all, B will serve a lot more customers each month. Even though there are some cases where A is much better than B, and there are many cases where A is a little better than B, the time factor may tip the balance in B’s favor.

You may be thinking "Well, a real bank executive would find someone who’s both fast and accurate." In the case of the Traveling Salesman Problem, however, no one has yet found an algorithm which finds the exact shortest path every time much faster than the simple method given above. And it seems likely that no such algorithm will ever be discovered. The Traveling Salesman Problem and hundreds of other important problems have been shown to be "NP-complete", which means essentially that if there is a reasonably fast algorithm for solving any one of them, then there is a reasonably fast algorithm for solving all of them. Many mathematicians believe that the question of whether such algorithms exist is undecidable in the sense of Gödel’s Incompleteness Theorem: that there’s no way to prove that they do, and there’s no way to prove that they don’t.

Now, we have discovered algorithms which solve the Traveling Salesman Problem faster than people, and on the average come up with better answers (Peters, 1985). But there are still some collections of cities for which they give the wrong answer, or take a ridiculously long time to solve. In the case of the Traveling Salesman Problem, it seems that there is no point in looking for algorithms which solve the problem exactly, every time. All the algorithms which do that are just too slow. Rather, it seems to be more intelligent to look for algorithms that solve the problem pretty well a lot of the time.
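One simple member of this "pretty well, a lot of the time" family, offered as an illustration rather than as one of the algorithms the text refers to, is the nearest-neighbor heuristic: always drive to the closest city not yet visited. It takes roughly n² steps instead of (n−1)!/2 tours, but it carries no guarantee of finding the shortest route:

```python
from math import dist

def nearest_neighbor_tour(cities):
    # greedy heuristic: from each city, go to the closest unvisited one;
    # fast (roughly n^2 distance computations) but not guaranteed optimal
    unvisited = list(cities[1:])
    tour = [cities[0]]
    while unvisited:
        nxt = min(unvisited, key=lambda c: dist(tour[-1], c))
        unvisited.remove(nxt)
        tour.append(nxt)
    tour.append(cities[0])  # return to the starting city
    return tour

tour = nearest_neighbor_tour([(0, 0), (0, 1), (1, 1), (1, 0)])
print(sum(dist(a, b) for a, b in zip(tour, tour[1:])))  # 4.0
```

On this easy instance the greedy route happens to be optimal; on carefully constructed instances it can be far from the shortest tour, which is precisely the trade-off the text describes.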

It turns out that most of the mathematical problems involved in thought and perception are a lot like the Traveling Salesman Problem. They are "NP-complete". So when, in later chapters, we discuss the algorithms of thought, we shall virtually never be discussing algorithms that solve problems perfectly. The relevant concept is rather the PAC algorithm — the algorithm which is Probably Approximately Correct.

PARALLELISM

One interesting aspect of the McCulloch-Pitts neural network is the way it does many things at once. At every time step, all the neurons act. The original formulation of the Turing machine was not like that; it only did one thing at a time. It moved the tapes, then looked in its memory to see what to do next. Of course, the McCulloch-Pitts network and the original Turing machine are fundamentally equivalent; anything one can do, so can the other. But the McCulloch-Pitts network will, in most cases, get things done faster.

The computers in popular use today are like the original Turing machine: they only do one thing at a time. This is true of everything from PCs to huge mainframe computers — Cybers, VAXs and so forth. They are serial computers. Some supercomputers and special-purpose research computers, however, can work in parallel: they can do up to hundreds of thousands of things at once. The advantage of parallelism is obvious: speed. By using a parallel computer, one trades off space for time.

There are many different kinds of parallel computers. Some are so-called single-instruction machines. They can do many things at once, as long as these things are all the same. For instance, a typical single-instruction machine could multiply fifty numbers by four all at the same time. But it might not be able to multiply one number by four at the same time as it added six to another number. Multiple-instruction machines are more interesting, but also more difficult to build and to program. A multiple-instruction parallel computer is like a bunch of serial computers connected to each other. Each one can execute a different program, and communicate the results of its computation to certain others. In a way, it is like a society of serial computers. Thinking Machines Corporation, in Cambridge, Massachusetts, has manufactured a number of powerful multiple-instruction parallel computers called Connection Machines. They are now being used in science and industry — for, among other things, modeling the behavior of fluids, analyzing visual data, and generating computer graphics.
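The distinction can be caricatured in a few lines. The "machines" below are of course just sequential Python standing in for parallel hardware, so only the programming model is being illustrated: the same instruction applied everywhere, versus a different instruction on each processor.

```python
def simd_step(values, op):
    # single-instruction machine: the SAME operation on every value "at once"
    return [op(v) for v in values]

def mimd_step(values, ops):
    # multiple-instruction machine: each "processor" runs its OWN operation
    return [op(v) for op, v in zip(ops, values)]

# multiply fifty numbers by four at the same time
print(simd_step(list(range(50)), lambda v: v * 4)[:3])  # [0, 4, 8]

# one processor multiplies by four while another adds six
print(mimd_step([5, 7], [lambda v: v * 4, lambda v: v + 6]))  # [20, 13]
```

The extra expressive power of the second model is exactly what makes it harder to build and to program: the processors must now be coordinated rather than simply marched in lockstep.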

Why is all this relevant? Some may dispute the neurophysiological relevance of the McCulloch-Pitts model and its contemporary descendants. But everyone agrees that, if the brain is a computer, it must be a parallel computer. The brain contains about 100 billion neurons, all operating at once, and besides that it is continually swirling with chemical activity. The diversity of its activity leaves little doubt that, if it is indeed a computer, it is a multiple-instruction parallel computer. This is the intuition behind the recent spurt of research in parallel distributed processing.

In Chapter 11 I will take this one step further and argue that the brain should be modeled as a multiple-instruction parallel quantum computer. By then, it will be clear just how different such a computer is from today’s serial computers. We are talking about a computer which does billions of different things at once and incorporates a huge amount of chance into its operations. As we shall see later, it is a computer whose state is not completely measurable by any sequence of physical observations. It is a computer which, in a physically precise sense, plays a significant role in the continual creation of the universe. It could be argued that a computer with all these properties should not be called a "computer". But, mathematical theories aside, the intuitive concept of computation has always been somewhat fuzzy. As warned in the Introduction, the limitations of present-day computers should not be taken as fundamental restrictions on the nature of computation.
