What does it mean to say that one thing is more complex than another? Like most words,"complexity" has many meanings. In Chapter 1 we briefly discussed the "complexity" ofcomputation -- of problems and algorithms. In this chapter we will consider several approachesto quantifying the complexity of individual entities, beginning with the simple Kolmogorov-Chaitin-Solomonoff definition. Throughout this chapter, when I speak of computers I will mean ordinary Turing machines,not stochastic or quantum computers. As yet, no one really knows how to deal with thecomplexity of objects in the context of stochastic or quantum computation, not in completegenerality. Since a quantum computer can compute only those functions that a Turing machinecan also compute, this limitation is not fatal.It turns out that the easiest way to approach the complexity of objects is via the complexity ofsequences of numbers. In particular, I will concentrate on binary sequences: sequences of zerosand ones. As is common in mathematics, the general issue can be resolved by considering whatat first sight appears to be a very special case. The standard approach to the complexity of binary sequences was invented independently byA.N. Kolmogorov, Gregory Chaitin, and Solomonoff (Chaitin, 1987), so we shall call it the KCScomplexity. In my opinion, what the KCS definition measures is not very well described by theword "complexity." Lack of structure would be a better term. Given any computer A, the KCS complexity of a sequence x is defined to be the length of theshortest self-delimiting program on A which computes x. The restriction to "self-delimiting"programs is necessary for technical purposes and will not worry us much here; roughly speaking,a self-delimiting program is one which contains a segment telling the computer which runs ithow long it is. In the following, I may occasionally refer to "shortest programs" instead of"shortest self-delimiting programs"; but it should be implicitly understood thatall programsdiscussed are self-delimiting. For instance, the KCS complexity of the sequence 10011010010010010 on an IBM PC is thelength of the shortest program which, when loaded into the PC, causes it to output10011010010010010 on the screen. In what follows, I will occasionally refer to the KCScomplexity of a sequence x as KCS(x). There is some vagueness here, as to what "length" means. For one thing, there are largedifferences between the various programming languages on the market today. There are anumber of "high-level" languages, which allow one to type in programs vaguely resemblingmathematical formulae: Fortran, Pascal, Cobol, Snobol, Ada, Prolog, C, Basic, Lisp, Forth, andso on. A program which is short in Pascal may be long in Cobol; and a program which is short inBasic may be long in Pascal. And then there is "assembly language", which refers directly to thehardware of the computer. A program in assembly language is usually very long. However,before a computer can use a program written in a high-level language, it must translate it intoassembly language. (The program which does the translation is called the "compiler"). Whenfiguring the length of a program written in Fortran, should we use the number of characters in theprogram as originally typed in, or the number of characters in the assembly-language translationof the program? From the point of view of the mathematical theory of complexity, none of these issues matter.We can simply assume we are dealing with a universal Turing machine. Translating from aforeign language is essentially the same as simulating another computer. So if a sequence is long enough, its KCS complexity is essentially language-independent and computer-independent. For example, say you have a sequence x consisting of a billion 0s and 1s. Suppose it can becomputed by a program of length 1,000,000 on an IBM PC. Suppose a VAX computer has beenprogrammed to simulate a PC, and suppose this simulation program has length 200,000. Thenthe shortest program for computing x on the VAX cannot be any longer than 1,200,000. Because,if all else fails, one can compute x on the VAX by simulating a PC. These numbers are highlyunrealistic, but the point is that as the sequences get longer and longer, the size of the simulation program remains the same. When the sequences are a trillion digits long, the 200,000 length of the simulation program will mean next to nothing. Some of the newer programming languages are actually "universal programming languages"in a very practical sense: any contemporary programming language can be compactly written inthem. For instance, one could write a C program which simulated Pascal or Lisp or Fortran, or aLisp program that simulated Pascal or Fortran or C. (In fact, what is now known as Lisp wasoriginally written in a simpler form of Lisp, and what is now known as C was originally writtenin a simpler form of C.) Let's say a certain sequence x could be computed by a very short Fortran program. Then one way to compute the sequence on a machine with a built-in Lisp compilerwould be to write a Lisp program to simulate Fortran. If there were no simpler way to computethe sequence in Lisp, this might yield the shortest program forcomputing the sequence on thatparticular machine.Again: the beauty of theoretical computer science is that, as long as we are talking about long sequences, we don't have to worry about the properties of specific machines or languages. This is what differentiates theoretical computer science from practical computer science. Naturally, the latter is more visible in the everyday world. However, both the theory and the practice ofcomputer science are essential to the study of mind. What is the KCS complexity of the sequence 0101010101010101010101010101010101010101? It should be very small on any computer, because one can write aprogram saying "Print '01' twenty times". And what is the complexity of the sequence consistingof 01 repeated 1,000,000,000,000 times? This should still be very small on any computer,because one can write a program saying "Print '01' 1,000,000,000,000 times." Note that this program is not quite as short as "Print 01 20 times". Our program for repeating'01' 1,000,000,000,000 times is 31 characters long, but our program for repeating '01' 20 times is only 16 characters long. The difference is, obviously, that it takes more space to write'1,000,000,000,000' than it does to write '20'. As n increases, the KCS complexity of repeatingsomething over and over again n times increases. But it does not increase very fast. After all,1,000,000,000,000 is 50,000,000,000 times as large as 20. But, according to the programs writtenabove, the KCS complexity of repeating '01' 1,000,000,000,000 times is only 31/16 times theKCS complexity of repeating '01' 20 times. The ratio of program sizes, here 31/16, may vary from computer to computer, fromprogramming language to programming language. It would be different if this book were writtenin Spanish rather than English, because the equivalent of the word "print" would not have exactly five letters in it. But it is difficult to imagine a computer on which it would approach50,000,000,000. Mathematically, we may say that as n gets bigger and bigger, the size of theshortest program for repeating something n times gets closer and closer to log(n). This is little more than common sense, because the number of digits in the decimal expansion of a largenumber n is always very close to log(n). What about the KCS complexity of the sequence 01001100011100001111000001111100000011111100000001111111? This depends onwhether it is shorter to say Print 0100110001110000111100000111110000001111110000000 1111111or to say Do the following for k=1, then k=2, and so on up to k=7: Print k '0's and then k '1's"In this case the former is a bit shorter. But consider the sequence010011000111000011110000011111000000111111000000011111110000000011111111000000000111111111000000000011111111110000000000011111111111000000000000111111111111000000000000011111111111110000000000000011111111111111? Here there is no doubt that the latter sort of program isshorter. Actually determining the KCS complexity of a sequence is a difficult matter. There aresequences which look completely random and can nonetheless be computed by short programs.For instance, if one printed the first ten thousand digits of the binary expansion of pi, virtually no human being would recognize any structure in it. On the other hand, what is the complexity of a sequence x which is completely random in thesense of having no structure whatsoever? In this case the best way to compute x is to write aprogram saying "Print x". This program is about as long as x is. If x has n digits, this programhas length n+c, for some small constant c. In the case of the program as written above, c=5.According to the KCS definition, a completely structureless sequence such as10010100101001000011101001101001010010110001100101010001110110101010001001010010100100101001010110101 is the most complex kind of sequence, with a complexity approximately equal to n. On the other hand, a sequence with a very simple structure, such as 1111111111111111111111, is the least complex kind of sequence, with a complexity approximately equal to log(n). Sequences with more intricate structures fall somewhere inbetween. It can be shown that no program can compute the KCS complexity of an arbitrary sequence.For any program P, there is some X the KCS complexity of which P cannot compute. belgesi-936