We have discussed several different measures of static complexity, which measure rather

different things. But all these measures have one thing in common: they work by singling out the

one pattern which minimizes some quantity. It is equally interesting to study the total amount of structure in an entity. For instance, suppose x and x% both have KCS complexity A, but whereas x can only be computed by one program of length A, x% can be computed by a hundred totally different programs of length A. Does it not seem that x% is in some sense more complex than x, that there is more to x% than to x?

Let us define the structure of x as the set of all (y,z) which are approximate patterns in x, and denote it St(x). Then the question is: what is a meaningful way to measure the size of P(x). At first one might think to add up the intensities [1+d(y*z,x)][a%y%+b%z%+cC(y,z)] of all the elements in P(x). But this approach has one crucial flaw, revealed by the following example.

Say x is a sequence of 10,000 characters, and (y1,z1) is a pattern in x with %z1%=70,

%y1%=1000, and C(y1,z1)=2000. Suppose that y1 computes the first 1000 digits of x from the

first 7 digits of z1, according to a certain algorithm A. And suppose it computes the second 1000 digits of x from the next 7 digits of z1, according to the same algorithm A. And so on for the third 1000 digits of z2, etc. — always using the same algorithm A.

Next, consider the pair (y1,z1) which computes the first 9000 digits of x in the same manner as

(y2,z2), but computes the last 1000 digits of x by storing them in z1 and printing them after the rest of its program finishes. We have %z2%=1063, and surely %y2% is not much larger than

%y1%. Let’s say %y2%=150. Furthermore, C(y2,z2) is certainly no greater than C(y1,z1): after all, the change from (y1,z1) to (y2,z2) involved the replacement of serious computation with simple storage and printing.

The point is that both (y1,z1) and (y2,z2) are patterns in x, but in computing the total amount of structure in x, it would be foolish to count both of them. In general, the problem is that different patterns may share similar components, and it is unacceptable to count each of these components several times. In the present example the solution is easy: don’t count (y2,z2). But one may also construct examples of very different patterns which have a significant, sophisticated component in common. Clearly, what is needed is a general method of dealing with similarities between patterns.

Recall that Ia(v%w) was defined as the approximate version of the effort required to compute

v from w, so that if v and w have nothing in common, Ia(v,w)=Ia(v). And, on the other hand, if v

and w have a large commoncomponent, then both Ia(v,w) and Ia(w,v) are very small. Ia(v%w) is

defined only when v and w are sequences. But we shall also need to talk about one program

being similar to another. In order to do this, it suffices to assume some standard "programming

language" L, which assigns to each program y a certain binary sequence L(y). The specifics of L

are irrelevant, so long as it is computable on a Turing machine, and it does not assign the same

sequence to any two different programs.

The introduction of a programming language L permits us to define the complexity of a

program y as Ia(L(y)), and to define the complexity of one program y1 relative to another

program y2, as Ia(L(y1)%L(y2)). As the lengths of the programs involved increase, the differences between programming languages matter less and less. To be precise, let L and L1 be any two programming languages, computable on Turing machines. Then it can be shown that, as L(y1) and L(y2) approach infinity, the ratios Ia(L(y1))/Ia(L1(y1)) and Ia(L(y1)%L(y2))/Ia(L1(y1)%L1(y2)) both approach 1.

Where z is any binary sequence of length n, let D(z) be the binary sequence of length 2n

obtained by replacing each 1 in z with 01, and each 0 in z with 10. Where w and z are any two

binary sequences, let wz denote the sequence obtained by placing the sequence 111 at the end of

D(w), and placing D(z) at the end of this composite sequence. The point, as usual, is that 111

cannot occur in either D(z) or D(w), so that wz is essentially w juxtaposed with z, with 111 as a marker inbetween.

Now, we may define the complexity of a program-data pair (y,z) as Ia(L(y)z), and we may

define the complexity of (y,z) relative to (y1,z1) as Ia(L(y)z%L(y1)z1). And, finally, we may

define the complexity of (y,z) relative to a set of pairs {(y1,z1),(y2,z2),…,(yk,zk)} to be

Ia(L(y)z%L(y1)z1L(y2)z2…L(yk)zk). This is the tool we need to make sense of the phrase "the total amount of structure of x".

Let S be any set of program-data pairs (x,y). Then we may define the size %S% of S as the

result of the following process:

Algorithm 3.1:

Step 0. Make a list of all the patterns in S, and label them (y1,z1), (y2,z2), …, (yN,zN).

Step 1. Let s1(x)=Ia(L(y1)z1).

Step 2. Let s2(x)=s1(x)+Ia(L(y2)z2)I(L(y1)z1).

Step 3. Let s3(x)=s2(x)+Ia(L(y3)z3%L(y1)z1L(y2)z2))

Step 4. Let s4(x)=s3(x)+Ia(L(y4)z4%L(y1)z1L(y2)z2L(y3)z3))

…

Step N. Let %S%=sN(x)=sN-1(x)+

Ia(L(yN)zN%L(y1)z1L(y2)z2)…L(yN-1)zN-1)

At the k’th step, only that portion of (yk,zk) which is independent of {(y1,z1),…,(yk-1,zk-1)} is added onto the current estimate of %S%. For instance, in Step 2, if (y2,z2) is independent of (y1,z1), then this step increases the initial estimate of %S% by the complexity of (y2,z2). But if (y2,z2) is highly dependent on (y1,z1), not much will be added onto the first estimate. It is not difficult to see that this process will arrive at the same answer regardless of the order in which the (yi,zi) appear:

Theorem 3.2: If Ia=I, the result of the above algorithm is invariant under permutation of the

(yi,zi).

Where St(x) is the set of all patterns in x, we may now define the structural complexity of x

to be the quantity %St(x)%. This is, I suggest, the sense of the word "complexity" that one uses

when one says that a person is more complex than a tree, which is more complex than a

bacterium. In a way, structural complexity measures how many insightful statements can

possibly be made about something. There is much more to say about a person than about a tree,

and much more to say about a tree than a bacterium.

ALGEBRA AND TOPOLOGY OF PATTERN SPACE

From the definition of structural complexity we may obtain the extremely useful notion of

structural similarity, which we shall also refer to as pattern-distance. As the name suggests,

this is a measure of how "close" two entities are, structurally speaking. We denote the structural similarity of x and y by d#(x,y), and define it as the structural complexity of the symmetric difference of St(x) and St(y). It measures the amount of pattern in x but not y, or in y but not x.

This concept will play a central role in our treatment of memory and analogy.

The following concept, though it refers only to structure, not structural complexity, is equally

essential. Definition 3.14: The emergence between x and y is defined as

Em(x,y) = St(xUy) – St(y) – St(y)

The intuitive meaning of emergence should be clear: it is what is present in the whole but not

the parts. In the realm of pattern, the whole is in general more than the sum of the parts, in the sense that not all the patterns in the whole are patterns in the parts.

Finally, the following idea, though not so crucial, sheds some light into certain matters.

Definition 3.15: (y1,z1) is said to be complementary to (y2,z2) in x tothefollowing extent:

1 – IN(x,y1,z1)/[IN(L(y1),y2,z2) + IN(z1,y2,z2)]. If y1 is complementary to y2 in x and y2 is

complementary to y1 in x, then y1 and y2 are said to be complementary in x.

Complementarity, intuitively, is a very weak form of negation. If y1 is highly complementary

to y2 in x, that means that although one can effectively represent x in terms of either y1 or y2, once one has represented x in terms of y1, one cannot effectively represent the elements of this representation in terms of y2. If y1 and y2 are both entirely "independent" in St(x), this will usually be the case.

Crude intuitive examples of this phenomenon may be drawn from nearly any field of study.

For instance, in quantum mechanics one may represent an electron as a wave or as a particle, but

once one has represented it as either one cannot interpret one’s representation in terms of the

other one. Or: one may view the American economy as a battleground of class struggle, or as an

arena of gradual evolution by natural selection through competition — but once one has

diagrammed the economy in terms of conflicting classes, one cannot analyze the diagram in

Social Darwinist terms; and vice versa. Of course, these are little more than analogies; in order to make such applications at all precise one would have to provide exact definitions of the terms involved.

COMPUTABLE APPROXIMATIONS

The structural complexity of x is a measure of the total size of the set of all regularities in x.

Unfortunately, as I noted above, it is uncomputable. Step 0 of Algorithm 3.1, in its full

generality, is not programmable. Therefore, in order to apply the notion of structural complexity to real-world problems, it is necessary to restrict oneself to some particular class of patterns. One way to do this is via schematic structural complexity (SSC), a simple notion that will lead us naturally to more interesting complexity measures such as the Lempel-Ziv complexity and the n’th order Boolean complexities.

In the theory of genetic classifier systems (Goldberg, 1989), a schema is a sequence such as

1001**1011101010*1**111, where * is a "don’t care" symbol signifying that either a 0 or a 1

may occupy the indicated place. Let us consider a slightly more general notion of schema, in

which each "don’t care" symbol has a tag (*1,*2, etc.), and each "don’t care" symbol may stand

for a binary sequence of any length. And let us define a schematizer as a function which maps

schema into binary sequences, by inserting in place of each occurence of the "don’t care" symbol

*i some binary sequence wi. Each pair (schematizer, schema) is a schematic program; it defines

a unique binary sequence.

The schematic structural complexity (from here on, SSC) of a binary sequence may be

described in terms of Algorithm 3.1 with a modified initial step.

Algorithm 3.2:

Step 0′. Make a list of all schematic programs that are patterns in x, and label them y1,…,yN

Steps 1-N. As in Algorithm 3.1

The result of this algorithm may be called the schematic size of x, the well-definition of which

follows from the following generalization of Theorem 2.1.

Theorem 3.1. The result of Algorithm 4.1 is invariant with respect to permutation of the yi.

In analogy to structural complexity, the SSC of a sequence x may be defined as the schematic

size of the set of all patterns in x, S(x).

A schematic program represents the simplest sort of compression: it compresses an image or

sequence by abstracting repeated figures. A more flexible approach may be obtained by

considering more general programs. For instance, one might define a metaschematizer as a map

from schema to schema, which takes in a schema and replaces each occurence of the "don’t care"

symbol *i with some schema Si. A metaschematic program is then any program whose action

may be represented in the form schematizer(m1(m2(…(mk(schema))…)), where the mi are

metaschematizers.

Given this, one may define the "metaschematic structural complexity" of a sequence in an

obvious way. This complexity measure recognizes not only repeated figures, but repeated figures

within repeated figures and so on. It is new in detail but not in concept — it is a very close

approximation to the Lempel-Ziv complexity (Lempel and Ziv, 1978).

The Lempel-Ziv complexity misses a great deal of structure. By definition, no computable

approximation to the structural complexity can capture every type of structure. However, there is a large gap between repetitions and completely general patterns. One way to fill this gap is with the n’th order Boolean structural complexity. This complexity measure may be developed by

analogy to the schematic and metaschematic structural complexities.

We will need to use vector Boolean operations, which act coordinatewise on Boolean

sequences. For example the vector negation of 1010110 is 0101001; the vector conjunction of

11010 and 01011 is 11011. And we will say that a Boolean function (of an arbitrary number of

variables) is of order n if it can be written using less than n+1 disjunctions and conjunctions.

An n’th order Boolean schematic program for computing an array x may be defined as a

pair (schema, schematizer), where the schema is a sequence composed of of 0’s, 1’s, unlabeled

"don’t care" symbols *, and k different labeled "don’t care" symbols *1,…,*k. The schematizer

consists of: 1) a memory, which consists of a set of l binary sequences, and 2) a collection of k vector Boolean functions f1,…,fk of order n, each of which takes exactly l sequences as

arguments. The k’th array Boolean function computes the array to be substituted for the "don’t

care" symbol *i.

From the n’th order Boolean schematic programs one may obtain n’th order Boolean

metaschematic programs in a natural way. The n’th order Boolean structural complexity is then

defined in terms of these programs, just as the metaschematic structural complexity is defined in terms of ordinary metaschematic programs. It should be clear that any pattern can be expressed as an n’th order Boolean schematic program for some n. Therefore, the n’th order Boolean programs are one way of forming a bridge between Lempel-Ziv complexity and general

structural complexity.

These approximations are rather technical; they are not as conceptually elegant as the

structural complexity itself. However, it is clear that at least the n’th order Boolean complexity is relevant to mentality, because no brain can compute Boolean functions of arbitrarily high order.

belgesi-940