Charles S. Peirce, the turn-of-the-century American philosopher, liked to talk about the "one
law of mind." He gave this law many different formulations, the most suggestive of which was
only five words: "the tendency to take habits". This simple, potent idea is at the heart of the
theory of mind to be presented in the following chapters. But instead, of "habits", I prefer to
speak of "patterns". And rather than taking "habit" or "pattern" as a primitive undefined term, I will begin by providing a sketchy but completely rigorous mathematical theory of pattern.
As I understand it, the concept of pattern relies on two simpler ideas: combination and
complexity. More precisely, in order to talk about certain entities being patterns in other entities, we must have
1) some way of combining certain pairs of entities y and z to obtain a third entity called y*z
2) some way of computing, for every entity x, a nonnegative real number %x% called the
complexity of x Any set of entities which fulfills these requirements may be called a pattern space. Formally, we may say:
Definition 3.1: A pattern space is a set (S,*,% %), where S is a set, * is a binary operation
defined on some subset of SxS, and % % is a map from S into the nonnegative real numbers.
Let’s consider a simple example: Turing machines and finite binary sequences. If y is a Turing
machine and z is a finite binary sequence, then there is a natural way of combining x and y — just put y on the input tape of the Turing machine, extending to the right of the tape head. In this case, we can define x*y to be the binary sequence which appears on the tape of the Turing
machine y after, having been started with z on its tape, its program finishes running. It is true that there is no guarantee the Turing machine will ever stop running. But if it doesn’t, we can simply consider x*y to be undefined, and leave it at that.
As for requirement number 2, we can define the complexity %z% of a finite binary sequence z
as its length. And, roughly speaking, we can define the complexity %y% of a Turing machine y
as the length of its program. More precisely, %y% might be defined as length of the code
number which, when fed into a certain universal Turing machine, enables that Turing machine to
act exactly like machine y in every situation. Here %y% and %z% are nonnegative numbers, so
the set of Turing machines and finite binary sequences is a pattern space.
Since we shall be returning to this example again and again, it is worth formulating a specific
notation. Let %z%T denote the length of a finite binary sequence z, and let %y%T denote the
length of the program y.
Now we are prepared to ask: what is a pattern?
First of all, a pattern is a pattern in something, in some entity x. Secondly, a pattern is an
ordered pair of entities, denoted (y,z). And finally, we have what I shall call the fundamental
Definition 3.2: Let a, b, and c denote constant, nonnegative numbers.
Then an ordered pair (y,z) is a pattern in x if x=y*z and
a%y% + b%z% + cC(y,z) < %x%.
The only unfamiliar term here is C(y,z). This denotes the complexity of obtaining x from (y,z).
If y is a Turing machine program and z is a finite binary sequence, we shall let CT(y,z) denote the number of time steps which the Turing machine takes to stop when equipped with program y and
given z as initial input.
For many purposes, the numbers a, b and c are not important. Often they can all be taken to
equal 1, and then they don’t appear in the formula at all. But in some cases it may be useful to set a=b=1 and c=0, for instance. Then the formula reads %y% + %z% < %x%. The constants lend
the formula an element of flexibility.
Intuitively, an ordered pair (y,z) is a pattern in x if the complexity of y, plus the complexity of z, plus the complexity of getting x out of y and z, is less than the complexity of x. In other words, an ordered pair (y,z) is a pattern in x if it is simpler to represent x in terms of y and z than it is to say "x". The constants a, b and c just weight things: if a=3/4 and b=5/4, for example, then the complexity of y counts less than the complexity of z.
The definition of pattern can be generalized to ordered n-tuples, and to take into account the possibility of different kinds of combination, say *1 and *2.
Definition 3.3: An ordered set of n entities (x1,x2,…,xn) is a pattern in x if x=x1*1×2*2…*n-1xn and a1%x1%+a2%x2%+…+an%xn%+ an+1C(x1,…,xn) < %x%, where C(x1,…,xn) is the complexity of computing x1*1×2…*n-1xn and a1,…,an+1 are nonnegative numbers.
Also, the following concept will be of use:
Definition 3.4: The intensity in x of a ordered pair (y,z) such
that y*z=x may be defined as
IN[(y,z)%x] = ( %x% – [a%y% + b%z% + cC(y,z)] )/%x%.
Obviously, this quantity is positive whenever (y,z) is a pattern in x, and negative or zero
whenever it is not; and its maximum value is 1.
AN EXAMPLE: GEOMETRIC PATTERN
Most of our discussion will be devoted to Turing machines and binary sequences. However,
the definition of pattern does not involve the theory of computation. Essentially, a pattern is a "representation as something simpler"; and simplicity need not necessarily be defined in terms of computation. Instead of Turing machines and binary sequences let us now consider pictures.
Suppose that A is a one inch square black-and-white picture, and B is a five inch square picture
made up of twenty-five non-overlapping copies of A. Intuitively, it is simpler to represent B as
an arrangement of copies of A, than it is to simply consider B as a "thing in itself". Very roughly speaking, it would seem likely that part of the process of remembering what B looks like consists of representing B as an arrangement of copies of A.
This intuition may be expressed in terms of the definition of pattern. Where x and y are square regions, let:
y*1z denote the region obtained by placing y to the right of z
y*2z denote the region obtained by placing y to the left of z
y*3z denote the region obtained by placing y below z
y*4z denote the region obtained by placing y above z.
And, although this is obviously a very crude measure, let us define the complexity %x% of a
square region with a black-and-white picture drawn in it as the proportion of the region covered
with black. Also, let us assume that two pictures are identical if one can be obtained by a rigid motion of the other.
The operations *1, *2, *3 and *4 may be called simple operations. Compound operations are,
then, compositions of simple operations, such as the operation (x*1w*2x)*4w. If y is a compound
operation, let us define its complexity %y% to be the length of the shortest program which
computes the actual statement of the compound operation. For instance, %(x*1w*2x)*4w% is
defined to be the length of the shortest program which outputs the sequence of symbols
Where y is a simple operation and z is a square region, let y*z denote the region that results
from applying y to z. A compound operation acts on a number of square regions. For instance,
(x*1w*2x)*4w acts on w and x both. We may consider it to act on the ordered pair (x,w). In
general, we may consider a compound operation y to act on an ordered set of square regions
(x1,x2,…,xn), where x1 is the letter that occurs first in the statement of y, x2 is the letter that occurs second, etc. And we may define y*(x1,…,xn) to be theregion that results from applying the compound operation y to the ordered set of regions (x1,…,xn).
Let us return to the two pictures, A and B, discussed above. Let q=A*1A*1A*1A*1A. Then, it
is easy to see that B=q*4q*4q*4q*4q. In other words, B = (A*1A*1A*1A*1A)*4(A*1A*1A*1A*1A)*4(A*1A*1A*1A*1A)*4
(A*1A*1A*1A*1A)*4(A*1A*1A*1A*1A). Where y is the compound operation given in the
previous sentence, we have B=y*A.
The complexity of that compound operation, %y%, is certainly very close to the length of the
program "Let q=A*1A*1A*1A*1A; print q*4q*4q*4q*4q". Note that this program is shorter than
the program "Print (A*1A*1A*1A*1A)*4(A*1A*1A*1A*1A)*4 (A*1A*1A*1A*1A)*
(A*1A*1A*1A*1A)*4(A*1A*1A*1A*1A)", so it is clear that the latter should not be used in the
computation of %y%.
We have not yet discussed the term C(y,(B1,…,Bn)), which represents the amount of effort
required to execute the compound operation y on the regions (x1,…,xn). For simplicity’s sake, let us simply set it equal to the number of times the symbol "*" appears in the statement of y; that is, to the number of simple operations involved in y.
So, is (y,A) a pattern in B? Let us assume that the constants a, b and c are all equal to 1. We
know y*A=B; the question is whether %y%+%A%+C(y,A) < %B%.
According to the above definitions, %y% is about 37 symbols long. Obviously this is a matter
of the particular notation being used. For instance, it would be less if only one character were
used to denote *1, and it would be more if it were written in binary code.
And C(y,z) is even easier to compute: there are 24 simple operations involved in the
construction of B from A.
So we have, very roughly speaking, 37 + %z% + 24 < %x%. This is the inequality that must
be satisfied if (y,z) is to be considered a pattern in x. Rearranging, we find: %z% < %x% – 61.
Recall that we defined the complexity of a region as the proportion of black which it contains.
This means that (y,z) is a pattern in x if and only if it the amount of black required to draw B
exceeds amount of black required to draw A by more than 61. Obviously, whether or not this is
the case depends on the units of measurement.
This is a very simple example, in that the compound operation y involves only one region. In
general, we may define %(x1,…,xn)%=%x1%+…+%xn%, assuming that the amount of black in a
union of disjoint regions is the sum of the amounts of black in the individual regions. From this it follows that (y,(x1,…,xn)) is a pattern in x if and only if a%y% + b(%x1%+…+%xn%) +
cC(y,(x1,…,xn)) < %x%. Results similar to these could also be obtained from a different sort of analysis. In order to deal with regions other than squares, it is desirable to replace *1, *2, *3, *4 with a single "joining" operation *, namely the set-theoretic union %. Let z=(x1,…,xn), let y be a Turing machine, let f be a method for converting a picture into a binary sequence, let g be a method for converting abinary sequence into a picture. Then we have
Definition 3.5: If x = x1 % x2 %…% xn, then (y,z,f,g) is a pattern in x if
a%y%+b%z%+c%f%+d%g%+eC(y,z,f,g) < %x%. We have not said how %f% and %g% are to be defined. This would require a detailed consideration of the geometric space containing x, which would take us too far afield. This general approach is somewhat similar to that taken in Chaitin (1978).
ORDERS OF COMPLEXITY
It should be apparent from the foregoing that complexity and pattern are deeply interrelated. In
this and the following sections, we shall explore several different approaches to measuring
complexity, all of which seek to go beyond the simplistic KCS approach. Remember, according
to the KCS approach, complexity means structurelessness. The most "random", least structured
sequences are the most complex. The formulation of this approach was a great step forward. But
the next step is to give formulas which capture more of the intuitive meaning of the word
First, we shall consider the idea that pattern itself may be used to define complexity. Recall the geometric example of the previous section, in which the complexity of a black-and-white picture in a square region was defined as the amount of black required to draw it. This measure did not even presume to gauge the effort required to represent a black-and-white picture in a square region. One way to measure the effort required to represent such a picture, call it x, is to look at all compound operations y, and all sets of square black-and-white pictures (x1,…,xn), such that y*(x1,…,xn)=x. One may then ask which y and (x1,…,xn) give the smallest value of a%y% + b(%x1% + … + %xn%) + c(y,(x1,…,xn)). This minimal value of a%y% + b(%x1%+…+%xn%) may be defined to be the "second-order" complexity of x. The second-order complexity is then be a measure of how simply x can be represented — in terms of compound operations on square regions.
In general, given any complexity measure % %, we may use this sort of reasoning to define a
complexity measure % %%.
Definition 3.6: If % % is a complexity measure, % %% is the complexity measure defined so that %x%% is the smallest value that the quantity a%y% + b%z% + cC(y,z) takes on, for any (y,z) such that y*z=x.
%x%% measures how complex the simplest representation of x is, where complexity is
measured by % %. Sometimes, as in our geometric example, % % and % %% will measure very
different things. But it is not impossible for them to be identical.
Extending this process, one can derive from % %% a measure % %%: the smallest value that
the quantity a%y%% + b%z%% + cC(y,z) takes on, for any (y,z) such that y*z=x. %x%% measures the complexity of the simplest representation of x, where complexity is measured by % %%. It might be called second-order complexity. And from % %%, one may obtain a measure % %%, third-order complexity. It is clear that this process may be continued indefinitely.
It is interesting to ask when % % and % %% are equivalent, or almost equivalent. For
instance, assume that y is a Turing machine, and x and z are binary sequences. If, in the notation given above, we let % %=% %T, then %x%% is a natural measure of the complexity of a
sequence x. In fact, if a=b=1 and c=0, it is exactly the KCS complexity of x. Without specifying
a, b and c, let us nonetheless use Chaitin’s notation for this complexity: I(x).
Also, let us adopt Chaitin’s notation I(v%w) for the complexity of v relative to w.
Definition 3.7: Let y be a Turing machine program, v and w binary sequences; then I(v%w) denotes the smallest value the quantity a%y%T+cCT(y,w) takes on for any self-delimiting program y that computes v when its input consists of w. Intuitively, this measures how hard it is to compute v given complete knowledge of w.
Finally, it should be noted that % % and % %% are not always substantially different:
Theorem 3.1: If %x%%=I(x), a=b=1, and c=0, then there is some K so that for all x %
%x%% – %x%%% < K.
Proof: a%y%% + b%z%% + cC(y,z) = %y%% + %z%%. So, what is the smallest value that %y%% + %z%% assumes for any (y,z) such that y*z=x? Clearly, this smallest value must be either equal to %x%%, or very close to it. For, what if %y%% + %z%% is bigger than %x%%? Then it cannot be the smallest %y%% + %z%%, because if one took z to be the "empty sequence" (the sequence consisting of no characters) and then took y to be the shortest program for computing x, one would have %z%%=0 and %y%%=%x%%. And, on the other hand, is it possible for %y%%+%z%% to be smaller than %x%%? If %y%%+%z%% were smaller than x, then one could program a Turing machine with a program saying "Plug the sequence z into the program y," and the length of this program would be less than %x%%, or at least greater than %x%% by no more than the length of the program P(y,z)="Plug the sequence zinto the program y". This length is the constant K in the theorem.
Corollary 3.1: For a Turing machine for which the program P(y,z) mentioned in the proof is a "hardware function" which takes only one unit of length to program, % %%=% %%. Proof: Both % %% and % %% are integer valued, and by the theorem, for any x, %x%% % %x%% % %x%%+1.
PATTERNS IN PATTERNS; SUBSTITUTION MACHINES
We have discussed pattern in sequences, and patterns in pictures. It is also possible to analyze
patterns in other patterns. This is interesting for many reasons, one being that when dealing with machines more restricted than Turing machines, it may often be the case that the only way to express an intuitively simple phenomenon is as a pattern in another pattern. This situation will arise in our analysis of the perceptual hierarchy, several chapters down the road.
Let us consider a simple example. Suppose that we are not dealing with Turing machines, but
rather with "substitution machines" — machines which are capable of running only programs of
the form P(A,B,C)="Wherever sequence B occurs in sequence C, replace it with sequence A".
Instead of writing P(A,B,C) each time, we shall denote such a program with the symbol (A,B,C).
For instance, (1,10001,1000110001100011000110001) = 11111. (A,B,C) should be read
"substitute A for B in C".
We may define the complexity %x% of a sequence x as the length of the sequence, i.e.
%x%=%x%T, and the complexity %y% of a substitution program y as the number of symbols
required to express y in the form (A,B,C). Then, %1000110001100011000110001%=25,
%11111%= 5, and %(10001,1,z)%=11. If z=11111, (10001,1,z)= 1000110001100011000110001. For example, is (10001,1,z), 11111) a pattern in 1000110001100011000110001? What is required is that
a(11) + b(5) + cC((10001,1,z),11111) < 25.
If we take a=b=1 and c=0 (thus ignoring computational complexity), this reduces to
11 + 5 < 25.
This is true, so it is indeed a pattern.
If we take c=1 instead of zero, and leave a and b equal to one, then it will still be a pattern, as long as the computational complexity of obtaining 1000110001100011000110001 from
(10001,1,11111) does not exceed 9. It would seem most intuitive to assume that this
computational complexity C((10001,1,z),11111) is equal to 5, since there are 5 1’s into which
10001 must be substituted, and there is no effort involved in locating these 1’s. In that case the fundamental inequality reads 11 + 5 + 5 < 25,
which verifies that a pattern is indeed present.
Now, let us look at the sequence x = 1001001001001001000111001
Remember, we are not dealing with general Turing machines, we are only dealing with
substitution machines, and the only thing a substitution machine can do is plug one sequence in
for another. Anythingwhich cannot be represented in the form (A,B,C), in the notation given
above, is not a substitution machine.
There are two obvious ways to compute this sequence x on a substitution machine. First of all,
one can let y=(100100100100100100,B,z), and z= B 0111001B1011101110B110111B. This
amounts to recognizing that 100100100100100100 is repeated in x. Alternatively, one can let
y%=(100,B,z%), and let z%= BBBBBB0111001BBBBBB1011101110BBBBBB 110111BBBBBB. This amounts to recognizing that 100 is a pattern in x. Let us assume that a=b=1, and c=0. Then in the first case %y% + %z% = 24 + 27 = 51; and in the second case %y%% + %z%% = 9 + 47 = 56. Since %x% = 95, both (y,z) and (y%,z%) are patterns in x.
The problem is that, since we are only using substitution machines, there is no way to combine
the two patterns. One may say that 100100100100100100 a pattern in x, that 100 is a pattern in
x, that 100 is a pattern in 100100100100100100. But, using only substitution machines, there is
no way to say that the simplest way to look at x is as "a form involving repetition of
100100100100100100, which is itself a repetition of 100".
Let us first consider %x%%. It is not hard to see that, of all (y,z) such that y is a substitution machine and z is a sequence, the minimum of %y% + %z% is obtained when
y=(100100100100100100,B,z), and z= B 0111001 B 1011101110 B 110111 B. Thus, assuming
as we have that a=b=1 and c=0, %x%%=51. This is much less than %x%, which equals 95.
Now, let us consider this optimal y. It contains the sequence 100100100100100100. If we
ignore the fact that y denotes a substitution machine, and simply consider the sequence of
characters "(100100100100100100,B,z)", we can search for patterns in this sequence, just as we
would in any other sequence. For instance, if we let y1=(100,C,z1), and z1=CCCCCC, then
y1*z1=y, %y1%=10, and %z1%=6. It is apparent that (y1,z1) is a pattern in y, since %y1% + %z1%
= 10 + 6 = 16, whereas %y% = 18. By recognizing the pattern (y,z) in x, and then recognizing
the pattern (y1,z1) in y, one may express both the repetition of 100100100100100100 in x and the
repetition of 100 in 100100100100100100 as patterns in x, using only substitution machines.
Is (y1,z1) a pattern in x? Strictly speaking, it is not. But we might call it a second-level pattern in x. It is a pattern in a pattern in x. And, if there were a pattern (y2,z2) in the sequences of symbols representing y1 or z1, we could call that a third-level pattern in x, etc.
In general, we may make the following definition:
Definition 3.8: Let F be a map from SxS into S. Where a first-
level pattern in x is simply a pattern in x, and n is an integer greater than one, we shall say that P is an n’th-level pattern in x if there is some Q so that P is an n-1’th-level pattern in x and P is a pattern in F(Q).
In the examples we have given, the map F has been the implicit map fromsubstitution
machines into their expression in (A,B,C) notation.
Suppose that y1*z1=x, whereas y2*z2 does not equal x, but is still very close to x. Say
%x%=1000. Then, even if %y1%+%z1%=900 and %y2%+%z2%=10, (y2,z2) is not a pattern in x,
but (y1,z1) is. This is not a flaw in the definition of pattern — after all, computing something near x is not the same as computing x. Indeed, it might seem that if (y2,z2) were really so close to computing x, it could be modified into a pattern in x without sacrificing much simplicity.
However, the extent to which this is the case is unknown. In order to incorporate pairs like
(y2,z2), we shall introduce the notion of approximate pattern.
In order to deal with approximate pattern, we must assume that it is meaningful to talk about
the distance d(x,y) between two elements of S. Let (y,z) be any ordered pair for which y*z is
defined. Then we have Definition 3.9: The ordered pair (y,z) is an approximate pattern
in x if [ 1 + d(x,y*z) ][ a%y% + b%z% + cC(y,z) ] < %x%, where a, b, c and C are defined as
in the ordinary definition of pattern.
Obviously, when x=y*z, the distance d(x,y*z) between x and y*z is equal to zero, and the
definition of approximate pattern reduces to the normal definition. And the larger d(x,y*z) gets, the smaller a%y%+b%z%+cC(y,z) must be in order for (y,z) to qualify as a pattern in x.
Of course, if the distance measure d is defined so that d(a,b) is infinite whenever a and b are
not the same, then an approximate pattern is an exact pattern. This means that when one speaks
of "approximate pattern", one is also speaking of ordinary, exact pattern.
Most concepts involving ordinary or "strict" pattern may be generalized to the case of
approximate pattern. For instance, we have:
Definition 3.10: The intensity of an approximate pattern (y,z) in x is IN[(y,z)%x] = (%x%- [1+d(x,y*z)][a%y%+b%z%+cC(y,z)])/%x%.
Definition 3.11: Where v and w are binary sequences, the approximate complexity of v relative to w, Ia(v,w), is the smallest value that [1+d(v,y*w)][a%y%+cC(y,w)] takes on for any program y with input w.
The incorporation of inexactitude permits the definition of pattern to encompass all sorts of
interesting practical problems. For example, suppose x is a curve in the plane or some other
space, z is a set of points in that space, and y is some interpolation formula which assigns to each set of points a curve passing through those points. Then Ia[(y,z)%x] is an indicator of how much use it is to approximate the curve x by applying the interpolation formula y to theset of points z.