The American philosopher Charles S. Peirce founded a vast philosophical system on the
principle of "the tendency to take habits." By this he meant, roughly, the following:
Peirce’s Principle: Unless restrained by the extension of another habit, a habit will tend to extend itself.
Here "extend" is taken to refer to both space and time, and "habit" is essentially synonymous
with "pattern." In a way this is a distant relative of Newton’s First Law of Motion: unless
somehow obstructed, an object will travel in a straight line.
Peirce never seriously sought a proof of this principle; he considered it primary and
irreducible. He did, however, provide an amusing cosmogonic argument, which begins with the
assumption that, out of the primordial void, various habits emerged at random, with extremely
small intensities. Once the tendency to take habits comes into existence to an indefinitely small degree, he argued, then by the tendency to take habits the particular habit which is the tendency to take habits is extended, et cetera. Hence the tendency to take habits is strengthened, and hence more and more habits which emerge to a minuscule degree will be extended, and this constitutes an extension of the tendency to take habits, and hence the tendency to take habits will be further extended, et cetera.
Clearly, this sort of reasoning, though intriguing, contains all sorts of hidden assumptions, and there is not much point in debating whether or not it is really a "justification" of Peirce’s principle. It is best to take the simpler path and assume Peirce’s principle outright.
I will assume, first of all, that "a pattern tends to extend itself through time". This does not
imply that all patterns continue indefinitely through time; that would obviously be absurd. Merely a
tendency is posited: merely that, given that a pattern X has occurred in the past, the probability that X occurs in the future is greater than it would be if the events at each time were determined with no reference whatsoever to previous events. Roughly speaking, this is equivalent to the hypothesis that the environment is self-organizing.
Clearly, this assumption requires that the world, considered as a dynamical system, is not
highly S.S.-sensitive. If the world were highly S.S.-sensitive, then one would need essentially
complete knowledge regarding the structure of the past world in order to predict the structure of the future world. But if the world possessed the tendency to take habits, then there would be a good chance the patterns one recognized would be continued into the future, thus limiting the
possible degree of S.S.-sensitivity. This relationship can be spelled out in a precise inequality; but there seems little point. The basic idea should be clear. Conversely, one might wonder: does low S.S.-sensitivity inherently imply low tendency to take habits? It seems not. After all, low S.S.-sensitivity implies that it is possible to compute future structure from past structure, not that future structure is similar to past structure.
It may be worth phrasing this distinction more precisely. The structure of a dynamical system
over an immediately past interval of time (the "past structure") does not, in general, determine
the structure of the system over an immediately future interval of time (the "future structure").
But the past structure does place certain constraints upon the future structure; it enforces a
certain probability distribution on the set of all future structures. That is to say, future structure is dependent upon past structure according to some stochastic function F. Then, all low S.S.-sensitivity says is that, if X and Y are close, F(X) and F(Y) are reasonably close (here X and Y are structures, i.e. sets of patterns). But what the tendency to take habits says is that X and F(X) are reasonably close. From this it is apparent that the tendency to take habits implies low S.S.-sensitivity, but not vice-versa.
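The difference between the two conditions can be made concrete with a toy stand-in for F. What follows is only a minimal sketch under strong simplifying assumptions: F is taken to be deterministic rather than stochastic, a "structure" is reduced to a single real number, and closeness is measured by absolute difference. The names `shift_map`, `near_identity_map`, `output_gap` and `habit_gap` are hypothetical and do not come from the text.

```python
def shift_map(x: float) -> float:
    """Hypothetical F: nearby inputs stay nearby, but every state is moved far away."""
    return x + 100.0


def near_identity_map(x: float) -> float:
    """Hypothetical F: nearby inputs stay nearby, and each state stays near itself."""
    return x + 0.01


def output_gap(f, x: float, y: float) -> float:
    """Distance between F(X) and F(Y); small values for close X, Y mean low sensitivity."""
    return abs(f(x) - f(y))


def habit_gap(f, x: float) -> float:
    """Distance between X and F(X); small values mean the structure tends to persist."""
    return abs(f(x) - x)


if __name__ == "__main__":
    for f in (shift_map, near_identity_map):
        print(f.__name__, output_gap(f, 1.0, 1.001), habit_gap(f, 1.0))
```

Both maps keep the outputs of nearby inputs close together, so both are insensitive in the sense described above. Only `near_identity_map` also keeps F(X) close to X, which is the extra requirement imposed by the tendency to take habits.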
Let us return to our previous example. In the case of 0101010101010…, the tendency to take
habits translates the assumption that the next term will be a 0 into the assumption that the pattern x=y*z will be continued, where x is the sequence involved, y is the function f(A)=AAA…A which juxtaposes A n times, * is function evaluation, and z=01. Clearly %y%, %z% and C(y,z) are rather small, so that this will be a pattern in reasonably short sequences of the form 0101010101010…. It is also important to note that, in the case 0101010010101010…, the most natural application of the tendency to take habits involves the same y and z as above, but in this case as an approximate pattern. One might consider d(y*z,x)=1, since y*z may be changed into x by one insertion.
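The exact and approximate versions of this pattern can be checked on a short finite string. The sketch below assumes that d counts single-character insertions, deletions and substitutions (the Levenshtein distance), which is one reading of "one insertion" and not necessarily the definition intended in the text. The helper names `juxtapose` and `edit_distance`, and the particular 14-symbol string, are illustrative choices.

```python
def juxtapose(a: str, n: int) -> str:
    """The function y of the text: f(A) = AAA...A, with A written n times."""
    return a * n


def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance: fewest single-character insertions,
    deletions, or substitutions needed to turn s into t."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # delete cs
                            curr[j - 1] + 1,      # insert ct
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


z = "01"
exact = juxtapose(z, 7)                    # "01010101010101"
perturbed = exact[:7] + "0" + exact[7:]    # one extra 0 inserted in the middle

print(edit_distance(exact, exact))      # 0: y*z reproduces the sequence exactly
print(edit_distance(exact, perturbed))  # 1: y*z is only an approximate pattern
```

In the second case the same y and z still describe the sequence, at the cost of a distance of 1.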
I have not yet been any more specific than Peirce as to what "the tendency to take habits"
actually is. Clearly, if one saw the sequence 0101010101010… in reality, one might or might not assume that, by induction, the next term was a 1. It would depend on context. For instance, if one were receiving such numbers as output from a computer program, and the last twelve outputs one had received were all either 0101010101000 or 1010101010111, then having seen 01010101010 one would probably assume the next term was going to be a 0. Obviously, this is also induction; one is merely inducing relative to a larger data set. What the tendency to take habits, properly formulated, should tell us is that given no other relevant knowledge, one should assume a given pattern will continue, because there is a certain tendency for patterns to continue, and if one does not assume this there is nothing to do but assume a uniform distribution on the set of possible outcomes. When two patterns are in competition — when they cannot both continue — then one must make a decision as to which one to assume will continue.
This example might be taken to suggest that the pattern based on the largest data set is always
the best bet. But this is not the case. For what if one’s computer outputs the following five
sequences: 9834750940, 2345848530, 0000000000, 9875473890, 1010101010. Then when one
sees for the sixth output 010101010, what is one to assume will be the last term? Does one
follow the pattern extending over all five of the prior inputs, that all sequences end in 0? Or is one to obey the internal logic of the sequence, bolstered by the fact that the fifth sequence, with a very similar structure, and the third sequence, with a fairly similar structure, were each continued in a way which obeyed their internal logic? According to my intuition, this is a borderline case.
One could concoct a similar example in which the clear best choice is to obey the structure of
the individual sequence. Indeed, if one simply replaced the final digit of the first sequence with a 1, then ending in 0 might still be an approximate pattern, but according to my intuition the best guess for the next term of the sixth sequence would definitely be 1.
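The two competing predictions in these examples can be laid side by side. The following is a rough sketch that does not resolve the competition; it only computes what each pattern would predict. The function names are hypothetical, and "internal logic" is narrowed here to strict alternation between two symbols, which is an assumption made for illustration.

```python
def ends_in_zero_rate(sequences):
    """Fraction of prior sequences whose last symbol is 0 (the cross-sequence pattern)."""
    return sum(s.endswith("0") for s in sequences) / len(sequences)


def alternation_guess(prefix):
    """Next symbol if the prefix strictly alternates between two symbols, else None."""
    if len(prefix) < 2 or prefix[0] == prefix[1]:
        return None
    if any(prefix[i] != prefix[i % 2] for i in range(len(prefix))):
        return None
    return prefix[len(prefix) % 2]


original = ["9834750940", "2345848530", "0000000000", "9875473890", "1010101010"]
modified = ["9834750941"] + original[1:]  # final digit of the first sequence set to 1

print(ends_in_zero_rate(original))     # 1.0: every prior sequence ends in 0
print(ends_in_zero_rate(modified))     # 0.8: now only an approximate pattern
print(alternation_guess("010101010"))  # 1: the internal logic of the sixth sequence
```

The cross-sequence pattern predicts a final 0 and the internal pattern predicts a final 1. Nothing in the code says which to prefer; that decision is exactly the matter of judgment discussed here.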
If nothing else, the examples of the previous paragraph demonstrate that the choice of pattern
is a matter of intuition. The tendency to take habits is powerful, but it doesn’t tell you what to assume when experience or logic tells you that two patterns, both historically prominent, cannot both occur. And the case of two contradictory patterns is a relatively simple one: in reality, every mind has recognized a huge set of patterns, each one contradicting numerous others.
In order to resolve this dilemma, I will propose a strengthening of Peirce’s formulation of the
tendency to take habits. I suggest that, when it possesses little or no information indicating the contrary, an intelligence should assume that the most intense of a group of contradictory patterns will continue. This strategy can only work, of course, if the universe operates according to the following special principle:
Strengthened Peirce’s Principle: A pattern tends to continue, and the more intense a pattern it is, the more likely it is to continue.
This principle does not say how much more likely. But in general, a mind may want to
consider the probability of a given pattern occurring in the future. In that case, it would need to know the exact nature of the relation between intensity and chance of continuance. One might think that this relation could be determined by induction, but this would lead to circular
reasoning, for the execution of this induction would require some assumption as to the nature of
this induction. At some level one must make an a priori assumption.
In sum: I have not attempted to "justify" induction; I have rather placed a condition on the
universe under which a certain form of induction will generally be effective. This condition, the strengthened Peirce’s principle, rules out high S.S.-sensitivity but not high L.-, S.-, or R.S.-sensitivity, and it is not implied by low S.S.-sensitivity. It is clear that, in a universe obeying the strengthened Peirce’s principle, a system can achieve a degree of intelligence by recognizing patterns and assuming that they will continue.
This idea makes sense no matter how complexity is defined; it relies solely on the definition of
pattern. But if one restricts attention to Turing machines, and considers complexity to mean KCS
complexity, then the present approach to induction becomes extremely similar to the "classical"
proposal of Solomonoff (1964). His paper, "A Formal Theory of Inductive Inference," was one of the three
original sources of the KCS complexity. His essential idea was a mathematical form of Occam’s
razor: the simplest explanation is the one most likely to be correct. He took this as a given,
defined an explanation of a binary sequence as a program computing it, and went on to construct
the KCS complexity as a measure of simplicity.