The branch of mathematics known as probability theory provides one way of making

inferences regarding uncertain propositions. But it is not a priori clear that it is the only

reasonable way to go about making such inferences. This is important for psychology because it

would be nice to assume, as a working hypothesis, that the mind uses the rules of probability

theory to process its perceptions. But if the rules of probability theory were just an arbitrary

selection from among a disparate set of possible schemes for uncertain inference, then there

would be little reason to place faith in this hypothesis.

Historically, most attempts to derive general laws of probability have been "frequentist" in

nature. According to this approach, in order to say what the statement "the probability of X

occurring in situation E is 1/3" means, one must invoke a whole "ensemble" of situations. One

must ask: if I selected a situation from among an ensemble of n situations "identical" to E, what

proportion of the time would X be true? If, as n tended toward infinity, this proportion tended

toward 1/3, then it would be valid to say that the probability of X occurring in situation E is 1/3.

In some cases this approach is impressively direct. For instance, consider the proposition:

"The face showing on the fair six-sided die I am about to toss will be either a two or a three".

Common sense indicates that this proposition has probability 1/3. And if one looked at a large

number of similar situations — i.e. a large number of tosses of the same die or "identical" dice —

then one would indeed find that, in the long run, a two or a three came up 1/3 of the time.

But often it is necessary to assign probabilities to unique events. In such cases, the frequency

interpretation has no meaning. This occurs particularly often in geology and ecology: one wishes

to know the relative probabilities of various outcomes in a situation which is unlikely ever to

recur. When the problem has to do with a bounded region of space, say a forest, it is possible to

justify this sort of probabilistic reasoning using complicated manipulations of integral calculus.

But what is really required, in order to justify the general application of probability theory, is

some sort of proof that the rules of probability theory are uniquely well-suited for probable

inference.

Richard Cox (1961) has provided such a proof. First of all, he assumes that any possible rule

for assigning a "probability" to a proposition must obey the following two rules:

The probability of an inference on given evidence determines the probability of its

contradictory on the same evidence (p.3)

The probability on given evidence that both of two inferences are true is determined by their

separate probabilities, one on the given evidence, the other on this evidence with the additional

assumption that the first inference is true (p.4)

The probability of a proposition on certain evidence is the probability that logically should be

assigned to that proposition by someone who is aware only of this evidence and no other

evidence. In Boolean notation, the first of Cox’s rules says simply that if one knows the

probability of X on certain evidence, then one can deduce the probability of -X on that same

evidence without using knowledge about anything else. The second rule says that if one knows

the probability of X given certain evidence E, and the probability that Y is true given EX, then

one can deduce the probability that X and Y are both true, without using knowledge about anything else.

These requirements are hard to dispute; in fact, they don’t seem to say very much. But their

simplicity is misleading. In mathematical notation, the first requirement says that P(-X|E) = f[P(X|E)], and the second requirement says that P(XY|E) = F[P(X|E), P(Y|XE)], where f and F

are unspecified functions. What is remarkable is that these functions need not remain

unspecified. Cox has shown that the laws of Boolean algebra dictate specific forms for these

functions.

For instance, they imply that G[P(XY|E)] = C·G[P(X|E)]·G[P(Y|XE)], where C is some constant and G is some function. This is almost a proof that for any measure of probability P, P(XY|E) = P(X|E)P(Y|XE). For if one sets G(x) = x, this rule is immediate. And, as Cox points out, if P(X|E) measures probability, then so does G[P(X|E)], at least according to the two axioms given above. The constant C may be understood by setting X = Y and recalling that XX = X according to the axioms of Boolean algebra. It follows by simple algebra that C = 1/G[P(X|XE)], the reciprocal of the value the measure assigns to the probability of X on the evidence X, which is the numerical value of certainty. Typically, in probability theory, certainty is assigned the value 1, so that C = 1. But this is a convention, not a logical requirement.

As for negation, Cox has shown that if P(X) = f[P(-X)], Boolean algebra leads to the formula X^r + [f(X)]^r = 1, where r is some constant. Given this, we could leave r unspecified and use P(X)^r as the measure of probability; but, following Cox, let us take r = 1, so that f(X) = 1 - X.
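These two functional forms can be checked numerically. The sketch below uses arbitrary illustrative values for P(X|E) and P(Y|XE); it verifies that the rescaled measure G(t) = t^r still satisfies the product-rule form with C = 1, and that f(x) = (1 - x^r)^(1/r) solves the negation equation, reducing to the familiar f(x) = 1 - x when r = 1.

```python
# Illustrative values, not derived from anything in the text.
p_x  = 0.6          # P(X|E)
p_yx = 0.25         # P(Y|XE)
p_xy = p_x * p_yx   # product rule with G(t) = t and C = 1

# The rescaled measure G(t) = t**r satisfies the same product-rule
# form G[P(XY|E)] = C * G[P(X|E)] * G[P(Y|XE)] with C = G(1) = 1:
r = 2.5
assert abs(p_xy**r - (p_x**r) * (p_yx**r)) < 1e-12

# Negation: f(x) = (1 - x**r) ** (1/r) solves x**r + f(x)**r = 1;
# with r = 1 this is just f(x) = 1 - x.
f = lambda x: (1 - x**r) ** (1 / r)
assert abs(p_x**r + f(p_x)**r - 1.0) < 1e-12
```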

Cox’s analysis tells us in exactly what sense the laws of probability theory are arbitrary. All

the laws of probability theory can be derived from the rules P(X|E) = 1 - P(-X|E) and P(XY|E) = P(X|E)P(Y|XE). And these rules are essentially the only ways of dealing with

negation and conjunction that Boolean algebra allows. So, if we accept Boolean algebra and

Cox’s two axioms, we accept probability theory.

Finally, for a more concrete perspective on these issues, let us turn to the work of Krebs,

Kacelnik and Taylor (1978). These biologists studied the behavior of birds (great tits) placed in

an aviary containing two machines, each consisting of a perch and a food dispenser. One of the

machines dispenses food p% of the times that its perch is landed on, and the other one dispenses

food q% of the times that its perch is landed on. They observed that the birds generally visit the

two machines according to the optimal strategy dictated by Bayes’ rule and Laplace’s Principle of

Indifference — a strategy which is not particularly obvious. This is a strong rebuttal to those who

raise philosophical objections against the psychological use of probability theory. After all, if a

bird’s brain can use Bayesian statistics, why not a human brain?
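The birds' strategy can be caricatured in a few lines of Python. The sketch below is not the optimal policy analyzed by Krebs et al., and the dispenser rates are invented; it only shows how an indifference prior plus Bayesian updating yields sensible sampling. Each machine's payoff rate is estimated by Laplace's rule of succession, (successes + 1)/(visits + 2), and the forager greedily visits the machine with the higher estimate.

```python
import random

# A caricature of the two-machine aviary, with invented payoff rates.
# The rule-of-succession estimate (wins + 1) / (visits + 2) amounts to
# an indifference prior over each unknown rate, updated by experience.
random.seed(0)
rates = [0.25, 0.10]            # hypothetical rates p and q
wins, visits = [0, 0], [0, 0]

for _ in range(1000):
    estimates = [(wins[i] + 1) / (visits[i] + 2) for i in (0, 1)]
    i = 0 if estimates[0] >= estimates[1] else 1    # greedy choice
    visits[i] += 1
    if random.random() < rates[i]:
        wins[i] += 1
```

Under this scheme the richer machine usually ends up visited far more often, though a purely greedy choice can occasionally fixate on the poorer one; the optimal policy studied in the experiment balances sampling and exploitation more carefully.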

BAYES’ RULE

Assume that one knows that one of the propositions Y1,Y2,…,Yn is true, and that only one of

these propositions can possibly be true. In mathematical language, this means that the collection

{Y1,…,Yn} is exhaustive and mutually exclusive. Then, Bayes' rule says that

P(Yn|X) = P(Yn)P(X|Yn) / [P(Y1)P(X|Y1) + … + P(Yn)P(X|Yn)]
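Stated in code, the rule is short. The following Python sketch is mine, not from the text; the function name and the example numbers are invented for illustration.

```python
def bayes(priors, likelihoods, i):
    """Posterior P(Yi|X) for exhaustive, mutually exclusive {Y1,...,Yn}.

    priors[k]      -- P(Yk)
    likelihoods[k] -- P(X|Yk)
    """
    numerator = priors[i] * likelihoods[i]
    evidence = sum(p * l for p, l in zip(priors, likelihoods))
    return numerator / evidence

# Two hypotheses with priors 0.3, 0.7 and likelihoods 0.9, 0.2:
# P(Y1|X) = 0.27 / (0.27 + 0.14), roughly 0.659
print(bayes([0.3, 0.7], [0.9, 0.2], 0))
```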

In itself this rule is unproblematic; it is a simple consequence of the two rules of probable

inference derived in the previous section. But it lends itself to controversial applications.

For instance, suppose Y1 is the event that a certain star system harbors intelligent life which is

fundamentally dissimilar from us, Y2 is the event that it harbors intelligent life which is

fundamentally similar to us, and Y3 is the event that it harbors no intelligent life at all. Assume

these events have somehow been precisely defined. Suppose that X is a certain sequence of radio

waves which we have received from that star system, and that one wants to compute P(Y2|X):

the probability, based on the message X, that the system has intelligent life which is

fundamentally similar to us. Then Bayes’ rule applies: {Y1,Y2,Y3} is exhaustive and mutually

exclusive. Suppose that we have a good estimate of P(X|Y1), P(X|Y2), and P(X|Y3): the

probability that an intelligence dissimilar to us would send out message X, the probability that an

intelligence similar to us would send out message X, and the probability that an unintelligent star

system would somehow emit message X. But how do we know P(Y1), P(Y2) and P(Y3)?

We cannot deduce these probabilities directly from the nature of messages received from star

systems. So where does P(Yi|X) come from? This problem, at least in theory, makes the

business of identifying extraterrestrial life extremely tricky. One might argue that it makes it

impossible, because the only things we know about stars are derived from electromagnetic

"messages" of one kind or another — light waves, radio waves, etc. But it seems reasonable to

assume that spectroscopic information, thermodynamic knowledge and so forth are separate from

the kind of message-interpretation we are talking about. In this case there might be some kind of

a priori physicochemical estimate of the probability of intelligent life, similar intelligent life, and

so forth. Carl Sagan, among others, has attempted to estimate such probabilities. The point is that

we need some kind of prior estimate for the P(Yi), or Bayes’ rule is useless here.
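How strongly the answer depends on the priors is easy to see numerically. In this sketch every number is invented: likelihoods[i] stands for P(X|Yi) in the star-system example, and the two prior vectors represent two different background assumptions about how common intelligent life is.

```python
likelihoods = [0.05, 0.20, 0.001]   # hypothetical P(X|Y1), P(X|Y2), P(X|Y3)

def posterior(priors, i):
    evidence = sum(p * l for p, l in zip(priors, likelihoods))
    return priors[i] * likelihoods[i] / evidence

skeptical = [0.001, 0.001, 0.998]   # intelligent life assumed very rare
uniform   = [1/3, 1/3, 1/3]         # Principle of Indifference

# The same message X yields very different values of P(Y2|X):
print(posterior(skeptical, 1))      # roughly 0.16
print(posterior(uniform, 1))        # roughly 0.80
```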

This example is not atypical. In general, suppose that X is an effect, and {Yi} is the set of

possible causes. Then to estimate P(Y1|X) is to estimate the probability that Y1, and none of the

other Yi, is the true cause of X. But in order to estimate this using Bayes’ rule, it is not enough to

know how likely X is to follow from Yi, for each i. One needs to know the probabilities P(Yi) —

one needs to know how likely each possible cause is, in general.

One might suppose these problems to be a shortcoming of Bayes' rule, or of probability theory in general.

But this is where Cox’s demonstration proves invaluable. Any set of rules for uncertain reasoning

which satisfies his two simple, self-evident axioms must necessarily lead to Bayes' rule, or

something essentially equivalent with a few G’s and r’s floating around. Any reasonable set of

rules for uncertain reasoning must be essentially identical to probability theory, and must

therefore have no other method of deducing causes from effects than Bayes’ rule.

The perceptive reader might, at this point, accuse me of inconsistency. After all, it was

observed above that quantum events may be interpreted to obey a different sort of logic. And in

Chapter 8 I raised the possibility that the mind employs a weaker "paraconsistent" logic rather

than Boolean logic. How then can I simply assume that Boolean algebra is applicable?

However, the inconsistency is only apparent. Quantum logic and paraconsistent logic are both

weaker than Boolean logic, and they therefore cannot lead to any formulas which are not also

formulas of Boolean logic: they cannot improve on Bayes’ rule.

So how do we assign prior probabilities, in practice? It is not enough to say that this comes

down to instinct, to biological programming. It is possible to say something about how this

programming works.

THE PRINCIPLE OF INDIFFERENCE

Laplace’s "Principle of Indifference" states that if a question is known to have exactly n

possible answers, and these answers are mutually exclusive, then in the absence of any other

knowledge one should assume each of these answers to have probability 1/n of being correct.

For instance, suppose you were told that on the planet Uxmylarqg, the predominant intelligent

life form is either blue, green, or orange. Then, according to the Principle of Indifference, if this

were the only thing you knew about Uxmylarqg, you would assign a probability of 1/3 to the

statement that it is blue, a probability of 1/3 to the statement that it is green, and a probability of

1/3 to the statement that it is orange. In general, according to the Principle of Indifference, if one

had no specific knowledge about the n causes {Y1,…,Yn} which appear in the above formulation

of Bayes’ rule, one would assign a probability P(Yi)=1/n to each of them.
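One consequence worth noting: with indifference priors the P(Yi) cancel out of Bayes' rule, so the posterior is simply the likelihood normalized to sum to 1. A quick check, with likelihood values invented for illustration:

```python
likelihoods = [0.9, 0.3, 0.05]      # hypothetical P(X|Yi)
n = len(likelihoods)

# Posteriors under indifference priors P(Yi) = 1/n ...
posteriors = [(1/n) * l / sum((1/n) * m for m in likelihoods)
              for l in likelihoods]

# ... equal the likelihoods normalized to sum to 1:
normalized = [l / sum(likelihoods) for l in likelihoods]
assert all(abs(a - b) < 1e-12 for a, b in zip(posteriors, normalized))
```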

Cox himself appears to oppose the Principle of Indifference, arguing that "the knowledge of a

probability, though it is knowledge of a particular and limited kind, is still knowledge, and it

would be surprising if it could be derived from… complete ignorance, asserting nothing". And in

general, that is exactly what the Principle of Indifference does: supplies knowledge from

ignorance. In certain specific cases, it may be proved to be mathematically correct. But, as a

general rule of uncertain inference, it is nothing more or less than a way of getting something out

of nothing. Unlike Cox, however, I do not find this surprising or undesirable, but rather exactly

what the situation calls for.

Source: A New Mathematical Model of Mind
