9.0 The Perceptual Hierarchy

   In accordance with the philosophy outlined in Chapter 5, I define perception as pattern
recognition. Pattern recognition is, of course, an extremely difficult optimization problem. In
fact, the task of recognizing all the patterns in an arbitrary entity is so hard that no algorithm can
solve it exactly — this is implied by Chaitin’s (1987) algorithmic-information-theoretic proof of
Gödel’s Theorem. As usual, though, exact solutions are not necessary in practice. One is, rather,
concerned with finding a reasonably rapid and reliable method for getting fairly decent
approximations.
   I propose that minds recognize patterns according to a multilevel strategy. Toward this end, I
hypothesize a hierarchy of perceptual levels, each level recognizing patterns in the output of the
level below it, and governed by the level immediately above it. Schematically, the hierarchy may
be understood to extend indefinitely in two directions (Fig. 4). It will often be convenient to
pick a certain level, somewhat arbitrarily, and call it the zero level. Then, for n = …, -3, -2, -1, 0,
1, 2, 3, …, the idea is that level n recognizes patterns in the output of level n-1, and also
manipulates the pattern-recognition algorithm of level n-1.
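   As a toy illustration of the two relations just described, the following Python sketch pairs a lower level that detects jumps in a one-dimensional signal with an upper level that groups the signal into segments and, when the lower level reports too many edges, raises the lower level's threshold. The class names EdgeLevel and GroupingLevel, the threshold-doubling rule and the max_edges parameter are all assumptions made for illustration. Nothing here is taken from the text beyond the idea that level n reads the output of level n-1 and adjusts its algorithm.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class EdgeLevel:
    """Hypothetical level n-1: reports positions where the signal jumps."""
    threshold: float  # tunable parameter that the level above may change

    def recognize(self, signal: List[float]) -> List[int]:
        # An "edge" is any index whose value differs from its predecessor
        # by more than the current threshold.
        return [i for i in range(1, len(signal))
                if abs(signal[i] - signal[i - 1]) > self.threshold]


@dataclass
class GroupingLevel:
    """Hypothetical level n: reads the edges and governs the level below."""
    max_edges: int  # assumed budget: more edges than this counts as noise

    def govern(self, lower: EdgeLevel, signal: List[float]) -> List[int]:
        # Manipulate the lower level's algorithm: double its threshold
        # until its output is sparse enough (bounded number of retunings).
        edges = lower.recognize(signal)
        for _ in range(32):
            if len(edges) <= self.max_edges:
                break
            lower.threshold *= 2.0
            edges = lower.recognize(signal)
        return edges

    def recognize(self, edges: List[int], length: int) -> List[Tuple[int, int]]:
        # Recognize a pattern in the lower level's output: the segments
        # (start, end) lying between consecutive edges.
        bounds = [0] + edges + [length]
        return [(a, b) for a, b in zip(bounds, bounds[1:]) if b > a]


signal = [0.0, 0.6, 0.0, 5.0, 5.4, 5.0, 0.0, 0.7, 0.0]
lower = EdgeLevel(threshold=0.5)
upper = GroupingLevel(max_edges=2)
edges = upper.govern(lower, signal)
segments = upper.recognize(edges, len(signal))
```

   With this input the lower level first reports six edges, most of them small fluctuations. The upper level doubles the threshold once, after which only the two large jumps remain and the signal is grouped into three segments.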
   Physically speaking, any particular mind can deal only with a finite segment of this hierarchy.
Phenomenologically speaking, a mind can never know exactly how far the hierarchy extends in
either direction.
   One may analyze consciousness as a process which moves from level to level of the
perceptual hierarchy, but only within a certain restricted range. If the zero level is taken to
represent the "average" level of consciousness, and consciousness resides primarily on levels
from -L to U, then the levels below -L represent perceptions which are generally below
conscious perception. And, on the other hand,
the levels above U represent perceptions that are in some sense beyond conscious perception:
too abstract or general for consciousness to encompass.
   Consciousness can never know how far the hierarchy extends, either up or down. Thus it can
never encounter an ultimate physical reality: it can never know whether a perception comes from
ultimate reality or just from the next level down.
   Perception and motor control might be defined as the link between mind and reality. But this
is a one-sided definition. Earlier we defined intelligence by dividing the universe into an
organism and an environment. From this "God’s-eye" point of view an organism’s perceptual and
motor systems are the link between that organism and its environment. But from the internal
point of view, from the point of view of the conscious organism, there can be no true or ultimate
reality, but only the results of perception.
   Therefore, in a sense, the result of perception is reality; and the study of perception is the
study of the construction of external reality. One of the aims of this chapter and the next is to
give a model of perception and motor control that makes sense from both points of view — the
objective and the subjective, the God’s-eye and the mind’s-eye, the biological and the
psychological.
   Fodor (1983) has proposed that, as a general rule, there are a number of significant structural
differences between input systems and central processing systems. He has listed nine
properties which are supposed to be common to all the input systems of the human brain: the
visual processing system, the auditory processing system, the olfactory and tactile processing
systems, etc.:
1. Input systems are domain specific: each one deals only with a certain
specific type of problem.
2. Input systems operate regardless of conscious desires; their operation is mandatory.
3. The central processing systems have only limited access to the
representations which input systems compute.
4. Input systems work rapidly.
5. Input systems do most of their work without reference to what is going on
in the central processing systems, or in other input systems.
6. Input systems have "shallow" output, output which is easily grasped by central processing
systems.
7. Input systems are associated with fixed neural architecture.
8. The development of input systems follows a certain characteristic pace
and sequence.
   I think these properties are a very good characterization of the lower levels of the perceptual
hierarchy. In other words, it appears that the lower levels of the perceptual hierarchy are strictly
modularized. Roughly speaking, say, levels -12 to -6 might be as depicted in Figure 5, with the
modular structure playing as great a role as the hierarchical structure.
   If, say, consciousness extended from levels -3 to 3, then it might be that the modules of levels
-12 to -6 melded together below the level of consciousness. In this case the results of, say, visual
and auditory perception would not present themselves to consciousness in an entirely
independent way. What you saw might depend upon what you heard.
   A decade and a half ago, Hubel and Wiesel (1988) demonstrated that the brain possesses
specific neural clusters which behave as processors for judging the orientation of line segments.
Since then many other equally specific visual processors have been found. It appears that Area
17 of the brain, the primary visual cortex, which deals with relatively low-level vision
processing, is composed of various types of neuronal clusters, each type corresponding to a
certain kind of processing, e.g. line orientation processing.
   And, as well as perhaps being organized in other ways, these clusters do appear to be
organized in levels. At the lowest level, in the retina, gradients are enhanced and spots are
extracted — simple mechanical processes. Next come simple moving edge detectors. The next
level, the second level up from the retina, extracts more sophisticated information from the first
level up — and so on. Admittedly, little is known about the processes two or more levels above
the retina. It is clear (Uhr, 1987), however, that there is a very prominent hierarchical structure,
perhaps supplemented by more complex forms of parallel information processing. For instance,
most neuroscientists would agree that there are indeed "line processing" neural clusters, and
"shape processing" neural clusters, and that while the former pass their results to the latter, the
latter sometimes direct the former (Rose and Dobson, 1985).
   And there is also recent evidence that certain features of the retinal image are processed in
"sets of channels" which proceed several levels up the perceptual hierarchy without intersecting
each other — e.g. a set of channels for color, a set of channels for stereoposition, etc. This is
modular perception at a level lower than that considered by Fodor. For instance, Mishkin et al
(1983) have concluded from a large amount of physiological data that two major pathways pass
through the visual cortex and then diverge in the subsequent visual areas: one pathway for color,
shape and object recognition; the other for motion and spatial interrelations. The first winds up in
the inferior temporal areas; the second leads to the inferior parietal areas.
    And, on a more detailed level, Regan (1990) reviews evidence for three color channels in the
fovea, around six spatial frequency channels from each retinal point, around eight orientation
channels and eight stereomotion channels, two or three stereoposition channels, three flicker
channels, two changing-size channels, etc. He investigates multiple sclerosis by looking at the
level of the hierarchy — well below consciousness — at which the various sets of channels
    If one needs to compute the local properties of a visual scene, the best strategy is to hook up a
large parallel array of simple processors. One can simply assign each processor to a small part of
the picture; and connect each processor to those processors dealing with immediately
neighboring regions. However, if one needs to compute the overall global properties of visual
information, it seems best to supplement this arrangement with some sort of additional network
structure. The pyramidal architecture (Fig. 6) is one way of doing this.
    A pyramidal multicomputer is composed of a number of levels, each one connected to the
levels immediately above and below it. Each level consists of a parallel array of processors, each
one connected to 1) a few neighboring processors on the same level, 2) one or possibly a few
processors on the level immediately above, 3) many processors on the level immediately below.
Each level has many fewer processors than the one immediately below it. Often, for instance, the
number of processors per level decreases exponentially.
    Usually the bottom layer is vaguely retina-like, collecting raw physical data. Then, for
instance, images of different resolution can be obtained by averaging up the pyramid: assigning
each processor on level n a distinct set of processors on level n-1, and instructing it to average
the values contained in these processors.
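    The averaging scheme can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a model of any particular machine: each processor on level n is assumed to average a disjoint 2×2 block of level n-1 (the text only requires "a distinct set"), the base is assumed to be a square grid whose side is a power of two, and the names average_up and build_pyramid are invented here.

```python
from typing import List

Grid = List[List[float]]


def average_up(level: Grid) -> Grid:
    """Build level n from level n-1 by averaging disjoint 2x2 blocks (assumed block size)."""
    size = len(level)
    if size % 2 != 0 or any(len(row) != size for row in level):
        raise ValueError("expected a square grid with even side length")
    half = size // 2
    return [
        [
            (level[2 * r][2 * c] + level[2 * r][2 * c + 1]
             + level[2 * r + 1][2 * c] + level[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(half)
        ]
        for r in range(half)
    ]


def build_pyramid(base: Grid) -> List[Grid]:
    """Return [base, level 1, ..., apex]; the side length halves at every step."""
    levels = [base]
    while len(levels[-1]) > 1:
        levels.append(average_up(levels[-1]))
    return levels


# A 4x4 "retina" holding the values 0..15 in row-major order.
base = [[float(4 * r + c) for c in range(4)] for r in range(4)]
pyramid = build_pyramid(base)
processors_per_level = [len(level) ** 2 for level in pyramid]
```

    Here the number of processors per level falls by a factor of four at each step (16, 4, 1), which is one instance of the exponential decrease mentioned above. Each higher level holds the same image at half the resolution.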
    Or, say, the second level could be used to recognize edges; the third level to recognize shapes;
the fourth level to group elementary shapes into complex forms; and the fifth level to compare
these complex forms with memory.
    Stout (1986) has proved that there are certain problems — such as rotating a scene by pi
radians — for which the pyramidal architecture will perform little better than its base level would
all by itself. He considers each processor on level n to connect to 4 other processors on level n, 4
processors on level n-1, and one processor on level n+1. The problem is that, in this arrangement,
if two processors on the bottom level need to communicate, they may have to do so by either 1)
passing a message step by step across the bottom level, or 2) passing a message all the way up to
the highest level and back down.
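    The two routes can be compared by counting hops. The Python sketch below assumes the connectivity described above (four same-level neighbors, four children arranged as a 2×2 block, one parent) and a base whose side is a power of two; the function names are invented for illustration. It also counts how many messages must cross the single apex processor when the base is rotated by pi. Reading that count as the source of the slowdown is an interpretation offered here, not a restatement of Stout's proof.

```python
from typing import Tuple

Cell = Tuple[int, int]  # (row, column) of a processor on the base level


def mesh_hops(a: Cell, b: Cell) -> int:
    """Route 1: step across the base level through 4-neighbor links."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def pyramid_hops(a: Cell, b: Cell) -> int:
    """Route 2: climb to the lowest common ancestor, then descend.

    Assumes each parent covers a 2x2 block of children, so the parent of
    (r, c) is (r // 2, c // 2) on the next level up.
    """
    (r1, c1), (r2, c2) = a, b
    hops = 0
    while (r1, c1) != (r2, c2):
        r1, c1, r2, c2 = r1 // 2, c1 // 2, r2 // 2, c2 // 2
        hops += 2  # one hop up on each side of the path
    return hops


def apex_traffic_for_half_turn(side: int) -> int:
    """Count base cells whose message must pass through the apex when the
    image is rotated by pi, i.e. (r, c) is sent to (side-1-r, side-1-c)."""
    full_height = 2 * (side.bit_length() - 1)  # 2 * log2(side) for powers of two
    return sum(
        1
        for r in range(side)
        for c in range(side)
        if pyramid_hops((r, c), (side - 1 - r, side - 1 - c)) == full_height
    )
```

    For opposite corners of a 512×512 base, mesh_hops gives 1022 and pyramid_hops gives 18, so a single message travels much faster through the pyramid. But apex_traffic_for_half_turn(8) returns 64: under a half turn every one of the 64 base cells has its partner in the opposite quadrant, so every message would have to pass through the one processor at the top.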
    However, Stout also shows that this pyramidal architecture is optimal for so-called "perimeter-
bound" problems — problems with nontrivial communication requirements, but for which each
square of s² processors on the base level needs to exchange only O(s) bits of information with
processors outside that square. An example of a perimeter-bound problem is labeling all the
connected components of an image, or finding the minimum distance between one component
and another.
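    To make the first example concrete, here is a short Python sketch of connected-component labeling on a binary image. It is an ordinary sequential flood fill, included only to show what the task computes. It is not Stout's parallel pyramid algorithm, and the choice of 4-connectivity and the name label_components are assumptions.

```python
from collections import deque
from typing import List, Tuple


def label_components(image: List[List[int]]) -> Tuple[List[List[int]], int]:
    """Label 4-connected regions of nonzero pixels; 0 is left as background.

    Returns the label grid and the number of components found.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if not image[r][c] or labels[r][c]:
                continue  # background, or already part of a labeled component
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill from the seed pixel
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and image[ny][nx] and not labels[ny][nx]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


image = [
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
]
labels, count = label_components(image)
```

    The sample image contains three components. Whether two pixels in different squares belong to the same component can be decided from what happens along the borders of the squares, which is roughly why the information crossing the edge of an s×s square grows with s rather than with s².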
   In sum, it seems that strict pyramidal architectures are very good at solving problems which
require processing that is global, but not too global. When a task requires an extreme amount of
global communications, a parallel architecture with greater interconnection is called for — e.g. a
"hypercube" architecture.
   Thinking more generally, Levitan et al (1987) have constructed a three-level "pyramidal"
parallel computer for vision processing. As shown in Figure 7, the bottom level deals with
sensory data and with low-level processing such as segmentation into components. The
intermediate level takes care of grouping, shape detection, and so forth; and the top level
processes this information "symbolically", constructing an overall interpretation of the scene.
The base level is a 512×512 square array of processors each doing exactly the same thing to
different parts of the image; and the middle level is composed of a 64×64 square array of
relatively powerful processors, each doing exactly the same thing to different parts of the base-
level array. Finally, the top level contains 64 very powerful processors, each one operating
independently according to programs written in LISP (the standard AI programming language).
The intermediate level may also be augmented by additional connections, e.g. a hypercube
   This three-level perceptual hierarchy appears to be an extremely effective approach to computer
vision. It is not a strict pyramidal architecture of the sort considered by Stout, but it retains the
basic pyramidal structure despite the presence of other processes and interconnections.
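   The level sizes quoted above imply a steep, regular reduction, which a little arithmetic makes explicit. The Python sketch below uses only the three sizes given in the text. The even split of work, with each middle-level processor covering an equal square tile of the base and each top-level processor receiving an equal share of the middle level, is an assumption made for illustration, since the top-level processors are described as operating independently.

```python
BASE_SIDE = 512    # base level: 512 x 512 array (from the text)
MIDDLE_SIDE = 64   # middle level: 64 x 64 array (from the text)
TOP_COUNT = 64     # top level: 64 processors (from the text)


def level_sizes() -> list:
    """Number of processors on the base, middle and top levels."""
    return [BASE_SIDE ** 2, MIDDLE_SIDE ** 2, TOP_COUNT]


def fan_in() -> tuple:
    """Processors on the level below per processor on the level above,
    assuming the work is divided evenly (an assumption, see above)."""
    per_middle = (BASE_SIDE // MIDDLE_SIDE) ** 2   # one 8 x 8 tile of the base
    per_top = MIDDLE_SIDE ** 2 // TOP_COUNT        # equal share of the middle level
    return per_middle, per_top
```

   Under these assumptions level_sizes() gives 262144, 4096 and 64 processors, and fan_in() gives 64 at both steps: each level is smaller than the one below it by the same factor, as in the exponentially shrinking pyramids described earlier.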
   In sum, it is fairly clear that human perception works according to a "perceptual hierarchy" of
some sort. And it is also plain that the perceptual hierarchy is a highly effective way of doing
computer vision. However, there is no general understanding of the operation of this hierarchy.
Many theorists, such as Uttal (1988), suspect that such a general understanding may be
impossible — that perception is nothing more than a largely unstructured assortment of very
clever tricks. In 1965, Hurvich et al made the following remark, and it is still apt: "the reader
familiar with the visual literature knows that this is an area of many laws and little order."
   I suggest that there is indeed an overall structure to the process. This does not rule out the
possibility that a huge variety of idiosyncratic tricks are involved; it just implies that these tricks
are not 100% of the story. The structure which I will propose is abstract and extremely general;
and I am aware that this can be a limitation. As Uttal has observed,
      Perceptual psychophysics has long been characterized by experiments specific to a
microscopically oriented theory and by theories that either deal with a narrowly defined data set
at one extreme or, to the contrary, a global breadth that is so great that data are virtually
irrelevant to their construction. Theories of this kind are more points of view than analyses.
Uttal would certainly put the theory given here in the "more point of view than analysis"
category. However, it seems to me that, if the gap between psychophysical theory and data is
ever to be bridged, the first step is a better point of view. And similarly, if the gap between
biological vision and computer vision is ever to be closed, we will need more than just superior
technology — we will need new, insightful general ideas. Therefore I feel that, at this stage, it is
absolutely necessary to study the abstract logic of perception — even if, in doing so, one is
guided as much by mathematical and conceptual considerations as by psychophysical or other
Source: A New Mathematical Model of Mind
