7.2 Implications of Structurally Associative Memory

   Let us now return to the problem of how a STRAM is to maintain its approximate network-of-
emergence structure. One way to approach this optimization problem is via the multilevel
methodology (Goertzel, 1989). This application of the multilevel methodology bears a strong
similarity to Achi Brandt’s (1985) multilevel algorithm for the Ising spin problem.
   The first level would be the following procedure:
1) assign nodes to any newly-introduced functions, and single out a number of
pre-existing nodes, in such a way that the nodes assigned and selected are
approximately evenly distributed across the network (so that no two are too
close to each other). Call these functions xi and their nodes N(xi).
2) switch the node of each of these functions xi with the node of one of the
functions y with which it shares a pattern — that is, assign xi node N(y) and y node N(xi) — and
see if, according to the new arrangement, the total amount of pattern which the network indicates
in xi and y is greater. If so, let N(xi)=N(y) and let N(y)=N(xi) — that is, make the switch
permanent — and apply Step 2 to xi again. If not, proceed to one of the other functions y with
which xi shares a pattern (say, proceeding through the xi in order of decreasing intensity of
shared pattern). Once all the functions with which xi shares more than e (some small fixed
number) worth of pattern have been exhausted, exit Step 2.
This algorithm moves functions through the network toward their "proper position." The
problem is that, even if xi is not in its optimal position, its position could well be better for it than
the positions of its neighbors. Ideally, one might like to try each xi out in every possible position;
and although this is not possible, one may indeed improve upon the "neighbors-only" approach.
   Following the multilevel methodology, one could seek to find the optimum node for xi among
those nodes at a distance between h1 and h2 from xi in the network (where the distance between
two nodes is measured by the number of links separating them). One could attempt this by the
Monte Carlo method,randomly seeking to switch xi with functions in this region, or one could
attempt this by randomly beginning the neighbors-only search given above from points in this
   And if one found a new home for xi in this region, one could execute the neighbors-only
search from the new home. And from the answer which this yielded, one could execute the
second-level search. And one could repeat this entire process according on an arbitrary number
of levels, according to the basic framework outlined in Chapter 2.
   Of course, this is only one possibility; the multilevel philosophy could be applied in many
different ways, and there are many other approaches to optimization. The important point is that,
by some specially-tailored optimization method, the network must continually reorganize itself
so as to better approximate the network of emergence structure, and so as to integrate new data in
a manner which maintains this structure. It would seem to be completely impossible to
determine, at this stage, the actual method by which the human brain reorganizes its memory
   It might seem that no organism could afford to continually subject its memory to such a risky,
speculative optimization algorithm as that sketched above. It could be, of course, that there exists
some optimization algorithm which is substantially more effective than those presently known.
However, as emphasized in Chapter 2, most researchers doubt if there will ever exist a rapid,
highly accurate algorithm for the solution of difficult optimization problems. In the context of
associative memory, rapidity is of the essence, so it is probably true that rough approximate
answers must suffice.
   If one were designing a brain, one way of partially eliminating the risk involved in adjusting
memory according to a highly approximate algorithm would be to provide it with a large number
of structurally associative memories, each storing largely the same functions. In fact, it has often
been observed that biological memory is holistic in the sense that it appears to "store almost
everything almost everywhere". Any small section of the cortex appears to contain a large part of
the data which the cortex stores. This an empirical observation, validated repeatedly in the
laboratory (Hubel, 1988).
   It has been suggested that holistic memory is necessary because of the error-prone nature of
the brain’s components. This is almost certainly true. However, the present considerations
suggest another good reason: the nature of structurally associative memory seems to require that
memory structure be continually subjected to radical, experimental transformations. In order for
these transformations not to interfere too much with continual memory use, the mind must
incorporate some sort of "workspace" in which unsuccessful transformations may be tried and
discarded without global effect. In short: holistic memory may be necessary not only because of
the error-prone natureof the brain’s "hardware", but also because of the error-prone nature of
high-level mental process.
   In this context, let us return to the question of physical memory. A physical memory is,
obviously, not a weighted graph. Nonetheless we would like to call certain physical memories
structurally associative memories. Therefore it is necessary to assign to each physical memory M
a set of "naturally correspondent" structurally associative memories. We know of no completely
satisfactory way of doing this, but the following scheme is not unreasonable. It requires only that
we assign to each pair of elements (x,y) stored by M a distance DM(x,y) which measures the
difficulty of locating x in memory given that y has very recently been located. Let F be a
mapping which takes the set S of elements stored by M into the nodes of a graph; let L(x,y)
denote the number of links in the graph F(S) along the shortest path from(x) to F(y). Then the
accuracy with which F represents M may be defined by the average, over all (x,y), of %L(x,y)-
DM(x,y)%. This definition requires that, in a rough sense, distance in the memory M correspond
to distance in the graph F(S). Certainly there is more to be done in this direction, but the point is
that it is indeed possible to model any sort of memory as a graph, and hence to analyze any sort
of memory as structurally associative memory.
   Finally, we must now explore the manner in which a structurally associative memory may be
utilized by cognitive processes. In the notation of the definition of analogy given above, consider
that f and y have already been chosen. The process of selection will be discussed briefly a bit
later. Then, given x, an appropriate x% may be located by a simple process of "looking up" in
   In the case of structural analogy all that is required is to look for something which has a large
amount of pattern in common with x. Specifically, a list must be made of the significant patterns
in x, and then the other functions in which these functions are also patterns must be searched,
with the objective of locating those which have a large number of these patterns in common.
This may be a large search, and it will have to be approximate — e.g. a Monte Carlo search could
be implemented, skipping over at random some of the neighbors. The point is that the making of
the list of patterns in x, and of neighbors which may also possess these patterns, is merely a
matter of looking at the actual graph of the network. Of course, this method will only be accurate
to the extent that the STRAM approximates the network of emergence structure.
   In the case of contextual analogy, a list must be made of all the patterns which emerge
between x and v, and then a search of the neighbors of thesepatterns must be made, until a
function x% is located such that many of the same patterns emerge between it and some other
entity v%.
   If f is not the identity mapping, then things are slightly more involved. One knows only that
f(x%) is close to x in the network; one knows nothing about x% itself. So, in general, a list must
be made of all the patterns which emerge in x or between x and w, and then a search of the
neighbors of these patterns must be made, until an entity z is located such that many of the same
patterns emerge between it and some other entity w%. Then x% may be set equal to f-1(z). This
involves the potentially difficult computation of f-1, and it is generally much more difficult than
structural or contextual analogy. However, as suggested earlier, even speculative modeling
analogy may be useful, as in brainstorming.
   The location of x% is Step 1 of the general process of analogy. The next step is the recognition
of patterns in x%. This process may be decomposed into two stages: isolating the patterns
already "known" by STRAM to be part of the structure of x%, and finding new patterns not yet
"known" by STRAM. Of course, if f is the identity mapping, then it is trivial to locate all patterns
in x’ that have been identified by STRAM; in fact, this was already done to some degree of
approximation in the selection of x%. In this case the first stage is very easy to execute. But if f
is not the identity, then in order to find out what STRAM knows about x%, one must search
through STRAM for x% (remember,it is f(x%) which is near x, not necessarily x% itself). One
may treat this search as a minimization, over the space of all nodes of STRAM, of the function
d#(x,x%). It would not be wise to execute this minimization by local search (from each node
proceeding to that neighboring node which minimizes the function), because the structure of
STRAM is almost certainly imperfect; a focused Monte Carlo search, a simulated annealing
search, or a multilevel search would be much more effective.
   The second stage of Step 2, the location of new patterns in x%, will be implicitly addressed
later, when we discuss pattern recognition in general.
Kaynak: A New Mathematical Model of Mind

Belgeci , 2280 belge yazmış

Cevap Gönderin