# Multilevel Optimization

The basic principles of multilevel optimization were enunciated in my Ph.D. thesis (Goertzel, 1989). There I gave experimental and theoretical results regarding the performance of a number of specific algorithms operating according to the multilevel philosophy. Shortly after completing this research, however, I was surprised to find that the basic idea of the multilevel philosophy had been proposed by the sociologist Etzioni (1968), in his The Active Society, as a method for optimizing the social structure. And a few months later I became aware of the strong similarity between multilevel optimization and the "discrete multigrid" method of Achi Brandt (1984) (who introduced the term "multilevel" into numerical analysis). Brandt's ideas were introduced in the context of spin-glass problems like those described above. These parallels indicate how extremely simple and natural the idea is.

The first key concept is that the search for an optimum is to be conducted on a finite number of "levels", each one determined by a certain characteristic distance. If the levels are denoted 1, 2, …, L, the corresponding distances will be denoted h1, …, hL, and we shall adopt the convention that hi < hi+1. The multilevel philosophy assumes the existence of some method of "search" which finds an optimum value "about a point x on level i." There are many ways of executing such a search. For instance, one may execute Monte Carlo search over the sphere of radius hi about x. Or one may execute Monte Carlo search over the surface of the sphere of radius hi about x. Such choices constitute specific multilevel methods operating within the framework of the multilevel philosophy. The multilevel philosophy has to do not with the nature of the searches but with the relation between searches executed on various levels.
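The level-i search is deliberately left abstract; as one concrete choice, here is a minimal sketch of Monte Carlo search over the ball of radius h about x (assuming minimization over a continuous domain; the function name and sampling scheme are illustrative, not from the source):

```python
import numpy as np

def mc_search_ball(f, x, h, n_samples=100, rng=None):
    """Monte Carlo search for a minimum of f over the ball of radius h about x.

    Draws n_samples points uniformly from the ball and returns the best
    candidate found (x itself is always a candidate, so the result is
    never worse than x).
    """
    rng = np.random.default_rng() if rng is None else rng
    best_x, best_f = x, f(x)
    d = len(x)
    for _ in range(n_samples):
        # uniform random direction on the unit sphere
        direction = rng.standard_normal(d)
        direction /= np.linalg.norm(direction)
        # radius weighted so points are uniform in the d-dimensional ball
        r = h * rng.random() ** (1.0 / d)
        cand = x + r * direction
        fc = f(cand)
        if fc < best_f:
            best_x, best_f = cand, fc
    return best_x
```

Searching only the surface of the sphere instead amounts to fixing r = h rather than sampling it.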

A method operating within the multilevel philosophy may or may not incorporate a "zero level," a local optimization method. First let us consider the case L=1, with a zero level. In this case the concept is as follows. Given an initial guess x0, the local optimization method is first executed at this point. When the local optimization routine stops (having found a local extremum), stops proceeding fast enough (according to some preassigned threshold), or finishes a preassigned number of steps at some point w0, then search on level 1 is executed about w0, yielding a new point z0. Local optimization is then executed about z0, until it is halted by one of the three criteria, yielding a new point y0. Next, f(y0) is compared with f(x0). If f(y0) is better than f(x0), then the entire procedure is begun from y0; i.e. x0 is set equal to y0 and the algorithm is restarted. But if f(x0) is better, the program is terminated; x0 is the "answer." The idea is to avoid getting stuck in a shallow local optimum, or getting stuck crawling up an extremely gentle slope, by "jumping" away from the optimum via the nonlocal search on level 1.
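The L=1 loop just described can be sketched as follows (a minimal sketch for minimization; `local_opt` and `level1_search` are placeholders for the zero-level optimizer and the level-1 search, each assumed to halt by one of the three criteria and return a point):

```python
def multilevel_l1(f, x0, local_opt, level1_search):
    """L = 1 procedure with a zero level, for minimizing f.

    local_opt(f, x): run the local optimizer from x until a halting
        criterion fires.
    level1_search(f, x): search about x on level 1.
    """
    x = x0
    while True:
        w = local_opt(f, x)        # zero level: local optimization from x
        z = level1_search(f, w)    # jump away from w on level 1
        y = local_opt(f, z)        # local optimization about the new point
        if f(y) < f(x):
            x = y                  # improvement: restart the procedure from y
        else:
            return x               # no improvement: x is the "answer"
```

With the local optimizer replaced by the identity mapping, the same loop becomes the zero-level-free "one-level method" discussed next.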
If no zero level were implemented, and the local optimization routine in the above description
were replaced with the identity mapping, one would still have a viable optimization method,
which we shall call the one-level method. If h1 is very small, then the one-level method is a
general-purpose local optimization method. In fact, in the case of a Boolean function one may
take h1=1 (Hamming distance) and take the level-1 search to be an exact search on the surface of
the sphere of radius 1 (there is no interior but the center). One then has the standard discrete
steepest-descent method. And, in the continuous case, if one takes the level-1 search method to
be a Monte Carlo search on the surface of the sphere of radius h1, then one has a simple,
unoriginal approach to steepest-descent optimization which is probably as good as anything else
for local optimization of functions with extremely "rugged" graphs.
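In the Boolean case with h1=1, the exact search on the Hamming sphere of radius 1 reduces to the familiar bit-flip descent; a minimal sketch (for minimizing f over bit vectors, with names chosen for illustration):

```python
def hamming_steepest_descent(f, x):
    """Discrete steepest descent: exact search on the Hamming sphere of
    radius 1 about x (flip each bit, move to the best strict improvement),
    repeated until no single flip improves f."""
    x = list(x)
    improved = True
    while improved:
        improved = False
        best_f, best_i = f(x), None
        for i in range(len(x)):
            x[i] ^= 1            # flip bit i
            fi = f(x)
            x[i] ^= 1            # flip it back
            if fi < best_f:
                best_f, best_i = fi, i
        if best_i is not None:
            x[best_i] ^= 1       # commit the best single-bit flip
            improved = True
    return x
```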

Next, consider the case L=i, i>1. Here, given an initial guess x0, we first execute the algorithm for L=i-1 about this point. When the L=i-1 routine stops (having found an "answer"), stops proceeding fast enough (according to some preassigned threshold), or finishes a preassigned number of steps at some point w0, then search on level i is executed about w0, yielding a new point z0. The L=i-1 routine is then executed about z0, until it is halted by one of the three criteria, yielding a new point y0. Next, f(y0) is compared with f(x0). If f(y0) is better than f(x0), then the entire L=i procedure is begun from y0; i.e. x0 is set equal to y0 and the algorithm is restarted. But if f(x0) is better, the program is terminated; x0 is the "answer."

For L=2, this procedure, if it has a zero level, first seeks a local optimum, then seeks to jump out of it by searching on level 1, and then seeks to jump out of the result of this jumping-out by searching on level 2. L=2 without a zero level is the same as L=1 with the one-level method as a zero level.

Similarly, the L=i procedure seeks to jump out of the result of jumping out of the result of
jumping out of… the result of jumping out of the result of the lowest level.
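This nested jumping-out is naturally expressed recursively; a minimal sketch (minimization; `search(f, x, i)` stands for the abstract level-i search, and `local_opt` for the optional zero level, defaulting to the identity mapping):

```python
def multilevel(f, x0, level, search, local_opt=lambda f, x: x):
    """Recursive L = i procedure for minimizing f.

    search(f, x, i): search about x on level i, returning a point.
    local_opt(f, x): the zero level (identity if no zero level is used).
    """
    if level == 0:
        return local_opt(f, x0)
    x = x0
    while True:
        # run the L = i-1 routine about the current point
        w = multilevel(f, x, level - 1, search, local_opt)
        z = search(f, w, level)    # jump out on level i
        y = multilevel(f, z, level - 1, search, local_opt)
        if f(y) < f(x):
            x = y                  # improvement: restart the L = i loop
        else:
            return x               # x is the "answer" on level i
```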

The following instance may give a heuristic conception of the crux of the multilevel philosophy. For simplicity, we assume no zero level, and we assume the first of the three criteria for stopping search: search on level i is stopped only when an "answer" on level i-1 is found. The same example may just as easily be applied to the other cases.
## A Simple Example
Consider a function which assigns a numerical value to each house in the world, and suppose a person is trying to find the house with the highest number. If the distribution of numbers is totally random, it doesn't matter in what order he checks the various houses. But what if there is some intrinsic, perhaps subtle, structure to it? What does the multilevel philosophy tell him to do?
Starting from a randomly selected house, he should first check all houses on that block and see
which one has the highest number. Then he should check the neighboring block in the direction
of this optimal house. If no house on that block is better, he should call the best house he’s found so far his block-level optimum. But if some house on that block is better, then he should proceed to check the neighboring block in the direction of this new optimal house. And so on, until he finds a block-level optimum.
Once he finds a block-level optimum, he should then take a rough survey of the town in which
the block sits, and make a guess as to which areas will be best (say by the Monte Carlo method).
He should pick a block in one of the areas judged best and execute block-level search, as
described above, from this block, and so on until he reaches a new block-level optimum. Then he
should compare the two block-level optima and call the best of them his tentative town-level
optimum.
Then he should proceed to the town in the direction of this optimum and there execute town-level optimization as described above. He should compare his two tentative town-level optima and, if the old one is better, call it his town-level optimum. But if the new one is better, then he should proceed to the neighboring town in its direction and locate a new tentative town-level optimum. And so on, until he obtains a town-level optimum.
Then he should make a rough survey of the county in which this town sits, and make a guess
as to which areas will be best (say by the Monte Carlo method). He should pick a town in one of
the areas judged best and execute town-level search, as described above, from this town, and so
on until he reaches a new town-level optimum. Then he should compare the two town-level
optima and call the best of them his tentative county-level optimum.

Then he should proceed to the county in the direction of this optimum and there execute
county-level optimization as described above. He should compare his two tentative county-level
optima and, if the old one is better, call it his county-level optimum. But if the new one is better, then he should proceed to the neighboring county in its direction and locate a new tentative county-level optimum. And so on, until he obtains a county-level optimum. Applying the same logic, he could obtain state-wide, nation-wide and global optima…
