transformations and learning

[From: Bruce Nevin (Mon 930111 11:11:11)]

The following outline of ideas from a fellow BBNer, Al Boulanger,
seems to me like a fruitful direction to look for certain aspects
of higher level control and reorganization as involved in
learning at those levels. The header info that comes first
indicates where I copied this from.

(OK, I cheated on the time stamp above, but it was only a few
minutes short of what appears there. My not-quite-11-year-old
would love it.)

        Bruce

···

-=+=-=+=-=+=-=+=-=+=-=+=-=+=-=+=-=+=-=+=-=+=-=+=-=+=-=+=-=+=-=+=-=+=-=+=-

Date: Mon, 04 Jan 93 11:37:55 -0800
From: Michael Pazzani <pazzani@ics.uci.edu>
Message-ID: <9301041146.aa27802@q2.ics.uci.edu>

                 Machine Learning List: Vol. 5, No. 1
                        Monday, January 4, 1993

------------------------------

Subject: Minimum Description Length & Transformations in Machine Learning
From: aboulang@bbn.COM
Date: Sat, 2 Jan 93 19:00:10 EST

Minimum Description Length & Transformations in Machine Learning

Or, Is there a Principle of Least Action for Machine Learning?

In this short note I want to posit that MDL-like methodologies will
become the unifying "Least Action Principles" of machine learning.
Furthermore, machine learning architectures will evolve to include a
fundamental capability for doing coordinate transformations, and this
capability will be intimately tied to the use of MDL-like
methodologies.

By MDL-like methodologies I mean the use of information-theoretic
metrics on the results of any machine learning algorithm in its
generalization phase. Such a metric serves as a decision criterion for
overtraining: one compares the MDL-like metric of the results of the
machine learning algorithm against that of the data itself. MDL-like
methodologies are applicable to both supervised and unsupervised
learning. By the term "MDL-like" I mean to indicate that there is an
applicable body of work in this area -- including the work of Wallace,
Akaike, and Rissanen. It is possible to use MDL-like metrics in the
generation phase as well.
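
For concreteness, here is a minimal sketch in Python (NumPy) of a
two-part description length used as an overtraining criterion. The
(k/2)*log2(n) parameter cost and the Gaussian code for the residuals
are one standard realization of the idea, not the only one:

  import numpy as np

  def mdl_score(x, y, degree):
      # Two-part code length in bits: L(model) + L(data | model).
      n = len(y)
      k = degree + 1                        # number of parameters
      coeffs = np.polyfit(x, y, degree)
      resid = y - np.polyval(coeffs, x)
      rss = float(np.sum(resid ** 2)) + 1e-12
      l_model = 0.5 * k * np.log2(n)        # bits to state the parameters
      l_data = 0.5 * n * np.log2(rss / n)   # residual bits, up to a constant
      return l_model + l_data

  # Quadratic data plus noise: richer models fit the sample better,
  # but their extra parameter bits outweigh the residual savings.
  rng = np.random.default_rng(0)
  x = np.linspace(-1.0, 1.0, 200)
  y = 2 * x ** 2 - x + rng.normal(scale=0.1, size=x.size)
  best = min(range(1, 10), key=lambda d: mdl_score(x, y, d))
  print("MDL-selected degree:", best)       # settles at 2, not 9

The same comparison made against the code length of the raw data says
when modeling is paying for itself at all.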

Transformations and Machine Learning

Many paradigmatic problems in machine learning become
"embarrassingly" simple under straightforward coordinate
transformations. For instance, the two-spirals problem becomes two
simple lines under a polar coordinate transformation. Much of a
physicist's activity consists of finding an appropriate coordinate-
system hosting of a problem so as to exploit its symmetries. I
posit that at least one phase of any machine learning system should
include a search for an appropriate coordinate-system hosting.
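
To see the two-spirals example concretely, a small sketch in
Python/NumPy (the spirals are taken as Archimedean, r = theta, which
is the usual construction of the benchmark, and the points are
ordered along each curve so the angle unwraps cleanly):

  import numpy as np

  t = np.linspace(0.5, 4 * np.pi, 300)
  spiral_a = np.column_stack([t * np.cos(t), t * np.sin(t)])
  spiral_b = np.column_stack([t * np.cos(t + np.pi),
                              t * np.sin(t + np.pi)])

  def to_polar(xy):
      r = np.hypot(xy[:, 0], xy[:, 1])
      theta = np.unwrap(np.arctan2(xy[:, 1], xy[:, 0]))
      return r, theta

  r_a, th_a = to_polar(spiral_a)
  r_b, th_b = to_polar(spiral_b)
  # In (theta, r) coordinates each class is a straight line, so
  # r - theta is constant per class and one threshold separates
  # what interleaves hopelessly in (x, y).
  print(np.round(r_a - th_a, 3)[:5])   # ~0.0   for every point of A
  print(np.round(r_b - th_b, 3)[:5])   # ~3.142 for every point of B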

These transformations come in many different colors. For example, the
method of temporal differences is a relativizing transformation in
time coordinates. Another example is the growing use of wavelets for
time-frequency features.
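
As a toy illustration of the time-coordinate case (reading "temporal
differences" loosely as differencing here; the TD learning rule's
error signal is likewise built from differences of successive
estimates):

  import numpy as np

  # Differencing re-hosts a series relative to its own immediate
  # past: a trend that is new information at every step of absolute
  # time becomes a single constant in the differenced coordinates.
  t = np.arange(100)
  series = 3.0 * t + 5.0        # linear trend in absolute time
  diffs = np.diff(series)       # relativized: x[t] - x[t-1]
  print(series[:4])             # [ 5.  8. 11. 14.] -- keeps changing
  print(diffs[:4])              # [3. 3. 3. 3.]     -- one number suffices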

A significant contributor to the complexity of the description of a
problem is its chosen coordinate-system hosting. Coordinate
transformations can be of two types: local and global. An example of a
global transformation is the aforementioned polar hosting of the two-
spirals problem. The Fukushima network makes use of local
transformations for robust pattern recognition. MDL can be used as
the selection criterion in the transformation search.
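
A sketch of that selection loop (Python/NumPy; the two candidate
hostings and the linear model class are illustrative choices, not
prescribed by anything above):

  import numpy as np

  def two_part_bits(u, v):
      # Fit a line v ~ a*u + b in the hosted coordinates and return
      # parameter bits plus residual bits, as in the earlier sketch.
      n = len(v)
      A = np.column_stack([u, np.ones(n)])
      coef, *_ = np.linalg.lstsq(A, v, rcond=None)
      rss = float(np.sum((v - A @ coef) ** 2)) + 1e-12
      return 0.5 * len(coef) * np.log2(n) + 0.5 * n * np.log2(rss / n)

  t = np.linspace(0.5, 4 * np.pi, 300)
  x, y = t * np.cos(t), t * np.sin(t)          # one Archimedean spiral
  theta = np.unwrap(np.arctan2(y, x))
  r = np.hypot(x, y)

  hostings = {
      "cartesian (y vs x)": two_part_bits(x, y),
      "polar (r vs theta)": two_part_bits(theta, r),
  }
  print(min(hostings, key=hostings.get))       # polar hosting wins

The polar hosting compresses the same points into far fewer bits, so
the description-length score selects it without any task-specific
tuning.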

MDL as a Least Action Principle for Machine Learning

MDL-like methods hold promise as a unifying principle in machine
learning -- much as Lagrangian methods, which make use of action and
its minimization, are *the* unifying approach in physics, cutting
across classical physics, relativistic physics, and quantum mechanics.
MDL-like metrics are a type of *action* for machine learning. (In fact,
for certain types of search in machine learning, Lagrangian optimization
can be used.)

(Recent work in machine vision at MIT has suggested the use of MDL as
a principle for 3-D object recognition and disambiguation. It is
posited that what is perceived is related to an MDL description of the
3-D scene. By the way, who is doing this work?)

There are a couple of long-standing conceptual issues in machine learning:

  The relationship between learning methodologies -- supervised,
  unsupervised, reinforcement learning, etc. Somehow, one would like a
  unifying framework for all of them. The fact that MDL-like methods
  can be used across several methodologies means that they could help
  in building such a framework.

  The relationship between optimization and machine learning. MDL-like
  metrics are posited to be the *general* optimization criterion for
  machine learning.

MDL has broad applicability in machine learning. It can be used to
guide search in both unsupervised and supervised learning. It can be
used as the common optimization criterion for "multi-algorithm machine
learning systems". Finally, it can be used to tie the search in feature
space to the search for a coordinate-system hosting.

Seeking a higher form for machine learning,
Albert Boulanger
aboulanger@bbn.com

[From Rick Marken (930111.0900)]

Bruce Nevin (Mon 930111 11:11:11) --

The following outline of ideas from a fellow BBNer, Al Boulanger,
seems to me like a fruitful direction to look for certain aspects
of higher level control and reorganization as involved in
learning at those levels.

Having read it, I must ask "why?"

Best

Rick