Learning program

Bill_Powers1 · December 8, 2000, 6:11pm

[From Bill Powers (2000.12.08.0959 MST)]

Attached is a program called "furnace4.exe" which will run on a PC. It will
show up where attached files show up on your system. Use Start-Run-Browse
to find it (make sure the right drive is typed into the window, such as c:)
and double-click on it. The source code is appended at the end of this post.

This program is called Furnace4 for the reason that it started out as a
program to implement the furnace control system that was talked about some
time ago. But forget that association. It's now set up to show an organism
moving around in a cage looking for food (or maybe water). In the cage, a
rectangle on the right side of the screen, there are six randomly placed
circles, each containing a randomly selected amount of food (or water)
between 0 and 99 units. When the organism is in motion (see later), it
moves in straight lines until it encounters a wall of the cage, after which
it turns to a new randomly-selected direction (one that leaves it still
moving inside the cage). I'll assume food is involved, in the following.
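The wall-bounce rule described above amounts to rejection sampling: keep drawing a heading until the next step lands inside the cage. Here is a minimal sketch, assuming a square cage of side `width`. The names `pickheading` and `inside` are illustrative and do not appear in the original program.

```pascal
program BounceSketch;
{ Sketch only: pickheading and inside are illustrative names. }

const width = 200.0;   { assumed side length of the square cage }

var x, y, dir, speed: real;
    i: integer;

function inside(px, py: real): boolean;
begin
  inside := (px >= 1.0) and (px <= width) and (py >= 1.0) and (py <= width);
end;

{ Redraw the heading until one step of length spd stays in the cage. }
function pickheading(px, py, spd, olddir: real): real;
var d: real;
begin
  d := olddir;
  while not inside(px + spd*cos(d), py + spd*sin(d)) do
    d := 2.0*pi*random;
  pickheading := d;
end;

begin
  randomize;
  x := 25.0; y := 5.0; dir := pi/4.0; speed := 3.0;
  for i := 1 to 2000 do
  begin
    dir := pickheading(x, y, speed, dir);
    x := x + speed*cos(dir);
    y := y + speed*sin(dir);
  end;
  writeln('final position: ', x:0:1, ', ', y:0:1);
  writeln('still inside cage: ', inside(x, y));
end.
```

Because a new heading is only drawn when the old one would leave the cage, the path is a straight line between wall contacts, as in the description.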

The organism is always producing repetitive actions which, when food is
present, will deliver it one unit of food for each action (fixed-ratio 1).
However, no food is delivered unless the organism is located inside one of
the green food circles.

When food is delivered, it bumps up the value of a controlled variable
(like stored energy) by 5000 units. Between deliveries, the controlled
variable decreases in value by an amount determined by a smoothed random
disturbance which is always negative or zero. This is left over from the
furnace model in which variations in outside temperature affected the
inside temperature. It works just as well for a generic disturbance such as
energy losses caused by variable activities other than those shown on the
screen. This disturbance is plotted in white on the left side of the
screen; when it is going downward, it is getting larger (its negative
effect on the controlled quantity is increasing).
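The smoothed disturbance can be read as a first-order lag driven by uniform noise and clipped at zero. The sketch below uses the constants from the main loop of the appended listing (3000, 0.49 and a smoothing factor of 1/300). The running mean and peak are added here for illustration and are not in the original.

```pascal
program DisturbanceSketch;
{ Sketch of the smoothed, one-sided disturbance; constants are taken
  from the main loop of the listing, the statistics are added here. }

const steps = 20000;

var dist, total, peak: real;
    i: longint;

begin
  randomize;
  dist := 0.0; total := 0.0; peak := 0.0;
  for i := 1 to steps do
  begin
    { first-order lag: move 1/300 of the way toward a noisy target }
    dist := dist + (3000.0*(random - 0.49) - dist)/300.0;
    { clip so the disturbance can only drain the controlled variable }
    if dist < 0.0 then dist := 0.0;
    total := total + dist;
    if dist > peak then peak := dist;
  end;
  writeln('mean disturbance: ', (total/steps):0:2);
  writeln('peak disturbance: ', peak:0:2);
end.
```

In the listing this value is subtracted from the output before the result is scaled by 0.00005 and added to the controlled variable, so it can only lower the variable or leave it unchanged.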

When the organism is inside a food circle that still contains some food, it
emits actions at a rate proportional to the error signal. Each action (such
as a lever press) is shown by a short vertical yellow line. The greater the
error signal, the faster the presses. The error is the difference between
the reference level for (let us say) stored energy and the actual amount of
stored energy that exists. The actual stored energy error is shown by a
trace which is normally green, but turns red when the negative-going error
is larger than a threshold amount of about -0.5 units. When the error is
large and actions are occurring rapidly, the negative error rises rapidly
toward zero, growing only slightly in magnitude between the big reductions
produced by food deliveries. As the error gets smaller, the actions slow
down and eventually we reach the point where the losses due to the
disturbance equal the amount gained from the actions. As the disturbance
varies, of course, the rate of the actions also varies, so as to keep the
error reasonably small (near the white line representing zero error).
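One way to get an action rate proportional to the error is an accumulate-and-fire scheme: add the error to a running total on each tick and emit one action whenever the total reaches a trigger level. This is a minimal sketch with the error held constant, whereas in the program it changes on every tick. The trigger level of 100 is the one in the listing, and `presses` is an illustrative helper name.

```pascal
program PressRateSketch;
{ Sketch only: presses is an illustrative helper, and the error is
  held constant here instead of being recomputed each tick. }

const triglevel = 100.0;

function presses(err: real; ticks: integer): integer;
var timing: real;
    n, t: integer;
begin
  timing := 0.0; n := 0;
  for t := 1 to ticks do
  begin
    timing := timing + err;        { integrate the error }
    if timing >= triglevel then    { threshold reached: one action }
    begin
      inc(n);
      timing := 0.0;
    end;
  end;
  presses := n;
end;

begin
  writeln('error  0.5: ', presses(0.5, 1000), ' presses');
  writeln('error  2.0: ', presses(2.0, 1000), ' presses');
  writeln('error 10.0: ', presses(10.0, 1000), ' presses');
end.
```

Over 1000 ticks, errors of 0.5, 2 and 10 give 5, 20 and 100 presses, so the press rate scales directly with the error.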

The speed with which the organism moves depends on the error when the error
is greater than 5 units. This is larger than the maximum error caused by
the largest amount of disturbance, so when the actions are successfully
producing food, the speed of movement remains at zero. However, when a food
circle runs out of food (the food circle turns red), the error begins to
increase in the negative direction, and it soon exceeds 5 units. At that
point the speed begins to increase in proportion to the error. The organism
moves in straight lines as described, bouncing off the walls of the cage at
random angles, and moving faster and faster as the stored-energy error
increases.

When the organism encounters a food circle containing food (green), its
actions produce food which causes the error to decrease rapidly. Since the
speed of movement is proportional to the error, the organism slows as the
error decreases. If the amount of error is brought below 5 units before the
organism leaves the food circle, motion stops and the actions continue to
provide food until the food runs out in that circle.

By these mechanisms, the organism moves into food circles, stops until it
has produced and eaten all the food in that circle, and then moves on to
the next circle that contains food, in a random search pattern.

When the last food circle is empty and all are red, the error gets larger
and larger and the speed keeps getting greater, up to a point. When the
error reaches 30 units, further increases begin to slow the speed. This is
a crude representation of what happens as the stored energy drops toward
zero. The error gets larger and the speed continues to drop, until the
speed goes to zero. A legend appears in the cage: "STARVED TO DEATH".
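The relation between error and speed can be written as a single function. The sketch below transcribes the arithmetic from procedure `control` in the appended listing, where the error is the scaled quantity 10*(ref - contvar). The breakpoints in the listing (50 and 150) may not match the round figures quoted in the prose. `speedfor` is an illustrative name.

```pascal
program SpeedLawSketch;
{ Speed as a function of error, transcribed from procedure control in
  the listing; speedfor is an illustrative name. }

function speedfor(err: real): real;
var s: real;
begin
  if err < 0.0 then err := 0.0;
  if err < 50.0 then
    s := err/50.0            { speed grows with error }
  else
    s := 3.0 - err/50.0;     { past 50, more error means less speed }
  if s > 3.0 then s := 3.0;
  if s < 0.0 then s := 0.0;
  speedfor := s;
end;

var e: integer;

begin
  e := 0;
  while e <= 160 do
  begin
    { the listing only moves the organism when error > 5 }
    writeln('error ', e:3, '  speed ', speedfor(e):0:2,
            '  searching ', e > 5);
    e := e + 20;
  end;
end.
```

In this transcription the speed is error/50 below 50, jumps to 2.0 at 50, and falls to zero at 150, which is where the listing prints the starvation message.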

There are many details in this model that could be refined. The search
pattern could be made to work like the E. coli method of reorganization.
The food ingested could go into an actual energy store, which would be
depleted at some basic metabolic rate and an additional rate proportional
to the speed of movement (or the square of the speed). The behavior could
be multidimensional instead of consisting only of emitting a single action
at a variable rate. For greater realism, the rate of emitting actions could
be made on-off, so the organism is either not acting or is acting at some
maximum possible rate. I'm sure there are many other details that need
attention.
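The E. coli refinement mentioned above could look roughly like the sketch below: keep the current heading while the error is falling, and tumble to a random heading when it rises. This is a hypothetical variation and is not part of Furnace4. It assumes a sensed quantity that varies with distance from a food source, which the present model does not have. The target location and all names are illustrative.

```pascal
program EcoliSketch;
{ Hypothetical refinement, not part of Furnace4: keep the heading while
  the error is falling, tumble to a random heading when it rises. }

const tx = 150.0; ty = 120.0;   { assumed location of a food source }

var x, y, dir, err, lasterr, start: real;
    i: integer;

{ Stand-in error signal: straight-line distance to the food source. }
function distance(px, py: real): real;
begin
  distance := sqrt(sqr(px - tx) + sqr(py - ty));
end;

begin
  randomize;
  x := 25.0; y := 5.0;
  dir := 2.0*pi*random;
  start := distance(x, y);
  lasterr := start;
  for i := 1 to 2000 do
  begin
    x := x + cos(dir);           { one unit step along the heading }
    y := y + sin(dir);
    err := distance(x, y);
    if err >= lasterr then       { getting worse: tumble }
      dir := 2.0*pi*random;
    lasterr := err;
  end;
  writeln('starting distance: ', start:0:1);
  writeln('final distance:    ', lasterr:0:1);
end.
```

Only the timing of the random direction changes depends on the error; the new direction itself is never aimed at the food.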

However, the basic principle I propose is illustrated here well enough, I
think, to make the point. The search pattern is driven by errors larger
than those that are found when there is food available. When food is
available, behavior is driven by the stored energy error in the normal
(lower) range, and it keeps the error smaller than the amount required to
initiate the search again. Increases in food ingestion rate always
_decrease_ the behavior rate, and never increase it, while food is being
produced. So this is clearly not a reinforcement model.

Appended below is the source code in Turbo Pascal. If you compile this
using your own copy of Turbo Pascal 7.0 on a fast machine, I recommend that
you ask me for the Turbo Pascal Library file, Turbo.tpl, which I have
modified as necessary to avoid a fatal error (runtime error 200) during
initialization of the "delay" function in the Crt unit. Without this
modified library file, no program that uses Crt will run on a machine with
a clock speed faster than around 100 to 200 MHz. I hasten
to disclaim authorship of this fix; I found it on the web. I don't have a
fix for other versions of Turbo Pascal.

Enjoy.

Bill P.

Furnace4.exe (56 Bytes)

==========================================================================
program Furnace4;

uses dos, crt, graph, grutils;

const triglevel: real = 100.0;
maxplaces = 6;

type foodplacetype = record
                       x1,y1: real;
                       foodquantity: integer;
                     end;

var foodplace: array[1..maxplaces] of foodplacetype;
    contvar, per, ref, error, output, dist, timing: real;
    xx, timeoff, noff, count: integer;
    xmax, ymax, xmin, ymin, width, radius, foodradius: integer;
    first, failed,searching,acting: boolean;
    x,y,dx,dy,speed,dir: real;

procedure initfood;
var i: integer;
begin
for i := 1 to maxplaces do
with foodplace[i] do
foodquantity := random(100); {random(100) yields 0..99 units of food}
end;

procedure initprogram;
var i: integer;
begin
initgraphics;
noff := 10;
timeoff := noff;
timing := 0.0;
failed := false;
searching := false;
dist := 0.0;
contvar := 68.5;
ref := 70;
count := 100;
radius := 3;
width := 200;
speed := 1;
dir := pi/4.0;
xx := 0;
xmax := hsize - radius - 1; xmin := xmax - width - radius - 2;
ymin := 0; ymax := ymin + width + radius + 2;
rectangle(xmin, ymin,
           xmax+1+radius, ymax+1+radius);
x := 25.0; y := 5.0;
dx := 0.5; dy := 0.9;
foodradius := 20; { radius of food circle}
for i := 1 to maxplaces do
with foodplace[i] do
begin
  x1 := xmin + width * random; { locate places where food available}
  y1 := ymin + width * random;
end;
initfood;
end;

procedure showfoodplace;
var i: integer;
begin
for i := 1 to maxplaces do
with foodplace[i] do
begin
  if foodquantity > 0 then setcolor(lightgreen)
  else setcolor(lightred);
  circle(round(x1), round(y1), foodradius);
  setcolor(white);
end;
end;

function findfoodplace: boolean; {return true if in food circle}
var i: integer;
    found: boolean;
    xloc,yloc: real;
begin
xloc := xmin + x;
yloc := ymin + y;
found := false;
i := 1;
while (i <= maxplaces) and (not found) do
  begin
   with foodplace[i] do
    if (sqrt(sqr(xloc - x1) + sqr(yloc - y1)) < foodradius)
       and (foodquantity > 0) then
    begin
     found := true;
     dec(foodquantity); {reduce quantity of food available}
    end;
   inc(i);
  end;
findfoodplace := found;
end;

procedure search;
var newx,newy: real;
begin
  setcolor(black);
  circle(xmin + radius + round(x),ymin + radius + round(y),radius);
  newx := x + speed*cos(dir);
  newy := y + speed*sin(dir);
  while ((newx > width) or (newx < 1)
  or (newy > width) or (newy < 1)) do
  begin
   dir := 2*pi*random; {move in random direction after hitting edge}
   if dir > 2*pi then
   dir := dir - 2*pi;
   newx := x + speed*cos(dir);
   newy := y + speed*sin(dir);
  end;
  x := newx; y := newy;
  setcolor(white);
  circle(xmin + radius + round(x),ymin + radius + round(y),radius);
  showfoodplace;
end;

procedure control;
const lasterror: real = 0.0;
      errorchange: real = 0.0; {typed constant, so the value persists between calls}
begin
per := contvar;
lasterror := error;
error := 10*(ref - per);
if error < 50 then speed := error/50 {SLOW DOWN WHEN ERROR DECREASES}
else
  begin
   speed := 3.0 - error/50; { SLOW WHEN LOSING STORED NUTRIENTS}
   if speed <= 0 then
   begin
     outtextxy(round(xmin + width/4),
               round(ymin + width/2),
               'STARVED TO DEATH');
     outtextxy(hsize - 200, vsize - 20,'PRESS ANY KEY TO EXIT');
   end;
  end;
if speed > 3.0 then speed := 3.0;
if speed < 0.0 then speed := 0.0;
errorchange := errorchange + 0.1*(error - lasterror - errorchange);
if error < 0.0 then error := 0.0;
searching := (error > 5.0) {and (errorchange > 0.0)};
if searching then search;
timing := timing + error;
if timing >= triglevel then
  begin
   if not failed then output := 5000;
   timing := 0.0;
   setcolor(yellow);
   if not failed then line(xx div 10,vsize - 250, xx div 10, vsize - 300);
   failed := not findfoodplace;
  end
else output := 0.0;

contvar := contvar + 0.00005*(output - dist);
end;

procedure showdata;
var color: word;
    y: integer;
begin
if failed then color := lightred else color := lightgreen;
putpixel(xx div 10,vsize - round(50.0*(contvar - 65.0)), color);
putpixel(xx div 10, vsize - 250 + round(dist), white);
inc(xx);
if xx >= 10*(hsize - 210) then
  begin
   xx := 0;
   setviewport(0,0,xmin - 1, vsize - 1,true);
   clearviewport;
   setviewport(0,0,hsize - 1, vsize - 1,false);
   rectangle(xmin, 0,
           xmax+1+radius, ymax+1+radius);
   line(0,vsize - 250,xmin,vsize - 250);
  end;
end;

begin
  randomize;
  initprogram;
  line(0,vsize - 250,hsize - 210,vsize - 250);
repeat
  control;
  dist := dist + (3000.0*(random - 0.49) - dist)/300.0;
  if dist < 0.0 then dist := 0.0;
  showdata;
  delay(3);
until keypressed;
readkey;
closegraph;
end.

=============================================================================

Richard_Marken2 · December 8, 2000, 7:04pm

[From Rick Marken (2000.12.08.1100)]

Bill Powers (2000.12.08.0959 MST)

Attached is a program called "furnace4.exe" which will
run on a PC.

Wonderful. Looks good. I would make it so the organism spends
a bit less time in each circle. Great start though!

Best

Rick

--
Richard S. Marken Phone or Fax: 310 474-0313
MindReadings.com mailto: marken@mindreadings.com
www.mindreadings.com

Bill_Powers1 · December 9, 2000, 9:23am

[From Bill Powers (2000.12.09.0122 MST)]

Rick Marken (2000.12.08.1100)--

I would make it so the organism spends
a bit less time in each circle. Great start though!

The length of the stay depends on how much food is loaded into each food
circle during initialization. Search on "foodquantity" and you'll find the
place. Right now it's a random number between 0 and 100, set by
"random(100)". You could make it "random(25)" or anything else.

The program could be neatened up a lot. Parameters could be made adjustable
from the keyboard and so on. One interesting variation would be to add the
E. coli reorganization to the direction of movement. Another would be to
add "schedules of reinforcement."
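The food-loading change mentioned above ("random(25)") can be tried on its own. In Turbo Pascal, `random(n)` returns an integer from 0 to n-1, so a circle can start out empty. The sketch below adds 1 so that every circle holds at least one unit. `maxfood` is an illustrative constant that is not in the original program.

```pascal
program FoodLoadSketch;
{ Sketch only: maxfood is an illustrative constant. }

const maxplaces = 6;
      maxfood = 25;   { illustrative upper limit per circle }

var foodquantity: array[1..maxplaces] of integer;
    i: integer;

begin
  randomize;
  for i := 1 to maxplaces do
  begin
    foodquantity[i] := 1 + random(maxfood);   { 1..maxfood, never empty }
    writeln('circle ', i, ': ', foodquantity[i], ' units');
  end;
end.
```

A smaller `maxfood` shortens the stay in each circle, since one unit is removed for each action that finds food.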

One of the main points of this program is to show an underlying mechanism
that gives the same appearance that Bruce Abbott describes for an organism
first encountering a lever and learning to operate it for food. In this
model, there isn't much detail: simply being in a food circle is enough to
make some unnamed ongoing action sufficient to produce food. It's as though
the action is always taking place, but is simply being moved from one place
to another. In a way, that's a correct view; the rat initially operates the
lever just by doing what it does anywhere in the cage: nosing around,
rearing up off its front paws and stepping down again. I would guess it's
even clearer with a pigeon (Chris?) which naturally pecks around at all
kinds of things in an exploratory way.

What's missing is the acquisition of more and more specific actions. That
would require more specific variations in the parameters of behavior. I
noticed (in the videos that Bruce A. made) a tendency of some rats to make
what looked like alternating digging motions with the front paws, a kind of
action that might be built-in and could be modified, E. coli style, until
it became repetitive lever-pressing when applied to a lever in the right
place.

The main point, I think, is that we have two behaviors here; moving around
in the cage, and pressing a lever at a variable rate. Both behaviors are
_decreased_ when they produce food, rather than _increased_ as
reinforcement theory would predict. As food delivery rate increases
(decreasing the error) the movement around the cage decreases, and at the
same time the rate of pressing decreases. You can see these relationships
by studying the time plots.

The occurrence of food only seems to increase behavior when you focus your
attention on the lever's behavior instead of the animal's. Naturally there
are more lever presses when the animal is applying its actions to the lever
than to a rod in the floor grill or to some other part of the cage. The
actions that initially cause the lever presses, however, do not increase;
they are going on all the time. The initial increase in lever-pressings is
actually the result of the rat's locating itself so its ongoing actions
happen to depress the lever; the longer-term rise in food deliveries
results from the slowing of the search pattern so the rat stops moving away
from the food circle (which you will see happening now and then, and which
would happen more often if the organism moved around faster).

This is what I mean by focusing on the lever's behavior instead of the
animal's behavior. If you define behavior as lever-presses, you will see
the animal's actual behavior only when it happens to depress the lever;
when the animal moves the location of the same behavior away from the
lever, the behavior will seem to stop, although it's actually continuing
somewhere else. When the animal is moving the place where it acts around
the cage away from the lever, there will seem to be zero behavior, and when
the place of the behavior happens to move to the location of the lever, the
amount of behavior will seem to increase. If we could instrument the limb
positions and nose, mouth, and paw pressures so they could continue to be
measured no matter where the animal was, we would see its interactions with
its environment going on continuously. Then there would not be such an
appearance that the behavior first occurs when the lever happens to be
depressed and a bit of food gets delivered and eaten. In fact, we would see
many of the specific acts that first produce the food _dropping out_ as
variants on the initial behavior produce lever presses more often.

The appearance is that when the animal's wanderings happen to cause a bit
of food to appear, the probability of repeating the same behavior is
increased. But the same data can equally well be interpreted to mean that
the probability of changing to some other behavior is _decreased_. If you
think of behavior in terms of lever-presses measured in one specific
location, the former interpretation seems proper. But if you think of
behavior as being associated with the animal wherever it may be, the latter
becomes most appropriate.

Reorganization theory says that an animal alters its own patterns of
behavior until they produce the inputs it wants or needs. These alterations
can include the place, kind, and amount of any aspect of behavior. As
changes produce more of what is needed, the changes slow down, eventually
stopping when the controlled inputs are sufficient to turn off
reorganization. Thus like any control process, the rate or amount of the
process is largest when the error is largest, and decreases as the error
gets smaller. This is just the opposite of what reinforcement theory says:
reinforcement theory says that the process _increases_ as it produces more
of what is needed. For the behavior seen in the present model, that would
simply be a misinterpretation.

Best,

Bill P.

Richard_Marken2 · December 9, 2000, 4:51pm

[From Rick Marken (2000.12.09.0850)]

Me:

I would make it so the organism spends a bit less time in each
circle. Great start though!

Bill Powers (2000.12.09.0122 MST)--

The length of the stay depends on how much food is loaded
into each food circle during initialization.

Yes. I saw that when I re-ran it. In fact, I hadn't read your whole
post before I commented on the program. It really works nicely.

The main point, I think, is that we have two behaviors here;
moving around in the cage, and pressing a lever at a variable
rate. Both behaviors are _decreased_ when they produce food,
rather than _increased_ as reinforcement theory would predict.

I think we ought to just ignore reinforcement theory. If you are
doing all this wonderful work to try to convince reinforcement
theorists that their theory is wrong, then I think you are wasting
your very precious time. I think what you have here is a start at a
nice control system model of "foraging". If there is such a thing
as reinforcement theory, and it's not just control theory with a
silly name, then I will believe it when I see the reinforcement
theory of "foraging". Otherwise, as far as I'm concerned, there
is simply no such thing as reinforcement theory.

The appearance is that when the animal's wanderings happen to
cause a bit of food to appear, the probability of repeating the
same behavior is increased. But the same data can equally well be
interpreted to mean that the probability of changing to some other
behavior is _decreased_.

Yes, I see. It's lovely!

This is just the opposite of what reinforcement theory says:
reinforcement theory says that the process _increases_ as it
produces more of what is needed.

Sorry, I don't believe this is true. Reinforcement theory either
doesn't exist (which is most likely) or it says whatever the
heck it wants to say.

Best

Rick

--

Richard S. Marken Phone or Fax: 310 474-0313
Life Learning Associates e-mail: marken@mindreadings.com
mindreadings.com

Chris_Cherpas · December 9, 2000, 10:58pm

[From Chris Cherpas (2000.12.09.1429 PT)]

Bill Powers (2000.12.09.0122 MST)--

I would guess it's even clearer with a pigeon (Chris?)
which naturally pecks around at all kinds of things in
an exploratory way.

It's a simplification to say that all the baseline (so-called
"operant" level) pecks in different locations just get shifted
to the key, but I think it's the right model to start with.

This is what I mean by focusing on the lever's behavior instead of the
animal's behavior. If you define behavior as lever-presses, you will see
the animal's actual behavior only when it happens to depress the lever;
when the animal moves the location of the same behavior away from the
lever, the behavior will seem to stop, although it's actually continuing
somewhere else.

This is consistent with matching-law (relativistic) versions of
the reinforcement concept, although the point you make, in itself,
is an excellent statement of what's often wrong with EAB and other
experimental psychologies.

Reorganization theory says that an animal alters its own patterns of
behavior until they produce the inputs it wants or needs. These alterations
can include the place, kind, and amount of any aspect of behavior. As
changes produce more of what is needed, the changes slow down, eventually
stopping when the controlled inputs are sufficient to turn off
reorganization. Thus like any control process, the rate or amount of the
process is largest when the error is largest, and decreases as the error
gets smaller. This is just the opposite of what reinforcement theory says:
reinforcement theory says that the process _increases_ as it produces more
of what is needed. For the behavior seen in the present model, that would
simply be a misinterpretation.

A footnote:
In the melioration theory version of reinforcement, shifting from
a baseline of a Concurrent VI1-VI3 distribution to a VI3-VI1
distribution correlates with the difference between the local
rate of food on each side of the chamber. When the distribution
stops changing (i.e., "matching" is restored), the local rates are
equal. The greater the difference, the greater the change in the
distribution.

Rick Marken (2000.12.09.0850)--

...I will believe it when I see the reinforcement theory
of "foraging". Otherwise, as far as I'm concerned, there
is simply no such thing as reinforcement theory.

Surprise, surprise: there are several (e.g., Fantino's
Delay-reduction hypothesis). The concurrent set-up has
been viewed as switching between patches; the analogy for
travel time between patches is the change-over requirement.
The first "reinforcement theories" of foraging were based
on the notion of "optimal foraging" (with the obvious
relation to microeconomic theories of maximizing value).

Rick Marken (2000.12.09.1200)--

December 9, 2000, the day the US Supreme Court ended
democracy in America.

Eh, it wasn't working anyway.

The most telling thing for me is the millions of transactions
involving credit cards, bank transfers, stock transactions, and
so forth, that occur all the time -- with very little error -- yet
"counting the votes" is portrayed as some intrinsically insurmountable
technical feat. Does anyone think that the problems of this
election will lead to putting effective technology into place?

Regards,
cc

Bruce_Gregory8 · December 9, 2000, 11:17pm

[From Bruce Gregory (2000.1209.1817)]

Chris Cherpas (2000.12.09.1429 PT)

The most telling thing for me is the millions of transactions
involving credit cards, bank transfers, stock transactions, and
so forth, that occur all the time -- with very little error -- yet
"counting the votes" is portrayed as some intrinsically insurmountable
technical feat. Does anyone think that the problems of this
election will lead to putting effective technology into place?

Effective from whose point of view? As far as the Supreme Court is
concerned the present arrangement appears perfectly effective.

BG

Richard_Marken2 · December 10, 2000, 12:07am

[From Rick Marken (2000.12.09.1600)]

Me:

...I will believe it when I see the reinforcement theory
of "foraging". Otherwise, as far as I'm concerned, there
is simply no such thing as reinforcement theory.

Chris Cherpas (2000.12.09.1429 PT)

Surprise, surprise: there are several (e.g., Fantino's
Delay-reduction hypothesis)... The first "reinforcement
theories" of foraging were based on the notion of "optimal
foraging"

Yes, I know. I also know that there are several reinforcement
models of bias change in signal detection (work done by my graduate
advisor), gambling choice, etc etc. But none of these models has
been presented on CSGNet as a working reinforcement model (probably
because these are not really working models at all; they are mainly
exercises in curve fitting using equations for stochastic processes).
The only working "reinforcement" models I've seen on CSGNet (like
Bruce Abbott's models of E. coli key pressing behavior) have turned
out to be control models. (Actually, I think the reinforcement theory
model in my "selection of consequences" java demo at
http://home.earthlink.net/~rmarken/demos.html is, indeed, a true, working
reinforcement model; but Bruce Abbott doesn't accept it as such
so I guess I have still not seen a reinforcement model in operation).

I'll believe that there is a reinforcement theory model of foraging
when I see a working reinforcement theory model of foraging that is
not a control of input model in disguise.

Best

Rick

--

Richard S. Marken Phone or Fax: 310 474-0313
Life Learning Associates e-mail: marken@mindreadings.com
mindreadings.com

Dick_Robertson · December 12, 2000, 1:42pm

[From Dick Robertson,2000.12.11.0740CST]

Bill Powers wrote:

[From Bill Powers (2000.12.09.0122 MST)]

The learning program looks great. I gather the irregular vertical yellow lines
correspond to lever presses, if it were a mouse, etc. But what does the shaggy
white bottom line represent?

Best, Dick R.

Bill_Powers1 · December 12, 2000, 4:57pm

[From Bill Powers (2000.12.12.0956 MST)]

Dick Robertson,2000.12.11.0740CST --

The learning program looks great. I gather the irregular vertical yellow
lines correspond to lever presses, if it were a mouse, etc. But what does
the shaggy white bottom line represent?

It's a random negative-going disturbance that tends to bring the controlled
variable downward.

Best,

Bill P.