SIMULATION

Session 1: Variables, Functions, and Systems

For purposes of simulation, the world does not consist of
objects, but of variables and functions. Let's talk about that
for a while.

VARIABLES

A variable is something that can vary along a scale. We use names
or temporary symbols like x and y to distinguish one scale from
another, like size, position, rotation rate, force, sweetness,
and sadness. The _identity_ of the variable is the name of the
scale. The _value_ of the variable is its position on the scale.
We use the name of the scale, commonly, to refer both to the
identity of the scale and to the present value of the variable on
that scale. Thus the variable F (for "force") refers to some
number that tells us how much force exists, and at the same time
it reminds us that the number 5 belongs on the force scale, and
not, for example, on the sweetness scale. When we say that F = 5,
we are saying that the magnitude of the variable is 5, and also
that it is the variable named F, for Force, that we are talking
about.

The important difference from common usage is that when we name a
scale, such as sadness, we still have to specify a position on
that scale. So we don't just say a person is "sad." We establish
a maximum value of the variable and a minimum value -- say, 100
and 0 -- and then give a number that says where on the scale the
variable is right now: sadness = 37% of the maximum. If someone
says he is sad, and that the amount of the sadness is 2% of the
maximum value, we don't have to feel very sorry for him, not as
sorry as if he said he was 98% sad. Everybody is sad; it's just
that most people are 0% sad, while others are other amounts of
sad.

When we deal with these scales without the concept of variable
amounts, we make them, in effect, into binary scales. The wind is
blowing or it is not. We are sad or we are not. Something is big
or it is small; it is spinning or stationary; sweet or bland. We
turn variables into things and states, as if sadness were a state
that exists or doesn't exist, or as if the only positions that
something could take were near and far.

The reason we deal with variables and not things or states is
that in simulations we're trying to reproduce the _behavior_ of
something: how it acts through time. In the real world, there are
no instantaneous events; all processes, no matter how rapidly
they occur, take time to happen. Every event has a beginning, a
middle, and an end, if you look at it on a fine enough time
scale. In simulations, we want to reproduce not just the
"occurrence" of events, but the particular way in which variables
change during the event. We need to say not just whether
something is near or far, but where it is at every instant.
This means that all processes that we simulate (other than those
on the subatomic scale) must be defined in terms of continuous
changes in the values of variables. "Continuous" means that when
a variable changes from one value to another, it has to pass
through all the intermediate values. When we "run" a simulation,
the variables we're dealing with all obey this rule: to get from
one value to another, they have to pass through all the
intervening values. Sometimes those intervening values become
important.

To represent the world in terms of variables is to recognize its
fundamentally continuous nature.
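
To make the continuity rule concrete, here is a minimal sketch (my
own illustration, not an example from the text) of how a simulated
variable gets from one value to another: it is stepped through time
in small increments, passing through the intervening values.

```python
# A variable "position" moves toward a target in small steps of
# size speed*dt; it cannot jump, so it visits the values in between.

def run(position, target, speed, dt, steps):
    """Advance position toward target; return the whole trajectory."""
    trajectory = [position]
    for _ in range(steps):
        if position < target:
            position = min(position + speed * dt, target)
        else:
            position = max(position - speed * dt, target)
        trajectory.append(position)
    return trajectory

path = run(position=0.0, target=10.0, speed=2.0, dt=0.5, steps=12)
print(path)  # passes through 1.0, 2.0, ... on the way to 10.0
```

Making dt smaller makes the simulated trajectory a closer
approximation to truly continuous change.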

FUNCTIONS

Another basic aspect of simulations is the way we represent the
effects of some variables on other variables. It's a basic
assumption in simulation that the laws of nature are regular and
reliable. Even if the relationships among variables change, they
change for reasons that can be represented as the lawful and
regular effects of other variables. We represent regular and
lawful effects by saying that an effect-variable is a _function_
of a set of causal variables. More on that shortly.

In physics, at least when I studied it, the objects and things in
nature are treated as if they have _properties_. The properties
of something consist of the way some variables associated with it
change when other variables are changed. For example, when the
temperature of a given weight of water is raised toward the
boiling point (temperature is one variable), the volume of water
increases in a certain way (volume is another variable). That
relation of temperature to volume is a property of water. Water
has many properties, each property being a relationship among the
variables that, together, make up what we call water.

Directionality of cause and effect

One famous property of matter is stated in the formula, F = MA.
If a force (a particular value of one variable on a scale called
F) is applied to a piece of matter, the matter's acceleration (a
particular value of a second variable on its own scale, called A)
will be proportional to the force. Both force and acceleration
are observable variables. The constant of proportionality, M, is
not directly observable in the same way. It represents the way in
which a certain amount of matter converts forces into
accelerations. Calling this constant of proportionality "mass"
makes it seem like something having the same kind of measurable
reality that the force and the acceleration have, but
measurements of mass always require us to observe other
variables, like force and acceleration, and _define_ the mass in
terms of them: M = F/A. Mass is not a variable in the same sense
that force and acceleration are; it's a property of matter. For
any given hunk of matter, if we measure both F and A under many
different conditions, we find that F/A is always the same. That's why we
think of it as a property of that piece of matter -- its
mass. And since it's a very reliable property that doesn't
change, we represent it not as a variable, but as a constant: a
variable with only one fixed value.

By applying a force, we can produce an acceleration. But by
applying an acceleration, can we produce a force? This is a trick
question, because if you ask how you produce an acceleration of
an object, you find that you must have applied a force to do so,
the very force you're trying to produce by producing the
acceleration. We can apply a force by stretching a spring or
heating a container of gas, but we can't apply an acceleration
except by applying a force. In terms of causation, therefore, we
should write A = F/M (putting the effect on the left and the
cause on the right, by convention).

This says that "F = MA" is incorrect in one regard: it implies
that we can produce a force by (magically) making an acceleration
appear. Physicists typically ignore this directionality in
equations like this. In fact, they treat F, M, and A as if they
were all equivalently "real", and F = MA as meaning exactly the
same thing as A = F/M, just because algebra lets you transform
the equations that way. Algebra can't reveal causality.

The expression F = MA really means, for the simulator of systems,
that if we have an acceleration A and a mass M, the applied force
_must have been_ F. Writing the equation in this way makes it
into a deduction of a prior cause, F, from observations of its
effect, A. Writing it the other way, A = F/M, is a _prediction_
of an effect, A, from observation of the cause, F. In both cases
we are describing _the same unidirectional relationship_, in
which the causal arrow runs only from F to A, not the other way
around, despite the fact that according to convention, F = MA
implies that A is the cause and F the effect.
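
The distinction can be sketched in code (the function names and the
mass value are my own illustrative choices, not from the text): one
function predicts the effect from the cause, the other deduces what
the cause must have been, and both describe the same one-way arrow
from F to A.

```python
M = 2.0  # mass of one particular hunk of matter (illustrative value)

def predict_acceleration(F):
    """Prediction: from the cause F, compute the effect A = F/M."""
    return F / M

def deduce_force(A):
    """Deduction: from the observed effect A, infer what F must have been."""
    return M * A

A = predict_acceleration(10.0)  # forward along the causal arrow
F = deduce_force(A)             # reasoning backward along the same arrow
print(A, F)  # 5.0 10.0
```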

Another example closer to home may help. In the eye, there are
photoreceptors that respond with a neural signal of a certain
frequency S when light of a certain intensity I is absorbed (the
real relation is more complex, but the point is not affected by
that). If the light has an intensity I and the neural signal
representing light has a frequency S, we can say that

S = k*I,
where the asterisk means "times", and k is a constant number set
to make all measurements of S and I consistent with each other.

By artificially creating a light intensity with a magnitude I,
the equation says, we can produce a neural signal S with a
magnitude equal to k*I.

Clearly, algebra lets us write, just as truthfully as far as
algebra is concerned,

I = S/k.
By the same interpretation, this says that if we artificially
induce a neural signal S, the result will be absorption of light
of intensity I. But that is clearly not what would happen.
Creating light causes a signal to appear, but creating a signal
does not cause light to appear. This is a one-way relationship.

In simulating systems, we must pay attention to the
directionality of the relationship between one variable and
another. In cases where there is true bidirectionality, as in the
relation between the positions of the two ends of a lever, we
represent the relation by two arrows, one going in each
direction. If side B of a lever moves 3 times as much as side A,
then side A moves 1/3 as much as side B. Since we can in fact
push on either end of the lever to move the other end, we can
say, using Y for distance moved,

Yb = 3*Ya, and

Ya = (1/3)*Yb

The direction of causality (that is, which equation to use) has
to be determined in some other way, since this is a truly
bidirectional relationship.
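
In code (a sketch under the 3:1 lever assumption above), the two
directions are simply two distinct functions, and the physical
situation decides which one applies.

```python
RATIO = 3.0  # side B moves 3 times as far as side A

def yb_from_ya(ya):
    """Use this arrow when side A is the end being pushed."""
    return RATIO * ya

def ya_from_yb(yb):
    """Use this arrow when side B is the end being pushed."""
    return yb / RATIO

print(yb_from_ya(2.0))  # 6.0
print(ya_from_yb(6.0))  # 2.0
```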

The point I'm making here is a very important one that is often
overlooked. One mustn't just blindly manipulate
mathematical relationships and interpret the results according to
the convention that the effect is always the variable on the left
side of the equation. It's necessary to look at the physical
situation and think through the question of what is actually
causing what.

To help keep causation straight, all relationships represented by
arrows in a diagram of a simulation are taken to be
unidirectional in the direction of the arrow. Any truly
bidirectional relationship is shown by two distinct arrows, one
running in each direction between two variables.

Mathematical functions

As I learned the term 50 years ago, a _function_ is a
mathematical expression describing how ONE variable depends on a
SET of other variables (and possibly itself). This is in contrast
to a _relation_, in which a SET of variables depends on a SET of
other variables. A relation can always be represented as a
collection of functions, each producing one "output" variable as a
particular function of all the "input" variables, with as many
functions as there are "output" variables.

The simplest function is one with a single input variable and a
single output variable, like A = F/M (the causal arrow running to
the left). The two variables are A (the output) and F (the
input), where we say "output" to designate the head end of the
causal arrow.

Suppose that we have applied different amounts of F and have
measured A each time, and that we've written the results in a
list:

F --> A

5 15
8 24
1 3
17 51
55 165
22 66
etc.

Now if we want to predict A from a known amount of F, all we have
to do is haul out the list, look up the value of F, and read off
the value of A. This is called a lookup table. Of course such
tables are usually sorted so the left-hand entries increase from
smallest to largest; this makes the input entry easier to find,
and also allows us to interpolate when there's no value of the
input that exactly matches the one we want to look up. The
simulation program Vensim has a "lookup" function for exactly
this purpose.
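
A minimal version of such a lookup, sketched in Python (the entries
are the ones listed above, sorted by F; the linear interpolation
scheme is my assumption about how one would fill the gaps):

```python
from bisect import bisect_left

# The (F, A) pairs from the list above, sorted by F.
TABLE = [(1, 3), (5, 15), (8, 24), (17, 51), (22, 66), (55, 165)]
XS = [f for f, _ in TABLE]
YS = [a for _, a in TABLE]

def lookup(f):
    """Return A for a given F, interpolating linearly between entries."""
    if f <= XS[0]:
        return float(YS[0])
    if f >= XS[-1]:
        return float(YS[-1])
    i = bisect_left(XS, f)
    x0, y0, x1, y1 = XS[i - 1], YS[i - 1], XS[i], YS[i]
    return y0 + (y1 - y0) * (f - x0) / (x1 - x0)

print(lookup(17))  # 51.0 -- an exact entry
print(lookup(10))  # 30.0 -- interpolated between (8, 24) and (17, 51)
```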

The great advantage of the lookup table as a way to describe a
relationship between two variables is that there is no need for
any mathematics. One could put together an entire simulation in
which all relationships between variables were represented by
lookup tables containing only observed values of variables. Of
course this would get awkward when one variable depended on two,
three, or more other variables; the lookup tables would become
two, three, or higher-dimensional tables requiring large amounts
of storage space. But with disk space costing ten cents per
megabyte or less, that's no big problem any more.

This concept of a simulation without mathematics may surprise
some people. There's a tendency to think that the mathematical
forms that show up in simulations are important in themselves, as
if there were hidden meanings in the variables and their square
roots and cosines and reciprocals and so on. But there are no
hidden meanings. The only purpose of the mathematical forms is to
save us the trouble of creating and then referring to large
numbers of very extensive lookup tables. The mathematical form F
= MA can replace a whole long lookup table like the one above,
provided that we find M ( = F/A) to be the same for every entry
(is it?).
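
The parenthetical question can be answered directly from the table
(a quick check I have added, using the values listed earlier):

```python
# The (F, A) pairs from the lookup table above.
pairs = [(5, 15), (8, 24), (1, 3), (17, 51), (55, 165), (22, 66)]

ratios = [F / A for F, A in pairs]
print(ratios[0])  # 0.333... -- F/A comes out the same (1/3) for every
                  # entry, so M = F/A is indeed constant for this table
assert all(abs(r - ratios[0]) < 1e-12 for r in ratios)
```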

If you don't like lookup tables, mathematical forms are a way to
APPROXIMATE the actual relationship. In the table above, it
happens that for each value of F, the measured value of
acceleration is 3*F. So by using the mathematical form A = 3*F,
we can generate the right value of A for any value of F, without
needing the lookup table and interpolating between its entries.
In this case the mathematical form gives the exact relationship,
but this will not generally be true. The measurements will be somewhat
uncertain, and nature seldom cooperates by relating its
variables in ways that conform _exactly_ to a simple mathematical
form.

The upshot is that most real simulations will not use more than
one or two lookup tables, and will definitely rely on
mathematical forms to represent real relations among variables.
But going through this side-issue has been useful, I hope, in
conveying what these mathematical forms are FOR. Their only
purpose is to allow one variable to be evaluated given the values
of all the other variables on which it directly depends. The
mathematical forms themselves are only a means of doing this
easily and with reasonable accuracy.

When we use a mathematical form to represent the way one variable
is affected by one or more others, we call this form a
_function_, and say that the one variable is a specific function
of the other variables. In the equation z = 2x + 3y - 27, z is a
function of x and y, with the specific function being 2x + 3y -
27. It is possible that some other set of variables might be
related in a way that is the same except for the names of the
variables: we might find that p = 2r + 3s - 27. In that case we
would say that p is a function of r and s, and z is the _same
function_ of x and y. This can make us suspect that there is some
deeper level at which the variables named x,y, and z are
connected to those named p,r, and s -- the suspicion usually
being incorrect. The fact that gravitational attraction and sound
intensity both follow an inverse-square function of distance does
NOT mean that gravity is like sound.

SYSTEMS

The word "system" has been tossed around quite a lot. It's really
just a convenient label for the collection of variables and
functions we have decided to study. We obviously can't study the
entire universe, so we bite off a small chunk of it, making note
of the connections we broke in doing so, and try to understand
how all the variables related by functions in this chunk will
behave when left to themselves. The broken connections may or may
not be important; usually we find they are important when
ignoring them results in a simulation that behaves differently
from the real world.

To set up a system for simulation, we have to go through it and
pick out all the variables that matter. When we first start, we
probably don't know all the variables that matter, but we can
discover them. The way we discover them is that we start building
the diagram of the system, and find that there are variables
unaccounted for -- that is, they aren't functions of other
variables, they haven't yet been set up as arbitrary constants to
be adjusted by the user of the simulation, and they aren't
outputs of the system. The variables that are either arbitrary
constants or outputs form the boundaries of the system, its
connections to the world it was separated from. The internal
variables that haven't been accounted for simply have to be accounted for.
A variable is either an input to the whole system,
an output from the whole system, or some function of other
variables in the system. There are no variables that are "just
there."

This doesn't take care of _omitted_ variables -- variables we
didn't realize are there in the real system and are important. I
don't know of any formula that will bring them to light. Without
them, the simulation won't behave like the real system, so you
have to find them. If I could tell you a formula for doing this,
I would be richer than Bill Gates, and I wouldn't tell you.

Aside from the input and output boundary variables, the other
variables in the system can be connected in an uncountable
(literally) number of ways. First, any variable can be a function
of one or more other system variables, up to the total number of
variables. And second, the forms of the functions can be anything
imaginable that can be accomplished by the physical system in
question. I believe that this takes us beyond the Aleph-null
degree of infinity.

With a menu of systems that is transfinite in size, there is
obviously no point in studying all possible types of systems. I
don't believe that anybody knows how to enumerate or classify
them, much less characterize them. The only useful approach is to
start with the real system, and find a representation of it that
does it justice. This means that in simulating real systems, we
inevitably must join theory with experiment.

The basic reason for simulating real systems is that we can't
understand the real systems in one whole chunk, unless they're so
simple as to be trivial. But we can take one variable, and by
hook or by crook identify the other system variables on which it
depends. We can write a function that completely (as far as we
know) accounts for the variable in terms of the values and
changes in value of other variables.

This amounts to identifying a subsystem within the whole system
that could be chopped out of the whole system without changing
the way the output variable depends on the input variables. In
fact, its boundaries are the very input and output variables we
have identified. If the rest of the system has ANY other way of
influencing the output variable, we simply add that way to the
list of input variables, and keep doing this until this subsystem
is connected to the rest of the system ONLY through its input
variables and its single output variable.

Having accounted for one system variable as the output of a
subsystem, we then proceed to account for ALL the other system
variables in the same way, except the input and output variables
of the whole system. This results in an analysis of the whole
system into a collection of interacting subsystems, each
subsystem being a function that converts a set of input variables
into the state of one output variable.

If we have done this analysis of the real system correctly, we
should be able to represent each subsystem as a block in a
computer simulation, connect the blocks so that the outputs of
some blocks become inputs to others, set up the input boundary
conditions, run the simulation, and see that the behavior of ALL
THE REAL SYSTEM VARIABLES matches the behavior of ALL THE
SIMULATED SYSTEM VARIABLES. Not only the input and output
variables should match; all the other internal variables in the
real system should match their simulated counterparts, too.
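
As a toy illustration (entirely my own construction, not a system
from the text), here is a two-block simulation: one block computes
the net heat flow into a room, the other integrates that flow into a
temperature. The output of each block is an input to the other, and
running the loop steps the whole system through time.

```python
def heat_flow(heater_power, room_temp, outside_temp=10.0):
    """Block 1: one output variable, the net heat flow into the room."""
    return heater_power - 0.1 * (room_temp - outside_temp)

def integrate_temp(room_temp, flow, dt=1.0):
    """Block 2: room temperature accumulates the net heat flow."""
    return room_temp + flow * dt

# Wire the blocks together and run the simulation through time.
temp = 10.0
for step in range(200):
    flow = heat_flow(heater_power=2.0, room_temp=temp)
    temp = integrate_temp(temp, flow)

print(round(temp, 2))  # settles near 30.0, where heat in balances heat lost
```

The equilibrium at 30 degrees is not stated anywhere in either
block; it emerges from running the connected system, which is the
kind of result the simulation is for.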

The check against the real system variables (as many as possible)
is a vital part of simulating real systems; it's the reality
check. But that's not the main thing we get out of a simulation.
The main result is that we see the system doing things we never
could have predicted using unaided brain power. If we're lucky,
we also come to understand _how_ the behavior of the system
emerges from its organization, which means we come to
_understand_ the system in ways that were never open to us
before.

Thus endeth the lesson. This has been just a very rapid
run-through of the basic ideas behind simulation. The detailed
how-to-do-it comes next, to the extent that I and any others who care
to chime in can convey it. This has been very much my own slant
on the principles of modeling through simulation, and others may
see things differently. While I work up the start of the
applications phases of these lessons, maybe we can talk about the
concepts as described so far.

Bill Powers
[Copyright 1998, by William T. Powers]

[From Bruce Abbott (980904.1250 EST)]

Nice piece, Bill. Clear, effective, and at just the right level.

Regards,

Bruce

[From Tim Carey (9809051805)]

Great post Bill, thanks.

Tim

[From Bruce Gregory (980908.1442 EDT)]

S = k*I,
where the asterisk means "times", and k is a constant number set
to make all measurements of S and I consistent with each other.

I have no idea what the description of k means.

Bruce Gregory

[From Bill Powers (980908.1503 MDT)]

Bruce Gregory (980908.1442 EDT)--

S = k*I,
where the asterisk means "times", and k is a constant number set
to make all measurements of S and I consistent with each other.

I have no idea what the description of k means.

Suppose you observe the following pairs of values of S and I:

I S

7 21.991
2 6.283
5 15.708
3 9.425
4 12.566
9 28.274

What single value of k in the equation S = k*I will make the equation
consistent with all six pairs of numbers? That's all I meant.

Best,

Bill P.