[Martin Taylor 920818 18:40]
Just after posting that quoted paragraph from sci.cognitive, I looked at the
next posting. I don't think HDP is like PCT at all, but judge for yourselves.
Here's the posting:
Martin
================
Heuristic Dynamic Programming in a Realistic Biological Context
Harry R. Erwin
erwin@trwacs.fp.trw.com
As I showed at the 1982 Animal Behavior Workshop in Guelph, Ontario,
the optimum strategy for playing a discrete game against nature involving
information collection is a simple threshold strategy. The player uses
Bayesian statistics to maintain an estimate of his probability of success,
and compares that estimate against a threshold at each decision point.
If the probability of success remains above the threshold, he continues
the game; otherwise, he quits. The threshold can be calculated by treating
the game as a problem in dynamic programming (John Bather, pers. comm.,
1983).
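A minimal sketch of such a threshold strategy, for illustration only. It assumes the evidence arrives as a sequence of yes/no observations and that the player's estimate is the mean of a Beta posterior with a uniform prior; neither assumption is from the original analysis. The names `posterior_mean` and `play` are hypothetical, and the threshold is passed in rather than derived by dynamic programming.

```python
def posterior_mean(successes, failures, prior_a=1.0, prior_b=1.0):
    """Mean of a Beta(prior_a, prior_b) posterior after the given counts.

    Assumption: Bernoulli evidence with a Beta prior (uniform by default).
    """
    return (prior_a + successes) / (prior_a + prior_b + successes + failures)


def play(observations, threshold):
    """Return how many observations are collected before the player quits.

    At each decision point the current estimate is compared against the
    threshold: the player continues only while the estimate is above it.
    """
    successes = failures = 0
    steps = 0
    for favourable in observations:
        if posterior_mean(successes, failures) <= threshold:
            break  # estimate is no longer above the threshold: quit
        if favourable:
            successes += 1
        else:
            failures += 1
        steps += 1
    return steps
```

With a threshold of 0.45, a run of unfavourable observations drives the estimate from 0.5 down to 0.4, at which point the player quits; a run of favourable ones never triggers the quit branch.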
In a biological context, this strategy lends itself to implementation using
HDP. The critic network would provide the current threshold value as a local
goal value, and the action network would compare the current probability
against that value. If the current probability exceeded the threshold, the
preferred action would be to continue to collect information; otherwise it
would be to quit. Note that the critic network responds to the perceived
payoffs and risks of the game and not to the current situation. Both critic
and action networks would be prior to the motor cortex, which would then
treat both as a combined critic network and attempt to reduce fear to
nominal levels.
Current payoffs---\
                   O-- local goal value--------------\
Target category---/ (A)                  feedback     \
                                             ---       \
Target condition--\                         |   |       \
                   \                        V   |        \ (D)
Self condition----->0--initial estimate --->0-current est>0
                   / (B)                   / (C)           \
Environment-------/                       /                 \
                                         /                decision
Information collected and processed-----/         (expressed as fear level)
                                                               \ (E)
Motor options-------------------------------------------------->0->motor
                                                                   cortex
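The critic/action split described above can be sketched roughly as follows. This is not an HDP implementation: the learned critic network is replaced by a hand-written stand-in, `critic_threshold`, which returns the break-even probability of a single gamble with a given payoff and loss. That formula, and the names `critic_threshold` and `action`, are assumptions made for the illustration. The point it shows is that the critic sees only payoffs and risks, while the action step sees only the current probability and the goal value.

```python
def critic_threshold(payoff, loss):
    """Hypothetical stand-in for the critic network.

    Returns the success probability at which a single gamble breaks even:
    p * payoff - (1 - p) * loss = 0  =>  p = loss / (payoff + loss).
    It depends only on payoffs and risks, not on the current situation.
    """
    return loss / (payoff + loss)


def action(current_probability, local_goal_value):
    """Hypothetical stand-in for the action network: a simple comparison."""
    if current_probability > local_goal_value:
        return "continue"  # keep collecting information
    return "quit"


# Example: a payoff of 3 against a loss of 1 gives a local goal value of 0.25.
goal = critic_threshold(3.0, 1.0)
decisions = [action(p, goal) for p in (0.3, 0.2)]
```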
Note that there are a number of places where training would occur. Subsystem A
needs to learn how to calculate the local goal values corresponding to various
payoffs and intensities of the game (primarily defined by target category).
I suspect most species have this hard-coded in the genome. (The local goal
values are not obvious functions of the inputs!) Subsystem B can be trained
more easily--in mammals, that is part of the role of play and parental
teaching. Subsystems C and D are probably hard-coded, even in man. Subsystem C
implements logistic functions, while Subsystem D does a simple comparison.
Subsystem E probably uses fear level to affect the preference functions for
various actions used by the motor controller, although it may select a desired
fear level and output partials to the motor controller instead. (I suspect that
version is more correct, because the corresponding 2-person game can't be
handled by outputting simple fear level, and man does play the 2-person game.)
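As a rough illustration of Subsystems C and D: a Bayesian update written in log-odds form is additive, and mapping the result back to a probability is a logistic function, which is one way to read "Subsystem C implements logistic functions". The sketch below assumes the collected information arrives as log-likelihood ratios, and it represents fear level as the signed gap between the local goal value and the current estimate. Both choices, and the function names, are assumptions for the example rather than anything stated above.

```python
import math


def logit(p):
    """Log-odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))


def logistic(x):
    """Inverse of logit: maps log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def subsystem_c(initial_estimate, log_likelihood_ratios):
    """Combine the initial estimate with collected evidence.

    Assumption: each piece of evidence is a log-likelihood ratio, so the
    Bayesian update is a sum in log-odds space followed by a logistic.
    """
    return logistic(logit(initial_estimate) + sum(log_likelihood_ratios))


def subsystem_d(current_estimate, local_goal_value):
    """Compare the current estimate against the local goal value.

    Returns (decision, fear). The fear value here is a hypothetical
    encoding: positive when the estimate falls short of the goal.
    """
    fear = local_goal_value - current_estimate
    decision = "continue" if current_estimate > local_goal_value else "quit"
    return decision, fear
```

For instance, starting from an initial estimate of 0.5, evidence with a likelihood ratio of 3 moves the current estimate to about 0.75, which exceeds a local goal value of 0.6, so the decision is to continue.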
Cheers,
--
Harry Erwin
Internet: erwin@trwacs.fp.trw.com