Timing experiment (was PCT researcher who doesn't talk)

[Martin Taylor 2009.]

[From Bill Powers (2009.02.21.1617 mst)]

Martin Taylor 2009.

Yes. Since you lost the original curve, I am including it here.

It looks as if 200 to 230 milliseconds is the threshold. In tracking experiments with continuous disturbances, we measure 7 to 9 frames of delay, or 117 to 150 milliseconds, and with your new step disturbances I am getting a consistent 15 frames or 250 milliseconds -- very consistent with Fig. 1.

My hypothesis is that the perception in question is at the category level. In the diagram we developed, it is at the category level that the "answer" is matched to the "presentation". Apart from the transport lag, which seems to be about 230 msec for subject B, variation in the timing would then refer to the moment when the "intended answer" reference signal is provided to the button-selection control loop.

I have been leaning toward the same conclusion, though I don't have any justification for thinking that just making the disturbance discontinuous is enough to require category-level control. Possibly "category" is the wrong term -- the important aspect may be the introduction of discrete variables or symbolic control -- that is, control through use of tokens or symbols rather than continuous variables. We have to do an awful lot of guessing here, which makes me uncomfortable.

I assume that the matching is category-level control because the acceptable answers are categorized, not because of any characteristics of the disturbance: "the important aspect may be the introduction of discrete variables or symbolic control -- that is, control through use of tokens or symbols rather than continuous variables". I have the feeling we see this the same way, though the words may be different.

I don't think this kind of error, which you correctly observe must exist, can be very great, because the data points fall remarkably close to the straight line for subject B, and after the early curve at the bottom, which is probably due to different transport lags for different subjects, the same is true for the combined curve.

Actually, the sum of a series of straight-line plots is still a straight-line plot, isn't it?

Remember that these actually aren't straight-line plots. The full curves run along the X-axis with y = 0 up to x = 200-250 msec, and then start increasing along a straight line. If you sum a bunch of such obtuse-angled L curves (vertically, since presumably everyone got the same set of "third-bip" times), and the threshold timings are different for different subjects, there will be a range of timings between the quickest and the slowest subject's thresholds in which only some of the subjects have non-zero y-values. For timings longer than the slowest subject's threshold, they all add like straight lines, and the result is a straight line. The end result is a curve connecting the X-axis with the averaged straight line, even if every one of the averaged curves has an abrupt transition.
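The summing argument above is easy to check numerically. Here is a minimal sketch; the thresholds and slope are made-up illustrative values, not taken from Schouten's data:

```python
# Each subject's d'^2 vs. bip-delay curve is modelled as a "hockey stick":
# zero up to a threshold, then a straight line. Averaging several such
# curves with different (hypothetical) thresholds produces a smooth bend
# between the quickest and slowest thresholds, and an exact straight line
# beyond the slowest threshold.

def hockey_stick(t, threshold, slope):
    """d'^2 as a function of bip-delay t (ms) for one subject."""
    return max(0.0, slope * (t - threshold))

thresholds = [200, 215, 230, 250]   # ms, hypothetical per-subject thresholds
slope = 0.04                        # d'^2 per ms, same for everyone here

def averaged(t):
    """Combined (averaged) curve across subjects at bip-delay t."""
    return sum(hockey_stick(t, th, slope) for th in thresholds) / len(thresholds)

for t in range(180, 320, 20):
    print(t, round(averaged(t), 3))
```

Below 200 ms the average is exactly zero; from 250 ms on the second differences vanish (a straight line); in between, the second differences are positive, giving the curved foot seen in the combined plot.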

Anyway, if the time between identification of the light and the final contact closure is constant, the straight line adjusted for movement time would simply be translated sideways on the plot.

Yes, but if the kind of error you postulate exists, as it must to some extent, it is a variable that would move each data point left or right independently. I just think that the precision with which the data points are fitted by the line argues either that the variability is small or that there must have been an unrealistically great number of trials.

I was imagining doing this experiment, and it seemed to me that between the first beep and the third, the subject has to do quite a lot of fast work. The initial beep says that you need to sample the state of the light within the next 500 milliseconds, leaving enough time to move your hand in the right direction from where it is hovering to touch the button and then to increase the pressure so the button is depressed just as the third beep occurs.

I believe Schouten asked the subject to have a different finger over each button, not to use the same hand whichever button was to be pushed. I remember us doing that in our experiments, and I suspect our reason was that Schouten had mentioned it. Using the left index finger for the left button and the right for the right button removes quite a bit of that movement time, which would show up as a variable transport lag.

And remember, with a threshold time of 200 to 250 milliseconds after onset, the first beep has to occur well before a light turns on. It's possible that the actual perceptual judgment is made more like 150 milliseconds after onset, or even sooner.

If we could figure out a way to separate out the time it takes from the perceptual judgement to the button push, I would not be at all surprised to find the threshold time for starting the rise in perceptual judgement accuracy to be in the single-digit milliseconds, and for d'=2 to be achieved by 75-100 msec. The lower bound would be how quickly a change in neural firing rate could be transmitted to whatever part of the brain makes the comparison, and that could be quite quick when the rate change is large and the number of neurons concerned is also quite large, as in this case.

I think a series of reaction-time experiments is needed here to determine how long after light-onset the actual perceptual sample is taken and what the lag time is between the first reaction to the light and the press of the button. I doubt that that lag is much different from 150 milliseconds.

Yes, you might do a parallel experiment with only one light, but it would be harder to interpret. You would still have the three bips and the two buttons, but the buttons would be labelled "Yes" and "No" rather than "left" and "right". If instead you just asked the subject to respond by pushing one button as quickly as possible when the light comes on, you would be looking rather high on the detection curve (straight line), rather than finding the first moment at which the subject is getting some information from the light, which is the moment that determines the transport lag. Subjectively, in an auditory experiment, one is never very sure whether one heard a tone or not until the signal level gets above something like d' = 2. I would not be surprised to find the same to be true of visual detections. I think most subjects would not begin the button push before the detectability had reached about d' = 2 or a bit more, because not until then would they decide they had seen the light come on.

If there were much variation in the button-push times gathered into the data for one point, I think that would show up as quite visible left-right deviations from the line.

As I said above, however, if the button-pushing move takes a fairly constant amount of time, the shape of the curve would not be changed.

Yes. Your original comment seemed to suggest that "fairly constant" should be characterized as "fairly variable" :-) Hence my comment above.

If the curve is indeed the initial rise of an exponential-like approach to asymptote, that asymptote must be a very long way beyond the range investigated in the data.

I don't think so. Remember that the quantity actually plotted is not the perceptual signal, but a very compressed function of its magnitude relative to the noise level. With d'^2 = 9, you say the probability of an error is 3 in 1000, which is pretty close to asymptote on a linear scale and is just above the highest recorded point at d'^2 = 8. If you converted the y axis to probability of error (minus 0.5, I assume), I think you would see the 100%-right line being approached pretty fast, and the signal itself could curve quite sharply.

Yes, this is precisely why we never plot simple error rates on a linear scale. They become uninformative at high percentage correct. The difference between 96% and 99% correct is a big difference, whereas the difference between 76% and 79% is pretty trivial. Just try using a voice recognition system to write text, with 96% accuracy, and one with 99% accuracy, and I think you will see that it is a big difference (of course, 99% isn't really good enough, and going to 99.6% is an even bigger improvement)! If we do plot error rates, we use z-scores instead, which equalize the variance at every point along the curve. The d'^2 plot does the same thing. It makes equally distinguishable differences equally spaced on the y axis.
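The point about 96% vs. 99% can be seen directly by converting proportions correct to z-scores (the inverse of the normal distribution function). This is just a sketch of that transformation, using only the standard library:

```python
# Equal steps in percent correct are not equal steps in detectability.
# The z-score (normal quantile) transformation stretches out the
# high-accuracy end of the scale, which is what a d'-style axis does.

from statistics import NormalDist

def z(p):
    """z-score corresponding to a proportion correct p."""
    return NormalDist().inv_cdf(p)

print("76% -> 79%:", round(z(0.79) - z(0.76), 3))  # a modest step
print("96% -> 99%:", round(z(0.99) - z(0.96), 3))  # a much larger step
```

The second difference comes out several times the first, even though both are "3 percentage points" on a linear scale.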

I suppose we need to compute those numbers -- you know what the functions are, why not just try it and see?

When the noise is assumed to be Gaussian, d' is just the number of standard deviations between the mean value with signal and the mean value for noise alone. You can read it off a plot of the normal distribution. When the noise isn't Gaussian, the number is translated into the equivalent, which is possible because of a remarkable theorem (included in my handwritten Bayesian notes) that shows the probability of a correct answer in a forced choice experiment is independent of all assumptions about the signal and noise distributions -- or more correctly, it shows that the forced choice probability of a correct response can be equated to the d' measure in a simple detection experiment with Gaussian noise, regardless of whether the noise is Gaussian in the forced choice experiment.
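For the equal-variance Gaussian case, the standard textbook relation for a two-alternative forced choice is P(correct) = Phi(d'/sqrt(2)), where Phi is the normal distribution function. A minimal sketch of reading d' off the forced-choice percent correct, assuming that standard relation:

```python
# Standard equal-variance Gaussian signal detection relation for 2AFC:
# proportion correct = Phi(d' / sqrt(2)), and its inverse.

from math import sqrt
from statistics import NormalDist

_nd = NormalDist()

def pc_2afc(d_prime):
    """Proportion correct in a 2AFC task for a given d', Gaussian noise."""
    return _nd.cdf(d_prime / sqrt(2))

def d_prime_from_pc(pc):
    """Recover d' from a forced-choice proportion correct."""
    return sqrt(2) * _nd.inv_cdf(pc)

print(round(pc_2afc(2.0), 3))           # proportion correct at d' = 2
print(round(d_prime_from_pc(0.921), 2))  # and back again
```

The theorem mentioned above is what licenses using this Gaussian-equivalent d' even when the actual noise distribution in the forced-choice task is unknown.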

I could do better than that. A long time ago, a mathematician and I computed the d' values for an UNforced choice experiment (one in which the subject could say "left", "right", or "abstain from choosing"). We published a big table in the form of an internal report referenced in a published paper (A table of d' for a model of the unforced choice experiment. Percept. Mot. Skills, 1966, 22, 282, with W.C.G. Fraser). I don't think anyone had considered analyzing this situation before, but it reflects most real-world situations better than does a forced choice experiment. I don't think we could use "abstain" in a timing study, though :-)


[Martin Taylor 2009.]

[From Bill Powers (2009.02.20.1946 MST)]
about the experiment in which the response moment is timed by an
auditory “bip” sequence.

…What’s interesting to me is that this experiment provides a possible
probe into the levels of organization, quite similar to Marken’s
ingenious tasks that show different temporal characteristics for
perceptions of different levels of variables. The idea of forcing an
observation to take place at a specified time is a clever way of
sneaking in between the time at which an input is perceived and the
time at which something is done about it. Time rather than a wire
cutter is used to break the loop and isolate the input effect before
feedback can modify it. I think this is a legitimate technique.

It occurs to me that this technique might be used to probe the sequence
of events that occur during the recognition of words, and that in turn
might say something about levels of organization for that kind of
perception. Thirty years ago, Rayner and Posnanski did a study using a
method that I don’t want to try analyzing from a PCT perspective, in
part because I don’t know enough of the details (it might or might not
be valid – I don’t much care either way), and because the resulting
data were little better than suggestive, though we used them in our
Psychology of Reading and a paper or two. The results suggested that
the process of word recognition is something like this (the time
markers I mention are approximate, and from memory): by 20 msec the
overall shape of the word is seen, by 50 msec, the end letters are
recognized, by 100 msec the middle letters (of 5 letter words such as
“horse”), and by 150 msec the order of the letters. It might be
possible to investigate this further, using what we might call “The
Method of Bips”.

Here’s a skeleton of how it might go.

Subject’s answer: two buttons, labelled “Yes” and “No”. The subject has
to push one of them exactly when the third of a quick sequence of three
auditory bips occurs.

The Presentation: A picture of, say, a horse – or the word HORSE (all
caps) is shown for a long enough time that the subject has no problem
identifying it. Then a string of lowercase letters is shown under the
picture or the capitalized word. At the third bip, the subject has to
answer “Yes” or “No” according to whether the string of letters is a
word properly labelling the picture or is the same word as the
capitalized precursor. With a long bip-delay the subject should get
near 100% no matter what the string, and with a sufficiently short
bip-delay the subject should get no better than 50% (assuming that 50%
of the time the string does correspond to a “yes” answer). The
interesting question is what happens with different kinds of string at
intermediate bip-delays.

Let’s suppose the correct answer is “horse”. Different kinds of strings
that should get a “no” answer range from “radio” (meaningful,
differently shaped, no letters in common at any position) through “andix”
(meaningless, different shape, no letters in common), “banvx”
(meaningless, same shape, no letters in common), and “hanve” (meaningless,
same initial and final letters, other letters different) to “hrsoe”
(meaningless anagram, same initial and final letters). These are the
categories Rayner and Posnanski mentioned. The question is always how
discriminable the different kinds of wrong answer are from “horse” at
different bip-delays.

If Rayner and Posnanski were right, we should find a family of straight
lines of d’^2 versus bip-delay like the one Schouten found for the
switching on of the lights. But how would these lines relate to one
another? If we do find such lines, are they parallel but with
differently delayed thresholds (which would suggest a sequential series
of perceptual levels, each getting completed results from the
predecessor), do they have the same threshold bip-delay but different
slopes (suggesting parallel processing of the different percepts), or
do they get progressively steeper with longer thresholds for the more
nearly correct strings (I’m not sure what that would imply), or would
they be somewhere between these possibilities? The R&P results,
even if precise, would determine only a single point on each line, at a
reasonably high value of d’, probably between 2 and 3.


It would be a big experiment, and a big commitment by each subject, but
might be worth doing by some student with an interest in
psycholinguistics and PCT. It’s not one I would contemplate doing