What's wrong with this picture?

[From Rick Marken (2010.09.02.1230)]

The reason I asked Richard Kennaway for his correlation data is that there is a bit of a controversy going on in LA these days regarding teacher performance measures. The LA Times has been running a big story over the last couple of weeks on how the LAUSD (the LA school district) has measures of each teacher's "value added" (VA) but doesn't use them. The VA measure for a teacher is the difference in the average Math and English test scores of the students in that teacher's class from the prior year to the present year. So, for example, if the 30 students in a teacher's class had an average Math score of 60 from the prior year (going into the teacher's class) and a score of 70 at the end of the class year with the teacher, then the teacher's VA score is +10.
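In code, the VA calculation described above is just a difference of class averages; the scores below are invented to reproduce the 60-to-70 example:

```python
# The VA calculation as described: the change in a class's average score
# from the prior year to the present year. Scores are made up for
# illustration and chosen to average 60 and 70.
prior_scores = [55, 60, 65] * 10      # 30 students, prior-year average = 60
current_scores = [65, 70, 75] * 10    # same 30 students, end-of-year average = 70

va_score = (sum(current_scores) / len(current_scores)
            - sum(prior_scores) / len(prior_scores))
print(va_score)  # 10.0, matching the example in the text
```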

The LA Times reporters got hold of this data and decided that it should be made public so that parents and administrators could know how effective each teacher is. The teachers' union, of course, went ballistic and tried to prevent the Times from making the data public. But the Times went ahead and published it a few days ago, along with a helpful list of the 100 best teachers (out of 6,000) in terms of VA scores.

I got into this because my racquetball partner is a retired teacher and a right-wing libertarian who hates unions (I guess I'm liberal enough to put up with him; actually, we rarely discuss anything other than the score, and mine is typically higher). So he's all for using the VA scores. Some of the research on VA scores was done by people at RAND, and he asked if I knew one of the main guys; sure enough, I did. So I got in touch with my RAND friend because I was interested in seeing what the reliability of the VA scores was. It seemed to me that it would be ridiculous to use the VA scores as a measure of individual teacher effectiveness unless the reliability of the scores (in terms of the scores for year t being correlated with the scores for year t+1) was quite high. He did manage to lead me to some reliability measures for VA scores and, as I suspected, the average reliability of the scores (over 28 different studies) is .35.

According to Kennaway's tables, with a correlation of around .35 your probability of correctly guessing the sign of a VA score at time t+1 from the same person's score at time t is about .6, just a tad better than flipping a coin and saying "heads" = "+" and "tails" = "-". So what the Times did was publish numbers for 6,000 teachers that purport to measure each teacher's "effectiveness," when they would have done little worse by publishing the last two digits of each teacher's Social Security number instead.
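For the record, the .6 figure can be reproduced without the tables. If the two years' scores are bivariate normal with correlation rho, the probability that they share the same sign is 1/2 + arcsin(rho)/pi. This is a sketch under the normality assumption, not necessarily Kennaway's actual computation:

```python
import math

def sign_agreement_probability(rho):
    """P(scores at t and t+1 have the same sign) for a bivariate
    normal with correlation rho (the classic arcsine formula)."""
    return 0.5 + math.asin(rho) / math.pi

print(round(sign_agreement_probability(0.35), 2))  # 0.61, i.e. "about .6"
print(sign_agreement_probability(0.0))             # 0.5, a pure coin flip
```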

I think the whole discussion of VA scores, which has been mainly about their validity (are they really measures of teacher effectiveness?), should just stop now. These measures are useless as measures of anything about the teacher. But maybe "useless" is not quite the right word. What would you call this? "Criminal"? "Stupid"?

I ask because I am planning to write a letter to the LA Times explaining that what they did by publishing the VA scores for 6,000 4th- and 5th-grade teachers as an indication of their "effectiveness" (to help parents pick the "good" teachers, I suppose) was ___________. You fill in the blank.

Best

Rick

···


Richard S. Marken PhD
rsmarken@gmail.com
www.mindreadings.com

[Martin Lewitt 2010.09.02. 1441 MDT]

Did the statistical reliability increase with number of students or number of included years, or vary with subject and tests?

regards,

   Martin L

[From Rick Marken (2010.09.02.1530)]

Martin Lewitt (2010.09.02. 1441 MDT)–

Did the statistical reliability increase with number of students or number of included years, or vary with subject and tests?

The reliability of the VA scores should increase a bit as the number of students on which they are based increases (and the researchers say it does increase, though they don't report a quantitative relationship between reliability and sample size). The reliability numbers I have are for studies done with a standard achievement test; the reliability may vary with the type of test, but it varies so much with the same test (the 28 reliability values I have, all apparently obtained with the same test, range from .15 to .5) that I think it would be hard to pick up any real difference.
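The size of the gain one might expect from basing the score on more students (or more years) can be sketched with the Spearman-Brown prophecy formula, treating each added unit as another parallel measurement. This is a textbook idealization, not something reported in the VA studies themselves:

```python
def spearman_brown(r1, k):
    """Reliability of a measure lengthened by a factor of k,
    given single-unit reliability r1 (Spearman-Brown prophecy)."""
    return k * r1 / (1 + (k - 1) * r1)

# Starting from the observed average one-year reliability of .35:
for years in (1, 2, 3, 5):
    print(years, round(spearman_brown(0.35, years), 2))
# Reliability rises, but slowly: roughly .35, .52, .62, .73
```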

Best

Rick

···

Richard S. Marken PhD
rsmarken@gmail.com
www.mindreadings.com

[Martin Lewitt (2010.09.02. 1631 MDT)]

At the 4th and 5th grade level, I would also wonder whether the teacher has the students for all subjects or whether they have different teachers for different subjects, as in the older grades. I would think the younger students would do better with one teacher, and that would also increase the reliability of a good teacher's impact.

I doubt they broke that out.

thanx,

   Martin L

Clarification: This made it sound like I was questioning the
study. I was just expressing an issue I was curious about, but I
think probably wasn’t controlled for, since it was just analyzing
math scores.

regards,

  Martin L

[From Rick Marken (2010.09.02.1700)]

Martin Lewitt (2010.09.02. 1631 MDT)–

At the 4th and 5th grade level, I would also wonder whether the teacher has the students for all subjects or whether they have different teachers for different subjects, as in the older grades. I would think the younger students would do better with one teacher, and that would also increase the reliability of a good teacher's impact.

I doubt they broke that out.

The researchers broke out the VA data in every which way. But I believe that all the VA measures are for teachers who had all the students all day for all subjects.

I just played racquetball with my friend (we split 4 games 2-2; I'm such a liberal; I could win all 4 every time, but then my heart starts to bleed ;-) and told him my findings. This was very disappointing to him. But the last thing he said before we parted made me understand why. What he said was: "So teachers don't matter?"

In fact, what the data show (both the reliability data and some of the regression analyses run at RAND) is that differences between teachers don't make much of a difference in VA scores; differences between teachers account for on the order of 1% of the variance in VA scores. My friend's agenda (and the agenda of both conservatives and some liberals, like Arne Duncan, who is, to my despair, my beloved Obama's secretary of education) is obviously to attribute the apparent failings of education to the retention of "bad" teachers. My intuition was always that there are very few really bad teachers, and the VA data seem to bear this out.

The data don’t at all show that teachers don’t matter (nor does it show that they do; to test this you would have to compare kids who had a teacher from t1 to t2 with kids who had no teacher from t1 to t2). What it does show is that differences between teachers matter very little in terms of VA scores, if VA scores are your measure of educational progress. I think that’s because almost all teachers are at least adequate; some, of course, are great and some are awful but there are apparently very few of either of those.

Best

Rick

···


Richard S. Marken PhD
rsmarken@gmail.com
www.mindreadings.com

[From Dick Robertson, 2010.09.03.1210CDT]

From: Martin Lewitt mlewitt@COMCAST.NET

 Clarification:  This made it sound like I was questioning the study.  I was just expressing an issue I was curious about, but I think probably wasn't controlled for, since it was just analyzing math scores.

Yeah, and another thing: are we to understand that geography, history, English, art and whatever other teachers are to be rated on whether their students improved their math scores from one year to the next?

Best,

Dick R

···

Date: Thursday, September 2, 2010 6:39 pm

From Jim Wuwert 2010.09.03.1422EDT

This is an interesting discussion. The VA data is being used in many states. It is a large part of a teacher’s world.

What kind of standardized test was given? A nationally standardized test or a standardized test based on the achievement of students only in California or Los Angeles?

All e-mail correspondence to and from this address
is subject to the North Carolina Public Records Law,
which may result in monitoring and disclosure to
third parties, including law enforcement.
AN EQUAL OPPORTUNITY/AFFIRMATIVE ACTION EMPLOYER

[From Rick Marken (2010.09.03.1815)]

Jim Wuwert (2010.09.03.1422EDT)

This is an interesting discussion. The VA data is being used in many states. It is a large part of a teacher’s world.

What kind of standardized test was given? A nationally standardized test or a standardized test based on the achievement of students only in California or Los Angeles?

The reliability data I have for VA tests come from school districts in Florida. The tests were different versions (depending on the year) of the FCAT (Florida Comprehensive Assessment Test). The reliability measures of the VA scores are inter-year correlations using the same test. In California the tests were some kind of Math and English tests.

Much of the discussion about these VA scores (in the local press) has concerned their validity. On the face of it VA scores seem like a pretty reasonable way to measure a teacher’s effectiveness, assuming that the test involved measures what the teacher is supposed to be teaching the kids in that grade. The idea of evaluating teacher effectiveness by looking at the change in the average test scores for the same set of kids when they enter the class versus when they leave (which is what a VA score is) seems to make sense.

My point is that whether VA scores make sense to you or not, they should not be used as a measure of the performance of individual teachers because they are completely unreliable. Repeated measurement of VA for the same teacher is likely to vary wildly. Using VA scores to measure individual performance is not much better than simply having the teachers draw numbers from a hat.
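To illustrate "vary wildly," here is a small simulation of my own (a sketch, not the RAND analysis): 6,000 teachers whose true year-to-year VA correlation is .35, asking how many of one year's "top 100" (the size of the Times' list) would still be in the top 100 the next year.

```python
import random

random.seed(1)
N, RHO, TOP = 6000, 0.35, 100

# Two years of (standardized) VA scores with correlation RHO,
# built from the standard bivariate-normal construction.
year1 = [random.gauss(0, 1) for _ in range(N)]
year2 = [RHO * x + (1 - RHO**2) ** 0.5 * random.gauss(0, 1) for x in year1]

def top_set(scores, k):
    # Indices of the k highest-scoring teachers.
    return set(sorted(range(N), key=lambda i: scores[i], reverse=True)[:k])

overlap = len(top_set(year1, TOP) & top_set(year2, TOP))
print(overlap)  # a small number: most of the "top 100" churns away by chance
```

Under these assumptions only a handful of the 100 "best" teachers repeat the next year, which is what a .35 year-to-year correlation implies.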

Best

Rick

···


Richard S. Marken PhD
rsmarken@gmail.com
www.mindreadings.com

From Jim Wuwert (2010.09.07.0810)

[From Rick Marken (2010.09.03.1815)]

Much of the discussion about these VA scores (in the local press) has concerned their validity. On the face of it VA scores seem like a pretty reasonable way to measure a teacher’s effectiveness, assuming that the test involved measures what the teacher is supposed to be teaching the kids in that grade.

That is a big assumption. Most state tests are standardized according to how the students in that state score, but that gives no indication of how the children in that state measure up against children in a different state. I wonder what the reliability and validity are for the Florida and California tests. That may be the biggest surprise. Why didn't the Times investigate that?

The idea of evaluating teacher effectiveness by looking at the change in the average test scores for the same set of kids when they enter the class versus when they leave (which is what a VA score is) seems to make sense. Repeated measurement of VA for the same teacher is likely to vary wildly. Using VA scores to measure individual performance is not much better than simply having the teachers draw numbers from a hat.

Assuming we have a valid and reliable test: if you look at each set of students separately and calculate results for that group, then how would it vary wildly? Growth is calculated yearly for the teacher. When you have 3 years of data, you would have a pretty good feel for who is a good teacher and who is a not-so-good teacher. I am talking strictly about student growth, not proficiency. Each year some students can grow more than others based on their previous year's results.

If a teacher can take student A from being a C student to an A student (in testing land this would be from a level 2 to a level 4), then wouldn't you agree that that teacher should be rewarded heavily for doing that? Shouldn't that data be made public? Our schools are public entities. We get performance data on other government agencies; why not for schools and individual teachers?

Additionally, if we are going to publish that data, then the government should heavily reward a teacher that does an outstanding job. We can’t say the money isn’t there because it is. That would mean we would have to stop spending money in other places. Would you agree?



Thanks for keeping the thread alive, Jim.

I think we are placing too much emphasis on only one change agent role (i.e., the teacher). After all, classrooms are systems with porous boundaries that can be influenced by a multiplicity of change agents, any one of which can contribute significantly to a particular child's academic self-efficacy. Besides teachers, these agents include tutors, teacher aides, guidance counselors, coping peers, parents, and parent liaisons. In other words, it takes a community to educate a child.

More importantly, we need to appreciate the fact that the affective domain is just as important as the cognitive domain. Standardized tests do not take the affective domain into account, and that to me is their greatest weakness. When I refer to the affective domain, I refer mostly to sustained engaged states (e.g., curiosity, fascination, excitement, passion for a topic/domain/skill) within the circumplex of emotion.

What you get is what you monitor, and what you monitor is what you value. If we valued constructs such as empathy, curiosity, creativity, and imagination, would we continue to evaluate teacher performance based on standardized achievement tests?

Best,
Chad

Chad Green, PMP
Program Analyst
Loudoun County Public Schools
21000 Education Court
Ashburn, VA 20148
Voice: 571-252-1486
Fax: 571-252-1633


[From Rick Marken (2010.09.07.1040)]

Jim Wuwert (2010.09.07.0810) --

Rick Marken (2010.09.03.1815)

RM: Using VA scores to measure individual performance is not much better
than simply having the teachers draw numbers from a hat.

JW: Assuming we have a valid and reliable test: If you look at each set of
students separately and calculate results for that group, then how would it
vary wildly?

What are you talking about? The VA scores are unreliable, period. They
therefore can't be valid measures of anything. Calculating a teacher's
VA score is basically the same as assigning random numbers to
teachers.

JW: Growth is calculated yearly for the teacher. When you have 3
years of data, then you would have a pretty good feel for a good teacher and
a not so good teacher.

You would "think" that you had a pretty good feel, but that would be an
illusion, as compelling as the illusion that behavior is S-R in a
psychology experiment. In fact, even if VA scores are random, there is
some chance of runs of good (or bad) scores, so in a large group of
teachers you are bound to find many with runs of three or more years of
good (or bad) scores. The LA studies were done on 6,000 teachers. I
calculated that you would expect to find at least 260 of these teachers
with 5-year runs of good (or bad) scores just by chance. So, no, runs of
good scores should not increase your confidence that the teacher is
really the one increasing the scores when the scores themselves are
random.
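Rick's point about chance runs can be checked with a quick simulation (my own sketch, under the simplest assumption that each year's sign is an independent coin flip): the expected number of teachers out of 6,000 whose sign comes up the same five years running is 6000 * 2 * (1/2)^5 = 375, comfortably above the 260 he mentions.

```python
import random

random.seed(0)
N_TEACHERS, N_YEARS = 6000, 5

# Count teachers whose purely random +/- VA signs agree in all five years.
same_sign_runs = 0
for _ in range(N_TEACHERS):
    signs = [random.choice((+1, -1)) for _ in range(N_YEARS)]
    if len(set(signs)) == 1:          # all "good" or all "bad"
        same_sign_runs += 1

print(same_sign_runs)  # close to the analytic expectation of 375
```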

JW: If a teacher can take student A from being a C student to an A student
(in testing land this would be from a level 2 to a level 4), then wouldn't
you agree that that teacher should be rewarded heavily for doing that?

No, for two reasons. First, because you are basing your judgment on
random numbers, you are wrong to conclude that even a consistent yearly
increase in observed student scores is due to the teacher. Similarly,
you don't know that any consistent decrease in the scores is due to the
teacher either.

Second, even if the scores were perfectly reliable _and_ valid (they
must be reliable to be valid), I would not agree that a teacher should
be rewarded based on performance. I think making rewards contingent on
behavior is a great way to screw things up. The idea of giving rewards
for desired behavior is based on reinforcement theory, which assumes
that rewarding the desired behavior will strengthen it. Control theory
shows that this approach to dealing with people sucks (in technical
terms ;-). Read Powers' "Making Sense of Behavior" for a more lucid
discussion of the problems of making rewards contingent on desired
behaviors.

JW: Shouldn't that data be made public?

Absolutely not. The public would think these numbers actually mean
something about the teacher. The public (like most social scientists)
does not have the ability to understand what's wrong with using
unreliable group data to make predictions about the individuals in
that group. This whole drive to use VA data to evaluate individuals is
a complete and utter shuck!

JW: Our schools are public entities? We get performance data on other
government agencies, why not with schools and individual teachers?

Performance data at the group level is fine. That's why I have presented
group-level data on the performance of the economy as a function of
variables like marginal tax rates. Such data can be used as the basis
for group-level policies that might lead to improvements in performance.
The problem with the VA data is specifically that it is group-level data
being applied to individual teachers. The numbers at the individual
level are nearly completely useless and, worse, misleading, since once
an individual is assigned a high or low VA score, people will assume
that the score reflects the person's true ability as a teacher. They
don't know that the VA score they see could just as well be the last two
digits of the teacher's Social Security number and would tell them just
as much about their ability as a teacher.

VA scores show that the people who need to have their skills improved
are not individual teachers but the individuals who are studying them.
Those individuals, the researchers studying VA scores, are the ones who
consistently (reliably) get it wrong.

Best

Rick

···

--
Richard S. Marken PhD
rsmarken@gmail.com
www.mindreadings.com

From Jim Wuwert 2010.09.07.1525

CG: I think we are placing too much emphasis on only one change agent role (i.e., the teacher). After all, classrooms are systems with porous boundaries that can be influenced by a multiplicity of change agents, any one of which can contribute significantly to a particular child’s academic self-efficacy. Besides teachers, these agents include tutors, teacher aides, guidance counselors, coping peers, parents, and parent liaisons. In other words, it takes a community to educate a child.

JW: Good point. How do we measure their impact as well? What kind of measuring stick is available to measure these agents and their impact? Or do we chuck measurement altogether and take a different approach? If so, what is that approach?

More importantly, we need to appreciate the fact that the affective domain is just as important as the cognitive domain. Standardized tests do not take the affective domain into account, and that to me is their greatest weakness. When I refer to the affective domain, I refer mostly to sustained engaged states (e.g., curiosity, fascination, excitement, passion for a topic/domain/skill) within the circumplex of emotion.

What you get is what you monitor, and what you monitor is what you value. If we valued constructs such as empathy, curiosity, creativity, and imagination, would we continue to evaluate teacher performance based on standardized achievement tests?

JW: How do you measure those constructs? They are very difficult to measure quantitatively. I think they are important and appropriate to develop in a child, but how do you measure those things? How do you measure that growth in a child?




From Jim Wuwert 2010.09.07.1534

[From Rick Marken (2010.09.07.1040)]

RM: What are you talking about? The VA scores are unreliable, period. They therefore can’t be valid measures of anything. Calculating a teacher’s VA score is basically the same as assigning random numbers to
teachers.

You keep referring to VA scores as completely unreliable. I think the issue is the test. Is the test reliable? If you do not have the test, then you do not have the VA score unless you base it on some other variable. The issue in education is the state tests. That should be scrutinized more than VA data.

You would “think” that you had a pretty good feel, but that would be an
illusion, as compelling as the illusion that behavior is S-R in a
psychology experiment. In fact, even if VA scores are random, there
is some chance of runs of good (or bad) scores, so that in a large
group of teachers you are bound to find many with runs of three or
more years of good (or bad) scores. The LA studies were done on 6000
teachers. I calculated that you would expect to find at least 260 of
these teachers with 5-year runs of good (or bad) scores just by
chance. So, no, runs of good scores should not increase your confidence
that the teacher is really the one increasing the scores when the
scores themselves are random.
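The runs-by-chance point is easy to check with a quick simulation. The sketch below is my own illustration, not part of the original exchange, and it uses the simplest possible model: each teacher's yearly score is an independent fair coin flip, "good" or "bad". Under that assumption the expected count of 5-year runs among 6000 teachers is 6000 × 2 × (1/2)^5 ≈ 375; the figure of 260 quoted above presumably reflects a stricter definition of a run, but either way chance alone produces a large number.

```python
import random

random.seed(1)

N_TEACHERS = 6000  # size of the cohort in the LA studies
N_YEARS = 5
N_TRIALS = 100     # Monte Carlo repetitions

def count_full_runs():
    """Count teachers whose N_YEARS random yearly scores are all
    'good' (a fair coin flip each year) or all 'bad'."""
    runs = 0
    for _ in range(N_TEACHERS):
        flips = [random.random() < 0.5 for _ in range(N_YEARS)]
        if all(flips) or not any(flips):
            runs += 1
    return runs

avg = sum(count_full_runs() for _ in range(N_TRIALS)) / N_TRIALS
# Analytically: 6000 * 2 * (1/2)**5 = 375 under this coin-flip model
print(f"teachers with a {N_YEARS}-year run by chance alone: ~{avg:.0f}")
```

So hundreds of the 6000 teachers would show an unbroken run of "good" (or "bad") years even if the scores carried no information at all.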

Again, this goes back to the reliability and validity of the state test.

JW.If a teacher can take student A from being a C student to an A student–in
testing land this would be from a level 2 to a level 4, then wouldn’t you
agree that that teacher should be rewarded heavily for doing that?

No, for two reasons. First of all, because you are basing your
judgment on random numbers so you are wrong to conclude that even a
consistent yearly increase in observed student scores is due to the
teacher. Similarly, you don’t know that any consistent decrease in the
scores is due to the teacher either.

That is true. We do not, because there are other variables in play. So, what is a better way of measuring it? This method is used because they have not come upon anything better. Not that they should continue using it, but what would the solution look like? You need something in place to measure teacher performance. What would better look like?

Second, even if the scores were perfectly reliable and valid (they
must be reliable to be valid) I would not agree that a teacher should
be rewarded based on performance. I think making rewards contingent on
behavior is a great way to screw things up. The idea of giving rewards
for desired behavior is based on reinforcement theory, which assumes
that rewarding the desired behavior will strengthen it. Control theory
shows that this approach to dealing with people sucks (in technical
terms;-) Read Powers’ “Making sense of behavior” for a more lucid
discussion of the problems of making rewards contingent on desired
behaviors.

So, how do we reward teachers for doing a good job? The seniority method does not work either. There are teachers with lots of seniority that are not very good teachers and there are newly minted teachers that are awesome at helping students. They have to get something for doing their job. They are not going to work for free.

So, are you suggesting that we pay them all 50k a year just for showing up? Many of us control heavily for money. If money wasn’t the common exchange, then it would be gold or some other form of currency exchange. How do you give more money to one teacher versus another without some type of measurement placed on performance? You need something.

If not test scores, then what? What is a better measure?

RM: Absolutely not. The public would think these numbers actually mean
something about the teacher. The public (like most social scientists)
does not have the ability to understand what’s wrong with using
unreliable group data to make predictions about the individuals in
that group. This whole drive to use VA data to evaluate individuals is
a complete and utter shuck!

The reliability of the tests is questionable. The concept of VA is good in theory. We need a better measure.

JW: Our schools are public entities? We get performance data on other
government agencies, why not with schools and individual teachers?

Performance data at the group level is fine. That’s why I have
presented group level data on the performance of the economy as a
function of variables like marginal tax rates. That data can be used
as the basis for making group level policies that might lead to
improvements in performance. The problem with the VA data is
specifically that it is group level data being applied to individual
teachers. The numbers at the individual level are nearly completely
useless. And, worse, misleading, since once an individual is assigned
a high or low VA score people will assume that this score is a
reflection of the person’s true ability as a teacher. They don’t know
that the VA score they see could just as well be the last two digits
of the teacher's Social Security number and it would tell just as much
about their ability as a teacher.

JW: So, what is a better way of evaluating individual performance? I am not saying VA is perfect, but what is a better way of measuring it?

VA scores show that the people who need to have their skills improved
are not individual teachers but the individuals who are studying them.
Those individuals – the researchers studying VA scores – are the
ones who consistently (reliably) get it wrong.

JW: So, how do you suggest we measure the individual performance?

All e-mail correspondence to and from this address
is subject to the North Carolina Public Records Law,
which may result in monitoring and disclosure to
third parties, including law enforcement.
AN EQUAL OPPORTUNITY/AFFIRMATIVE ACTION EMPLOYER

[From Rick Marken (2010.09.07.1430)]

Jim Wuwert (2010.09.07.1534) --

Rick Marken (2010.09.07.1040)--

RM: Calculating a teacher's VA
score is basically the same as assigning random numbers to
teachers.

JW: You keep referring to VA scores as completely unreliable. I think the issue
is the test. Is the test reliable?

I keep referring to VA scores because that's the data that the LA
Times says should be used to evaluate individual teachers, and it's
the data they made public. You're right that the VA scores are
unreliable because the student standardized test scores are
unreliable. But so what?

JW: The issue in education is the state tests. That should be scrutinized more than
VA data.

That might be your issue but mine was the VA tests, the scores on
which, for 6000 4th and 5th grade teachers, were published by the LA
Times so that parents could see who are the best and worst teachers.
My beef with this is that the VA scores tell you nothing about the
teaching abilities of the teachers because they are random numbers.

RM You would "think" that you had a pretty good feel but that would be an
illusion, as compelling as the illusion that behavior is S-R in a
psychology experiment.

JW: So, what is a better way of measuring it [teacher performance]?

I have no idea. The analysis of the VA scores that was done by RAND
shows that maybe 2% of the variance in these scores is due to
differences between teachers. This finding confirms my intuitions: by
and large all teachers are good teachers. There are apparently a very
small number of "bad" teachers (I had 1 in my entire pre-college
career) and a small number of super teachers (I had 2 or 3 of those).
I think teacher performance should be evaluated by peers, and the
evaluation should be reviewed by the evaluated teacher to see if it
makes sense; then the evaluated teacher can take or leave whatever
suggestions are provided. I think peers are also best at catching the
truly bad teachers. But I think it's a waste of time to try to track
these guys down; they are apparently very few and far between. I think
the idea that the problems of education result from keeping bad
teachers was made up by conservatives who don't like unions. It's like
the myth that taxes are recessionary. It's false but it works because
it seems like it should be true.
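The RAND finding that only about 2% of the variance in scores is attributable to differences between teachers is, in fact, consistent with the low year-to-year reliability of VA scores. A back-of-the-envelope simulation shows this; the class size of 30 and the normalization are my assumptions, for illustration only:

```python
import random

random.seed(3)

N_TEACHERS = 2000   # illustrative cohort size
CLASS_SIZE = 30     # assumed students per class
TEACHER_VAR = 0.02  # ~2% of score variance between teachers (the RAND figure)

def class_mean_va(effect):
    """Average gain of one class: stable teacher effect plus
    independent per-student noise (total variance normalized to 1)."""
    noise_sd = (1 - TEACHER_VAR) ** 0.5
    gains = [effect + random.gauss(0, noise_sd) for _ in range(CLASS_SIZE)]
    return sum(gains) / CLASS_SIZE

effects = [random.gauss(0, TEACHER_VAR ** 0.5) for _ in range(N_TEACHERS)]
va1 = [class_mean_va(e) for e in effects]  # this year's classes
va2 = [class_mean_va(e) for e in effects]  # same teachers, next year

def corr(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    return cov / (vx * vy) ** 0.5

r = corr(va1, va2)
print(f"year-to-year VA reliability: {r:.2f}")
```

With these assumed numbers the simulated reliability comes out near .38, close to the ~.35 average reported across the studies: a 2% teacher effect averaged over a 30-student class yields exactly this kind of unstable measure.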

JW: You need something in place to measure teacher performance. What would better
look like?

I disagree with the idea that you need to measure teacher performance.
99% of your teachers are apparently perfectly competent; what little
variation there is across teachers in teaching ability has virtually
no association with variation in student progress. I think you would
do a lot better for the kids if you had in place a measure of parental
performance and removed kids from the homes of poorly performing
parents;-)

JW: So, how do we reward teachers for doing a good job?

We don't. What we do is reward ALL teachers with respect and a good
salary for doing the most important job on earth.

JW: So, are you suggesting that we pay them all 50k a year just for showing up?

No, $100,000/year for showing up and doing their job in the context of
impossibly awful parents and cowardly, hostile, overpaid
administrators.

JW: Many of us control heavily for money. If money wasn't the common exchange,
then it would be gold or some other form of currency exchange. How do you
give more money to one teacher versus another without some type
of measurement placed on performance? You need something.

I have no idea how to compensate people fairly for their efforts.
Leaving it to the "free market" doesn't seem to work because then you
get the anomaly of CEOs being paid 500 times more than their
employees. And forcing everyone to have the same income probably
wouldn't work either (though it's never been tried) because there are
many people who work only to make more than others. So I guess I'll go
with what we had before Raygun trashed the American economy: a
regulated free market where the regulation is done largely by highly
progressive taxation.

JW: The reliability of the tests is questionable. The concept of VA is good in
theory. We need a better measure.

Yes, as I said, the concept of VA sounds very sensible _in theory_. It
just doesn't work in practice. When there is a measure of teacher
effectiveness that has .99 reliability then we can talk about the
merits of using the instrument as a measure of individual teacher
effectiveness. But right now we got nothing. And using that nothing
(student test scores or VA scores) as a measure of individual
achievement is a terrible thing to do to the teachers.
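What a reliability of about .35 means for a published "top 100" list can be shown with a simulation (the model and the numbers are my illustrative assumptions, not taken from the studies): give each of 6000 teachers a stable component plus yearly noise, scaled so that scores in adjacent years correlate .35, and count how many teachers land on the top-100 list in both years.

```python
import random

random.seed(7)

N = 6000    # teachers, as in the LA data
TOP = 100   # size of the published "best teachers" list
REL = 0.35  # assumed year-to-year reliability of VA scores
REPS = 50   # Monte Carlo repetitions

def top_list_overlap():
    """Overlap of the top-TOP lists from two simulated years in which
    each teacher's score is a stable component plus yearly noise,
    scaled so that adjacent years correlate REL."""
    true = [random.gauss(0, REL ** 0.5) for _ in range(N)]
    sd = (1 - REL) ** 0.5
    y1 = [t + random.gauss(0, sd) for t in true]
    y2 = [t + random.gauss(0, sd) for t in true]
    top1 = set(sorted(range(N), key=y1.__getitem__, reverse=True)[:TOP])
    top2 = set(sorted(range(N), key=y2.__getitem__, reverse=True)[:TOP])
    return len(top1 & top2)

avg_overlap = sum(top_list_overlap() for _ in range(REPS)) / REPS
print(f"teachers on both years' top-{TOP} lists: {avg_overlap:.1f} of {TOP}")
```

In this toy model only a small fraction of one year's "top 100" teachers reappear on the next year's list; the exact figure depends on the assumptions, but the instability of the ranking is the point.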

JW: So, what is a better way of evaluating individual performance? I am not
saying VA is perfect, but what is a better way of measuring it?

VA is not only not perfect. It is completely useless and using those
numbers to evaluate individuals is wrong because people will assume
that those numbers mean something. It's like legalized McCarthyism (so
you would expect the right to get all excited about it;-) except that
the people being smeared aren't really even the bad teachers (commies)
that their scores make them out to be. Although there is no better way
of measuring teacher performance, the aggregate data suggests that
that's not a problem. Nearly all teachers are like the people of Lake
Wobegon, above average;-)

JW: So, how do you suggest we measure the individual performance?

Individually.

Best

Rick

···

--
Richard S. Marken PhD
rsmarken@gmail.com
www.mindreadings.com

Hi Jim,

In an ideal world I'd suggest that policymakers do the following in this order:

1) Watch and reflect on this video by Dan Pink entitled "Drive: The surprising truth about what motivates us": http://www.youtube.com/watch?v=u6XAPnuFjJc

2) Change the unit of analysis to teacher teams rather than individuals. Create reward systems that emphasize improvements to team capacity rather than student performance. The idea is to develop team capacity that will be mirrored in the classroom system in the form of cooperative learning teams (i.e., self-similarity).

3) I'd define the capacity measures using the triple vector of Actuality, Capability, and Potentiality from Beer's Viable System Model. In keeping with Dean Kamen's notion that "free cultures get what they celebrate," I'd reward both teacher and leadership teams for making breakthrough improvements to their internal capacity.

4) As for performance, I would create reward systems at the school level that emphasized a transdisciplinary curricular approach. All change agents interacting with the school system would be encouraged to identify opportunities that the school itself as change agent could resolve for the benefit of the community. Over time, public schools would evolve into innovation incubators driven by budding social entrepreneurs.

That's just one idea for starters. :-)

BTW, I was wondering if anyone had considered adding a level beyond #11 for PCT (e.g., to account for the reorganization of systems)? I'm thinking in terms of 2nd-order cybernetics (see the Wikipedia article "Second-order cybernetics"). Food for thought.

Cheers,
Chad

Chad Green, PMP
Program Analyst
Loudoun County Public Schools
21000 Education Court
Ashburn, VA 20148
Voice: 571-252-1486
Fax: 571-252-1633

Jim Wuwert <JDWuwert@WSFCS.K12.NC.US> 9/7/2010 3:34 PM >>>

from Jim Wuwert 2010.09.07.1525

CG: I think we are placing too much emphasis on only one change agent role (i.e., the teacher). After all, classrooms are systems with porous boundaries that can be influenced by a multiplicity of change agents, any one of which can contribute significantly to a particular child's academic self-efficacy. Besides teachers, these agents include tutors, teacher aides, guidance counselors, coping peers, parents, and parent liaisons. In other words, it takes a community to educate a child.
JW: Good point. How do we measure their impact as well? What kind of measuring stick is available to measure these agents and their impact? Or, do we chuck measurement altogether and take a different approach? If so, what is that approach?

More importantly, we need to appreciate the fact that the affective domain is just as important as the cognitive domain. Standardized tests do not take the affective domain into account, and that to me is their greatest weakness. When I refer to the affective domain, I refer mostly to sustained engaged states (e.g., curiosity, fascination, excitement, passion for a topic/domain/skill) within the circumplex of emotion.

What you get is what you monitor, and what you monitor is what you value. If we valued constructs such as empathy, curiosity, creativity, and imagination, would we continue to evaluate teacher performance based on standardized achievement tests?

JW: How do you measure those constructs? They are very difficult to measure quantitatively. I think they are important and appropriate to develop in a child, but how do you measure those things? How do you measure that growth in a child?

Best,
Chad

Chad Green, PMP
Program Analyst
Loudoun County Public Schools
21000 Education Court
Ashburn, VA 20148
Voice: 571-252-1486
Fax: 571-252-1633

Jim Wuwert <JDWuwert@WSFCS.K12.NC.US> 9/7/2010 8:37 AM >>>

from Jim Wuwert (2010.09.07.0810)

[From Rick Marken (2010.09.03.1815)]

Much of the discussion about these VA scores (in the local press) has concerned their validity. On the face of it VA scores seem like a pretty reasonable way to measure a teacher's effectiveness, assuming that the test involved measures what the teacher is supposed to be teaching the kids in that grade.
That is a big assumption. Most state tests are standardized according to how the students in that state score. But, it gives no indication about how the children in that state measure against children in a different state. I wonder what the reliability and validity is for the Florida and California tests. That may be the biggest surprise. Why didn't the Times investigate that?
The idea of evaluating teacher effectiveness by looking at the change in the average test scores for the same set of kids when they enter the class versus when they leave (which is what a VA score is) seems to make sense. Repeated measurement of VA for the same teacher is likely to vary wildly. Using VA scores to measure individual performance is not much better than simply having the teachers draw numbers from a hat.
Assuming we have a valid and reliable test: If you look at each set of students separately and calculate results for that group, then how would it vary wildly? Growth is calculated yearly for the teacher. When you have 3 years of data, then you would have a pretty good feel for a good teacher and a not so good teacher. I am talking strictly about student growth, not proficiency. Each year some students can grow more than others based on their previous year results.
If a teacher can take student A from being a C student to an A student--in testing land this would be from a level 2 to a level 4, then wouldn't you agree that that teacher should be rewarded heavily for doing that? Shouldn't that data be made public? Our schools are public entities? We get performance data on other government agencies, why not with schools and individual teachers?
Additionally, if we are going to publish that data, then the government should heavily reward a teacher that does an outstanding job. We can't say the money isn't there because it is. That would mean we would have to stop spending money in other places. Would you agree?

Using VA scores to measure individual performance is not much better than simply having the teachers draw numbers from a hat.


[From Rick Marken (2010.09.08.0930)]

Hi Chad

Welcome to CSGNet. I think you are new here, right? Anyway you're new
to me. Could you say something about who you are and what you know
about PCT? That would help me understand where you're coming from.

CG: In an ideal world I'd suggest that policymakers do the following in this order:

1) Watch and reflect on this video by Dan Pink entitled "Drive: The surprising
truth about what motivates us":
http://www.youtube.com/watch?v=u6XAPnuFjJc

It was cute, though I found it to be a tad cloying. But I think PCT
would help him a lot. PCT explains why rewards seem to work (when they
do seem to work) and why they usually don't.

CG: 2) Change the unit of analysis to teacher teams rather than individuals.
Create reward systems that emphasize improvements to team capacity
rather than student performance.

As James Dean would say "I'm so confused". You just suggested the
video by Pink that explains why reward systems suck and now you
recommend a reward system. What gives?

3) CG: ...I'd reward both teacher and leadership teams for making breakthrough
improvements to their internal capacity.

Maybe I left the Pink video too soon. I must confess that I saw only
about 1/2 of it. But in the half I saw he seemed to understand that
reward systems suck. Did he come out for reward systems in the end?

4) CG: As for performance, I would create reward systems at the school
level that emphasized a transdisciplinary curricular approach.

I really must have missed something important in the Pink video.

Best

Rick

···

On Tue, Sep 7, 2010 at 2:59 PM, Chad Green <Chad.Green@lcps.org> wrote:
--
Richard S. Marken PhD
rsmarken@gmail.com
www.mindreadings.com