A Word on Using Test Scores to Evaluate Teachers

September 15, 2011

As you’re probably aware, Tennessee is rolling out [pdf] its new evaluation system this year, along with new tenure rules based on that system.  I’ve written about the use of value-added scores and data in school decisions (and not just in tenure and firing decisions; limiting the data to those would be a giant mistake and a misuse of valuable data), and about how we should evaluate teachers who lack test data.  It’s worth continuing the conversation on this topic, however, and there was a very interesting (and provocative) piece in Scientific American a while back that I’ve been meaning to get to.  Now, don’t get all offended, but the author compares using test scores to evaluate and fire teachers to eugenics.

I know.  Eugenics.  Uncomfortable.

Still, though, I think there are some incredibly interesting things about the article, even if it is a bit broad.  The title of the article is “Deselection of the Bottom 8%: Lessons from Eugenics for Modern School Reform.”  Here’s the part I want to talk about:

I do not wish to impugn the statistical techniques themselves, or doubt progress in measuring what we aim to measure. However, in each moment, a refinement of the science of testing has been mistaken for readiness to apply to public policy and specific individual cases. A strong general relationship between conveniently measurable variables becomes riddled with errors when applied to individual personnel decisions. As these tools leave the lab (or the economist’s model) and enter policy reality, the uncertainty magnifies the bias and corruption that science is supposed to prevent. Whether using early IQ tests to reject immigrants at Ellis Island, or using Value-Added Measures (VAM) scores to fire or reward teachers, policymakers convinced they are using the latest microscope, are later seen holding a distorted mirror.

There is a very important point here, that cheerleaders of TVAAS and other such value-added data systems overlook: Broad data systems like these are great at identifying state-wide, system-wide, district-wide, and maybe even school-wide trends and concerns, but they’re often crap when it comes to individual teachers/students.  Not always, but certainly often enough to give us pause.  From one of my earlier posts:

TVAAS isn’t perfect.  It’s a good data system, but it has flaws.  Studies have shown [pdf] that teachers’ ratings can fluctuate wildly, even for teachers who are presumed to be good ones (“Only a few studies have explored the stability of teacher effects (Ballou, 2005, Aaronson, et al (2007) and Koedel and Betts (2007)). Such studies find that teacher effects are quite unstable.” (p.9)).
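To make the aggregate-versus-individual point concrete, here is a rough sketch in Python. It is my own toy simulation, not the TVAAS model: the number of teachers, the spread of "true" teacher effects, and the amount of year-to-year noise are all made-up assumptions, and the 8% cutoff simply borrows the figure from the article's title. Each teacher gets a fixed true effect, each year's rating is that effect plus random noise, and we compare how much the district average moves with how much individual ratings move.

```python
import random
import statistics

def simulate(n_teachers=500, n_years=2, effect_sd=0.5, noise_sd=1.0, seed=0):
    """Toy model (all parameters are assumptions): rating = true effect + yearly noise."""
    rng = random.Random(seed)
    true_effects = [rng.gauss(0, effect_sd) for _ in range(n_teachers)]
    scores = [
        [effect + rng.gauss(0, noise_sd) for effect in true_effects]
        for _ in range(n_years)
    ]
    return true_effects, scores

def bottom_group(scores, fraction=0.08):
    """Indices of the teachers with the lowest ratings in one year."""
    k = int(len(scores) * fraction)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])
    return set(ranked[:k])

true_effects, (year1, year2) = simulate()

# How far the district-wide average moves from one year to the next.
district_shift = abs(statistics.mean(year1) - statistics.mean(year2))

# How far a typical individual teacher's rating moves over the same period.
typical_teacher_shift = statistics.mean(
    [abs(a - b) for a, b in zip(year1, year2)]
)

# Of the teachers flagged in the bottom 8% in year one, what share is flagged again?
flagged1 = bottom_group(year1)
flagged2 = bottom_group(year2)
repeat_rate = len(flagged1 & flagged2) / len(flagged1)

print(f"district average shift: {district_shift:.3f}")
print(f"typical teacher shift:  {typical_teacher_shift:.3f}")
print(f"bottom-8% repeat rate:  {repeat_rate:.0%}")
```

Under these assumed numbers, the district average barely moves because the noise averages out over hundreds of teachers, while individual ratings swing a lot and the "bottom 8%" list changes substantially from year to year. How closely this resembles real value-added data depends entirely on how noisy the real ratings are, which this sketch does not attempt to estimate.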

That’s why I’m somewhat hesitant to make firing decisions with that data, especially right now, when TVAAS has demonstrable problems (and is controlled under contract by a secretive, private company to boot).  From yet another earlier post:

I hesitate, lest I be labeled repetitious, but Rule #1 is: Data should not be used in a punitive manner.

Some of the more market-oriented educational researchers out there would disagree (I think).  These are the folks clamoring for more teacher firings, relaxation of the regulations and red tape required to become a teacher, and other market-driven reforms which would ease both the entry into and exit from the profession.  Many believe (and I don’t think I’m creating a straw man here) that we can hire and fire our way to success.  That is, with enough supply out there, we can spend our time weeding out the bad teachers, and hiring enough new ones so that, eventually, we keep the good teachers and are able to sort out the chaff.  This assumes, however, that there’s an infinite teacher pipeline and that room for experience/improvement is limited. Both of those assumptions fly in the face of what I know about the profession: 1) Even if it is easier to become a teacher (and I do support alternative licensure and the reform of traditional teacher preparation), there aren’t enough folks out there, at current salary/benefit levels, to fill the void of all the mediocre-to-bad teachers we would have to fire, and 2) Experience matters — teachers can get better if you give them the support and opportunities that they need.

I still support the idea that student performance needs to be a part of evaluating teachers, that teacher development, planning, and support need to be strongly informed by student performance, and that pay needs to be linked in some way to student performance (though not wholly determined by it).  It’s simply that we need to be reflective about how we use data, and understand its limitations.

Overconfidence and a lack of self-reflection can kill any good thing.  To take one example, look at subprime lending.  It originally started as a way to allow more lower and middle income folks to buy homes.  Then it got out of control, people stopped being rigorous and examining their behavior and assumptions, and it ended up decimating the entire world economy.  Let’s not let the same thing happen with the use of data in our schools.

P.S. For great resources on subprime lending and the economic crisis, I always turn to NPR’s Planet Money and This American Life (especially these three episodes) and to the books All The Devils Are Here (by Bethany McLean and Joe Nocera) and The Big Short (by Michael Lewis, of Moneyball fame), both of which are excellent reads.

Note: The image above comes from this blog, written by David B. Cohen, and is a great read on using value-added data to evaluate teachers.

6 Comments
  1. September 15, 2011 9:50 am

    I’ve always felt an evaluation system was needed in place. I have non-fond memories of teachers that didn’t deserve their positions at the various institutions I attended.

    Eric Bloom

  2. Anne-Marie
    September 15, 2011 12:22 pm

    Unfortunately, TN seems to be getting more and more excited about basing everything on TVAAS. I’ve been trying to get some information about a statement our new commissioner of ed. made in the Tennessean: “As new student assessments are developed and vetted by Tennessee educators and experts, we expect that next year, it will be possible for 70 percent of teachers to be evaluated by their own student-assessment results. Eventually, more than 90 percent of teachers will have such options.” Any idea about these new assessments he’s talking about? Will TCAP reach down into lower grades? Will related arts somehow be subjected to standardized testing? Frankly, as a parent, I’m very worried about his statement and really would like some clarification.

  3. September 15, 2011 9:23 pm

    (In which I vent) Teachers cannot improve if researchers are not permitted to find out what works and why. Somewhere in this maelstrom, “the powers that be” are quenching the fires of long-burning relationships with labs. Kicking researchers out means no new information in. All that can hope to be maintained in the name of “preservation of class time” is, at best, mediocrity, and at worst, an irreversible downward spiral. At some point it gets to be too late to pull up. The Holocaust reference is appropriate. There is nothing as embittering as illogical waste.

  4. SMT
    September 16, 2011 9:02 am

    As a building testing coordinator at an MNPS school, I cannot tell you how many hours are spent just coordinating and administering these tests. Most people do not even consider what must take place structurally in a building to pull off the multitude of tests that are already mandated. Counselors are spending hours counting test booklets multiple times, checking, double checking, and triple checking that every booklet is accounted for, every demographic bubble is accurate, and every last absent student has been rounded up to make up a missed test. The daily schedule on which teachers and students operate is totally hijacked for a week due to the demands of EOC and TCAP testing (not to mention the time spent throughout the year on the PLAN test, the Writing Assessment, the ACT, ELDA testing, and on and on . . .) School is quickly becoming a means to an end, which is to test . . . not to teach and learn. And as it has been said many times before, who is profiting from all this testing? Certainly not students . . . I cannot imagine what it would take in terms of planning and human capital to test 90% of all subjects. And for what, a system of evaluating teachers that has been proven to be fallible? This is truly ludicrous.


