Carol Burris: ‘Well I’ll be VAMned!’ Why using student test scores to evaluate teachers is a sham

Filed under Local Activism, Professional Issues

Valerie Strauss of The Washington Post recently published a piece by Carol Burris that outlines the absurdity of evaluating teachers by test scores. Even though the practice, known as value-added measurement (VAM), is largely unpopular with the public, few fully understand the negative impact it is having on our students. Please share this article with community members and parents to let them know just how harmful this practice is and why it must stop.

Answer Sheet
‘Well I’ll be VAMned!’ Why using student test scores to evaluate teachers is a sham
By Valerie Strauss July 8 at 1:08 PM

New York Gov. Andrew Cuomo. (EPA/Jason Szenes)

If by now you don’t know what VAM is, you should. It’s shorthand for value-added modeling (or value-added measurement), developed by economists as a way to determine how much “value” a teacher brings to a student’s standardized test score. These formulas are said by supporters to be able to factor out things such as a student’s intelligence, whether the student is hungry, sick or is subject to violence at home, or any other factor that could affect performance on a test beyond the teacher’s input. But assessment experts say that such formulas can’t really do that accurately and reliably. In fact, the American Statistical Association issued a report in 2014 on VAM and said: “VAMs are generally based on standardized test scores and do not directly measure potential teacher contributions toward other student outcomes.”

Still, the method has been adopted as part of teacher evaluations in most states, with support from the Obama administration, and used for high-stakes decisions about teachers’ jobs and pay. “Growth” scores likewise evaluate teachers based on student test scores but don’t control for outside factors.

Use of student test scores to evaluate teachers has created some situations in schools that are, simply, ridiculous. In New York City, for example, an art teacher explained in this post how he was evaluated on math standardized test scores and saw his evaluation rating drop from “effective” to “developing.” Why was an art teacher evaluated on math scores? There are only tests for math and literacy, so all teachers are in some way linked to the scores of those exams. (Really.) In Indian River County, Fla., a middle school English Language Arts teacher named Luke Flynt learned that his highest-scoring students hurt his evaluation because of the peculiarities of how he and his colleagues are assessed. (You can read about that here.)

Here’s a piece by educator Carol Burris showing, with data, how using “growth” scores to evaluate teachers in New York is something of a sham. Burris just retired after 15 years as principal of South Side High School in the Rockville Centre School District in New York. She was named New York’s 2013 High School Principal of the Year by the School Administrators Association of New York and the National Association of Secondary School Principals, and was tapped as the 2010 New York State Outstanding Educator by the School Administrators Association of New York State. She retired early, she said, to advocate for public education in new ways.

[The odd thing Arne Duncan told Congress]

By Carol Burris

Well I’ll be VAMned! Using growth scores to evaluate teachers is producing miracles in the state of New York!

Why, just look at teacher scores from Rochester, N.Y. In 2013, 26 percent of Rochester teachers got “ineffective” growth scores based on student performance on the state tests. Just one year later, that figure had dropped to 4 percent! Over the same period, the percentage of teachers in Yonkers who got “ineffective” growth scores fell from 18 percent to 5 percent.

But wait. Where did all the “ineffective” teachers go? It looks like they are now hiding out in some of New York’s highest-performing suburban districts. It must be so. The percentage of teachers with “ineffective” test growth scores in Scarsdale went from 0 percent to 19 percent, and it jumped from 0 percent to 13 percent in Roslyn on Long Island. Those “ineffectives” have even slipped into Jericho, New York, where 8 percent of teachers are now “ineffective,” even though 81 percent of Jericho students were proficient on the Common Core math tests, far exceeding the state average of 35 percent.

Take that, white suburban moms. Remember when Education Secretary Arne Duncan told you that your schools aren’t quite as good as you thought?

My tongue-in-cheek account of New York teacher growth scores is not intended to imply that suburban teachers are better than urban teachers, nor that the reverse is true. And it is certainly not an argument that the teacher growth scores in 2013 were right, and the growth scores in 2014 were wrong. The above examples demonstrate the silliness of a system that produces such wild swings in ratings over the course of a year.

The New York “growth score” system (a modified VAM) is a closed model that sets each teacher against the rest. By design, it will produce about the same number of “ineffective” and “highly effective” teachers every year. All of the test scores in the state could dramatically improve, and there would still be the same percentage of “ineffective” teachers. And all of the state scores could precipitously fall and there would still be roughly the same percentage of “highly effective” teachers. The above examples simply illustrate how the deck chairs on the Titanic shift.
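The invariance Burris describes can be seen with a toy sketch. The following is a hypothetical illustration, not New York’s actual growth-score formula: a purely relative scheme labels each teacher by percentile rank among all teachers, so the share rated “ineffective” stays fixed even if every score in the state rises.

```python
def rate_by_percentile(scores, ineffective_cut=0.10, highly_effective_cut=0.90):
    """Label each teacher by where their score falls relative to all scores.

    Bottom decile -> "ineffective", top decile -> "highly effective".
    The cutoffs are illustrative assumptions, not the state's real rules.
    """
    ranked = sorted(scores)
    n = len(ranked)
    labels = []
    for s in scores:
        pct = ranked.index(s) / (n - 1)  # percentile rank, 0.0 to 1.0
        if pct < ineffective_cut:
            labels.append("ineffective")
        elif pct >= highly_effective_cut:
            labels.append("highly effective")
        else:
            labels.append("effective")
    return labels

scores_2013 = [float(x) for x in range(1, 101)]   # 100 teachers, distinct scores
scores_2014 = [s + 50.0 for s in scores_2013]     # every teacher improves by 50 points

# Same count of "ineffective" labels both years, despite universal improvement.
count_2013 = rate_by_percentile(scores_2013).count("ineffective")
count_2014 = rate_by_percentile(scores_2014).count("ineffective")
assert count_2013 == count_2014
```

Because ratings depend only on rank order, adding 50 points to everyone changes nothing: the deck chairs shift, but the same fraction of teachers lands in the bottom bin every year.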

[Statisticians slam popular teacher evaluation method]

It is an impossible mission to create a valid and fair formula by which to rate teachers using student test scores. In both 2013 and 2014, the New York State Education Department included 20 different variables in the growth model—variables to account for factors such as poverty, special education and English language learning status. Should other factors be included, or should some be excluded? Are the included variables the right ones? Are there too many or too few? How heavily should one put one’s thumb on the scale to account for differences among students when rating their teachers? No one really knows.

Let’s look at more outcomes. Should we believe that there are, as a percentage, more than twice as many “ineffective” teachers in high-scoring Nassau and Westchester County schools as there are in New York City? There are also disparities in “highly effective” ratings. In 2014, 10 percent of the teachers in Brooklyn (Kings County) got “highly effective” state-generated growth scores. Not one teacher in Roslyn or Scarsdale did. Yet one year earlier, 13 percent of Scarsdale teachers got “highly effective” scores from NYSED. Did they all stop doing their jobs?

This would be amusing except for the fact that there are real-life consequences for teachers and principals, and thanks to Gov. Andrew Cuomo and the legislature, those consequences are now far worse. The recently passed APPR legislation gives test scores equal weight with observations in teacher and principal evaluations. Soon all teachers with “ineffective” growth scores will be on an improvement plan or on the road to termination. If you are untenured and you receive an “ineffective” growth score in even one of your four probationary years, you cannot receive tenure. And that holds no matter how highly regarded you are by parents, students, colleagues or your boss.

This system will tear at the moral fabric of New York public schools. Teachers, principals and superintendents will struggle as they choose between making sensible day-to-day decisions in the best interests of children and avoiding the negative consequences of VAM.

[How students with top test scores wound up hurting their teacher’s evaluation]

Think about a veteran fourth-grade teacher in Rochester who just received an “ineffective” score. She knows that her yearly social studies project builds her students’ creative talents…

Continue Reading…