© 2025 St. Louis Public Radio

Commentary: 'Scientifically valid and reliable'

This article first appeared in the St. Louis Beacon, July 6, 2011 - The National Education Association, a 3.2-million-member teachers' union, spent Independence Day voting in favor of a new assessment policy that, according to a report in the New York Times, "calls for teacher practice, teacher collaboration with schools and student learning to be used in teacher evaluation."

Great, that sounds fine.

However, when it comes to using students' standardized test scores to evaluate teacher performance, the policy says that these tests need to be "developmentally appropriate, [and] scientifically valid and reliable for the purpose of measuring both student learning and a teacher's performance."

Dull as they are, these words are not as matter-of-fact as they seem.

At first, the idea sounds like a tasty bone thrown to the data-craving wolves of education reform, a way to make teachers seem not entirely averse to having their own professional evaluations tied to student performance on tests. (Not that any teacher on earth doesn't, on her own, tie her professional self-assessment to some degree to the academic achievement of her students. But never mind that for now.)

But the language of the policy in the paragraph above warrants a little probing. Sure, these words say, we'll be happy to use tests in teacher evaluations, as long as those tests are developmentally appropriate, scientifically valid and reliable ...

Meaning?

That phrase -- "scientifically valid and reliable" -- is a loaded one. You've got to be a certain kind of person in a certain kind of business to know what "valid" and "reliable" mean in this context. Researchers in the hard sciences (biology, physics, chemistry and so forth) and in some of the social sciences use the word valid to mean that a test or experiment actually measures what it purports to measure. And they use the word reliable to mean that any given test or experiment, if replicated in another context, by another researcher, on another bunch of materials or people, would obtain the same -- the exact same -- results.

There exist today no standardized tests, none, that any kind of scientist would deem valid and reliable -- in the strict senses of those words -- for the purpose of measuring both student learning and a teacher's performance over time. Moreover, I would argue that the very principles of science, if we take them seriously, insist that it would be awfully difficult, not to say impossible, for someone to come up with a standardized way of testing both of these complex, messy, deeply human and totally interactive processes unfolding over time.

As I have written in an earlier column, it's not that teacher assessment cannot be done, or cannot be imagined in light of students' experiences and achievements; it's that standardized assessment of the activities, characters, processes, and interactions we are talking about here is, I believe, epistemologically impossible.

Exactly three years ago, the American Educational Research Association (AERA) came up with a definition of "scientifically based research" grounded in scientific standards and principles. Here it is, pared to the core:

"The term 'principles of scientific research' means the use of rigorous, systematic, and objective methodologies to obtain reliable and valid knowledge. The examination of causal questions requires experimental designs using random assignment or quasi-experimental or other designs that substantially reduce plausible competing explanations for the obtained results. The term 'scientifically based research' includes basic research, applied research, and evaluation research in which the rationale, design, and interpretation are developed in accordance with the scientific principles laid out above."

This whole teacher-as-cog-in-the-education-wheel conversation is about a particular "causal question": "Did Teacher X make Students A, B, C, D, & F learn what we told him they needed to learn?"

Think of it this way: It may be possible to blame the wobbly wheel in the showroom SUV on Worker X's failure, back on the assembly line, to properly screw a bolt. A researcher could probably rule out most of the other explanations for such a malfunction.

But when it comes to kids and teachers in schools, and evaluating good and bad teaching, and the learning (or lack of learning) that takes place among students as such learning relates to that teaching, well, there are always "plausible competing explanations" for the educational outcomes we notice.

Why is it so hard to stop thinking of teachers as line workers, schools as factories, and young human beings as commodities being prepared for market?

Inda Schaenen is a writer and teacher in St. Louis.