“Research Says…” - Why quality research matters for education

Mark Dynarski
Advisor, Education
George W. Bush Institute

As debates heat up over Common Core standards, the renewal of the No Child Left Behind Act and state testing policies, we often hear people citing research to buttress their point of view. In fact, there are so many competing research findings that it is hard for educators, parents and taxpayers to know what to believe and what not to believe. Conflicting reports also make it hard for policymakers to know how to construct the best policies.

The best way to make sense of what can seem like research wars is to look at the standards guiding the work. Standards are a part of our everyday life, whether we stop to think about them or not. When we send an email, text a message, or use an appliance with a “UL” on its label, we are using standards that are useful to us.


The same is true with research. Standards apply there, too. And, when adhered to, they benefit us.

In the world of education, the past decade has seen an explosion of research that meets high standards. The growth is due to the leadership of the federal Institute of Education Sciences, which was created in 2002. It has led to improvements in areas like reading, where we now know the best ways to teach reading comprehension to middle school students. It has also led to a better understanding of the different effects math textbooks have had on student math skills.

But there are no standards for how to use research.

Debates about important policy issues often cite research evidence. Many sentences begin with “Research shows that….” But how authors select evidence and how they interpret it vary widely. Sometimes the question being asked and the evidence being put forward just don’t match.

Here’s an example of how research standards matter. Some commenters on the reauthorization of the Elementary and Secondary Education Act claim tests harm education.

If that is so, what kinds of evidence might be put forward to back up that claim? Three approaches come to mind, but not all meet research standards.

Anecdotal evidence
Ask friends and neighbors, check blog posts, and read news articles to see whether people are talking and writing about tests harming education.

Survey evidence
Send a survey to a group of educators and parents asking questions about whether tests are harming education. Or send a survey to a random sample of educators and parents.

Experimental evidence
Divide a sample of school districts or states into equal groups. Assign one group to carry out tests and the other to go about its usual routines without tests. Then compare differences over time between the two groups on outcomes such as teacher retention, teacher morale, parent support, student test scores, high school graduation and college going. This approach makes sense only if states or districts are not already testing, because the so-called “control” group should not test as much as the treatment group.
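To make the experimental approach concrete, here is a minimal sketch in Python. It assumes the two groups are formed by random assignment, which is the usual way experiments create comparable groups. The district names, the outcome values and the function names are all hypothetical, invented only for illustration.

```python
import random
from statistics import mean


def assign_groups(units, seed=0):
    """Randomly split units into two equal-sized groups (treatment, control)."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(units)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def difference_in_means(outcomes, treatment, control):
    """Average outcome in the treatment group minus the control group."""
    treated = mean(outcomes[u] for u in treatment)
    untreated = mean(outcomes[u] for u in control)
    return treated - untreated


# Hypothetical districts; the names are placeholders, not real data.
districts = [f"district_{i}" for i in range(1, 9)]
treatment, control = assign_groups(districts, seed=42)

# Made-up follow-up outcome (say, a teacher retention rate) for each district.
outcomes = {d: 0.80 + 0.01 * i for i, d in enumerate(districts)}
effect = difference_in_means(outcomes, treatment, control)
print(treatment, control, round(effect, 3))
```

Because chance alone decides which districts test, any systematic gap in outcomes between the two groups can be attributed to testing rather than to pre-existing differences. A real study would also need far more districts and a check on whether the gap could have arisen by chance.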

From the scientist’s perspective, only the third approach provides evidence of whether tests are harming education. “Tests cause harm” is a causal claim: it says that tests themselves produce the harm. From a formal scientific viewpoint, that’s a strong statement that requires strong evidence to support it.

Anecdotes don’t provide that kind of evidence. They are not systematic, cannot be counted, cannot be analyzed using quantitative methods, and may arise from unreliable sources. Unreliable sources can include people reporting that others said something they did not actually say. Even having many anecdotes does not prove tests harm education. They only show that the people you asked thought tests harm education.

Surveys of educators and parents don’t provide scientific evidence, either. They are a more sophisticated tool, but they are not more powerful evidence of whether tests cause harm.

Suppose in the survey that many educators agree with the statement that “The use of tests has harmed education.” What is this really telling us? It is measuring whether people think tests caused harm, but this is just a dressed-up version of an anecdote. How do we know what is happening now wasn’t going to happen anyway?

The logic of the first two approaches comes down to what logicians call a “post hoc” fallacy:

Education has changed in ways some observers think are undesirable
Tests have become more prominent
Therefore, tests led to the changes

An analogy shows why this argument is a fallacy. Last week I did not wear the shirt of my favorite team. The team lost. This week I wore the shirt. The team won.  Therefore, the shirt contributed to victory.

Well, no. This argument works well for creating superstitions. It’s not a sound basis for policy.

The third approach meets research standards because it produces causal evidence that testing has or has not caused harm. The point of standards such as those used by the What Works Clearinghouse, the Best Evidence Encyclopedia, the Campbell Collaboration, and the Coalition for Evidence-Based Policy is to winnow out studies that do not provide causal evidence from ones that do.

The resulting syntheses help policymakers see the preponderance of evidence. If 10 studies that meet standards all point in the direction of a policy improving outcomes, a policymaker can be confident in concluding that the policy improves outcomes.


When a federal law is passed, such as one mandating annual tests, all states and districts are legally compelled to comply. Researchers can’t create a group that does not comply in order to study the effects of the law. But researchers have developed statistical approaches that come close to being experiments.

For example, at the time when annual tests were required by law, some states already conducted annual tests on their own whereas others did not. Comparing how test scores change in the two kinds of states is similar to an experiment. If the states that had not done annual testing improve their test scores more than the states that did annual testing, the findings suggest annual testing causes higher test scores.

In fact, Eric Hanushek and Margaret Raymond carried out precisely this study and published its results in 2005. David Figlio and Susanna Loeb’s paper reviewed 13 other studies like the Hanushek and Raymond one and concluded that the preponderance of evidence pointed to annual testing improving test scores.
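The logic of comparing score changes in the two kinds of states can be sketched in a few lines of Python. This is only an illustration of the arithmetic, often called a difference-in-differences comparison. The scores are invented and are not drawn from any of the studies cited here, and the function name is hypothetical.

```python
from statistics import mean


def difference_in_differences(newly_tested, already_tested):
    """Each argument is a list of (score_before, score_after) pairs, one per state.

    Returns the average gain in newly tested states minus the average gain
    in states that were already testing before the law.
    """
    gain_new = mean(after - before for before, after in newly_tested)
    gain_old = mean(after - before for before, after in already_tested)
    return gain_new - gain_old


# Made-up average scores before and after the testing requirement.
newly_tested = [(250, 258), (245, 251), (260, 267)]    # states that began annual testing
already_tested = [(255, 258), (248, 252), (262, 264)]  # states that already tested

estimate = difference_in_differences(newly_tested, already_tested)
print(estimate)
```

With these invented numbers, the newly tested states gain 7 points on average and the already-tested states gain 3, so the estimate is 4 points. Subtracting the second group's gain removes changes that would have happened anyway, which is what lets this kind of comparison approximate an experiment.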

More recent high-quality studies have provided evidence that schools improve their performance when they face sanctions (Ahn and Vigdor 2014), that principals and teachers focus time and effort in different ways that improve performance when their school receives low ratings (Rouse et al. 2013), and that teachers threatened with dismissal for an ineffective rating improved their performance (Dee and Wyckoff 2014).

The pattern is clear. Educators respond to consequences.

This seems obvious — who doesn’t respond to consequences? But the education systems in each state have deeply entrenched practices and longstanding patterns of how performance is evaluated.

No Child Left Behind was passed into law in 2002, more than 13 years ago. But that is only a brief window of time compared to how long public schools have operated without consequences. Some educators and their advocates have argued that consequences should be ended. But, on the other side, strong evidence shows consequences have improved education. Why roll back what’s worked?

Readers seeing the phrase “Research shows that…” don’t have access to the kinds of information that researchers use to judge research. Did the research meet high standards? Was it designed to answer the question? Was it done by an objective and independent organization? Did other researchers scrutinize it? Are its findings consistent with other research on the same question?

Science would say the author is responsible for providing readers with the evidence. Richard Feynman, the Nobel Laureate physicist, said “If you're representing yourself as a scientist, then you should explain to the layman what you're doing — and if they don't support you under those circumstances, then that's their decision.”

So, the next time you hear someone cite research during a heated debate over, say, the effects of standardized tests, ask how the research was conducted. Sounds technical, I know, but not all research is the same.

And getting the research right is critical to putting the right policies in place.

Mark Dynarski, president of Pemberton Research in New Jersey, is a Bush Institute education reform fellow.