Do State Tests Make the Grade?

It's hard to overestimate the importance of standardized tests in public schools today. Grade advancement, high school diplomas, teacher bonuses, principals' jobs and school reputations can all hinge on whether a student picks the right answer.

So who creates the tests that carry so much weight?

Much of the work is done by five giants: CTB/McGraw-Hill, Educational Testing Service, Harcourt Assessment, Pearson Educational Measurement and Riverside Publishing. Together, the companies own about 90 percent of the state-testing business, which has become a $1.1 billion industry since passage of the federal No Child Left Behind Act in 2001. The law, which took effect in January 2002, requires states to give annual reading and math tests to third- through eighth-graders, and to test students in those subjects once again in high school.

Working with state educators, the big five - or big four, once Pearson's planned acquisition of Harcourt takes place - create and score the tests. But the explosion of testing and changes in the types of tests states administer have left the companies scrambling to keep up.

Meanwhile, differences in the state standards used to create the tests, along with some states' reluctance to pay for high-quality, challenging tests, have produced wide disparities in testing from state to state.

For example, a look at various fourth-grade reading tests shows wide differences. Texas' 2006 reading test is entirely multiple choice. Ohio's 2005 test includes several short-answer questions, such as one asking students to identify the main conflict in a passage; in another section, students fill out a cause-and-effect chart for a given problem. Massachusetts' 2007 test was arguably the most rigorous: Students had to answer four long open-response questions.

Some states include fewer questions that test writing skills, mainly because of cost. Gary Cook, Wisconsin's former testing director, said it could cost a thousand times more to score an essay question than a multiple-choice question. In 2005, the Editorial Projects in Education Research Center reported that 15 states relied entirely on multiple-choice questions in their reading and math tests, though some of these states gave separate writing tests in certain grades. (On Jan. 9, the center reported that 12 states use only multiple-choice questions on their math and reading tests.)

"People who don't have their heads stuck in the instruction don't realize it's not cheap to do this really well," Cook said of test-making. "And right now, I don't know many legislatures that are very open to spending money or raising taxes to develop these kinds of instruments."

Last year, the federal government gave states $407.6 million to help pay for testing, but states have said that amount falls short. In January, a federal appeals court revived a lawsuit charging that the federal government does not provide enough money for states and districts to meet the law's requirements.

In the 2006-07 school year, Virginia spent about $11.92 in state and federal money per test, while Washington state spent about $17.74 per test. But both figures also include the cost of additional tests the states gave that No Child Left Behind does not require.

South Carolina doesn't even calculate a "per pupil" figure, because not every student is tested. Whatever the method of measuring spending, one thing is clear: Testing accounts for less than one quarter of 1 percent of total education spending, combining federal, state and local money.

"States are not putting any more resources into the testing infrastructure, and as a result, we are getting testing on the cheap, and that is working against No Child Left Behind's efforts to produce high-quality assessments that promote higher standards," said Thomas Toch, the co-director of Education Sector, a nonpartisan think tank. "If we're going to make tests the driver of quality in public education, then we need to invest to ensure that we get tests that are up to that task."

On the whole, however, state spending on testing has shot up since George W. Bush's education plan became the law of the land. In early 2001, a year before No Child Left Behind was enacted, states collectively spent almost $423 million on standardized tests, according to a report. During the 2007-08 school year, states will spend almost $1.1 billion on these tests, according to Eduventures Inc., an education industry research firm.

The costs have been driven up by the sheer volume of testing required by the law. In 2005-06, when states had to have math and reading tests in place for all the required grades for the first time, about 45 million tests were administered throughout the country, 11.4 million more than the previous year. This year, states are required to add a science test, which is expected to add another 11 million tests to the total.

One side effect of the greater demand for tests is a shortage of education professionals with a jawbreaking job title: psychometrician. Their task is to oversee test creation, administration and scoring. Competition for them is fierce, and test companies often lure them away from one another.

"A psychometrician has to make sure that algorithm is absolutely correct so that a student who is just barely going to pass, doesn't just barely fail," said Dr. Michael Bunch, a psychologist and senior vice president of research and development at North Carolina-based Measurement Inc., which has contracts with a dozen states.

The increased emphasis on testing has created other pressures as well. States are now testing later in the school year to squeeze in more teaching, and they want scores back faster than ever.

Not surprisingly, these pressures have led to some high-profile errors. In 2004, CTB/McGraw-Hill had to re-score thousands of Connecticut tests after scores came in unexpectedly low because of how the writing section was graded. In 2006, Harcourt's late distribution of tests to Illinois caused a long delay in getting scores back to the state. Last year, American Institutes for Research had to re-grade 98,000 Hawaii tests after teachers found that some students who had turned in blank test booklets received scores anyway.

Probably the law's biggest impact has been on the kinds of tests states give, which have changed dramatically. The law, which brought so much federal intrusion into local classrooms, passed only because of a compromise: States would be allowed to create their own tests.

As a result, states switched from standardized tests that compare how their students stack up against students across the country (the Stanford Achievement Test is a prominent example) to tests based on each state's own standards. But this system lets the difficulty of the tests vary widely from state to state, with some states producing easier tests that measure lower-level skills.

A report released in October by the conservative Thomas J. Fordham Foundation studied testing in 26 states by comparing students' proficiency on two tests - their state tests and the Measures of Academic Progress (MAP) test, administered by the Northwest Evaluation Association (NWEA).

The results were stark: The bar for proficiency varied significantly from state to state. According to NWEA's analysis, to be considered proficient readers in Wisconsin, for example, fourth-graders needed to answer questions about as difficult as one that asked them to note a few differences between cats and dogs. But fourth-graders in Massachusetts faced more difficult questions, such as those about a passage written by Russian author Leo Tolstoy.

Wisconsin Department of Public Instruction spokesman Patrick J. Gasper called the Fordham study flawed and pointed out that the two tests compared in the report have different goals and measure different skills.

"Their report attempts to draw conclusions by comparing two different types of tests that are scored on two different scales," Gasper said. "The purpose and types of tests being compared are different."

Test difficulty can be measured in a variety of ways. One is to look at how tough the questions are. Again, a review of various fourth-grade reading tests shows that the reading passages on Massachusetts' test are longer, appear more detailed and use more difficult vocabulary than the passages on Ohio's and Texas' tests. One question on Ohio's test simply asks students to identify who is speaking in the passage.

But that is an imprecise way to determine a test's difficulty, said John Cronin, a research specialist at NWEA and one of the Fordham study's lead authors. Several factors could make a reading question difficult, such as the length of the passage, how straightforward the question is and the quality of the wrong answers in multiple-choice questions. Test creators can determine how difficult a question is only after they see how it has performed in field tests with real students.

And even if the questions themselves are challenging, a test's difficulty can be misleading if the state sets a low cut score: the number of questions a student must answer correctly to be deemed proficient. The Fordham Foundation report found that states' definitions of "proficient," based on their cut scores, were far from consistent. Cronin said setting a low cut score is like saying a high school baseball player is proficient only if he can hit against Major League pitcher Roger Clemens, an admittedly high performance standard, but then requiring batters to get only one hit out of 50 pitches to meet that standard.

"Instead of a .300 (batting average), they're aiming for .020," Cronin said. "It's a very rigorous test, facing a very difficult pitcher, but the standard of performance is very low."

These disparities and the wide variety of tests have fueled calls for national standards.

"You may be deemed proficient in North Dakota math or Wisconsin math, but that doesn't mean you're proficient, really, in math," said Fordham spokesman Jeffrey Kuhner. "If you want to prevent this varied discrepancy from state to state, rather than dumbing down our standards we believe that the way to do it is to have an across-the-board national test and national standards."

International comparisons show that students in several countries have caught up to, and passed, American students. The results of the 2006 Program for International Student Assessment, which compares how 15-year-olds in 57 countries perform, showed U.S. students ranked 29th in science and 35th in math, behind countries such as Estonia, Slovenia and Latvia.

As Congress works to reauthorize No Child Left Behind, several experts are promoting national standards, but the idea seems unlikely to advance in the face of staunch opposition from local-control advocates. When Fordham released an analysis showing how national standards could come about, it was titled "To Dream the Impossible Dream."

A bipartisan discussion draft proposing changes to NCLB released by key U.S. House members - most notably Rep. George Miller (D-Calif.), the chairman of the House Committee on Education and Labor and one of the prime movers of the original No Child Left Behind bill - would not require national standards or a national test. The draft, released in August, would give states incentives to work with universities and the business community to develop more rigorous standards and tests based on those standards.

"To be successful, our system of accountability must encourage states to set high standards," Miller said in September at a House hearing. "Lowering the bar so more students can reach it is a sham."

This article was excerpted from "State of the States 2008," an annual report on significant state policy developments and trends released Jan. 16. Updates since the report went to press appear in parentheses.

