Do State Tests Make the Grade?
By Pauline Vu, Staff Writer
So who creates the tests that carry so much weight? Much of the work is done by five giants: CTB/McGraw-Hill, Educational Testing Service, Harcourt Assessment, Pearson Educational Measurement and Riverside Publishing. Together, the companies own about 90 percent of the state-testing business, which has become a $1.1 billion industry since passage of the federal No Child Left Behind Act in 2001. The law, which took effect in January 2002, requires states to give annual reading and math tests to third- through eighth-graders, and to test students in those subjects once again in high school.
Working with state educators, the big five - or big four, once Pearson's planned acquisition of Harcourt takes place - create and score the tests. But the explosion of testing and changes in the types of tests states administer have left the companies scrambling to keep up.
Also, differences in state standards that are used to create the tests and the reluctance of some states to spend money for high-quality, challenging tests have caused a great disparity in testing from state to state.
For example, a look at various fourth-grade reading tests shows wide differences. Texas' 2006 reading test is entirely multiple choice. Ohio's 2005 test includes several short-answer questions, such as asking for the main conflict in a passage; in another section, students fill out a cause-and-effect chart for a certain problem. Massachusetts' 2007 test was arguably the most rigorous: Students had to answer four long open-response questions.
Some states have fewer questions that test writing skills. A main reason for that is money. Gary Cook, Wisconsin's former testing director, said it could cost a thousand times more to score an essay question than a multiple-choice question. In 2005, the Editorial Projects in Education Research Center reported that 15 states relied entirely on multiple-choice questions in their reading and math tests. Some of these states gave separate writing tests in certain grades. (On Jan. 9, the center reported that 12 states use only multiple-choice questions on their math and reading tests.)
"People who don't have their heads stuck in the instruction don't realize it's not cheap to do this really well," Cook said of test-making. "And right now, I don't know many legislatures that are very open to spending money or raising taxes to develop these kinds of instruments."
Last year, the federal government gave states $407.6 million to help pay for testing. But states have said that falls short. In January, a federal appeals court revived a lawsuit that charges the federal government does not provide enough money for states and districts to meet the law's requirements.
South Carolina doesn't even calculate a "per pupil" number because not every student is tested. Despite these disparate ways of measuring spending, one thing is known: Of the total education budget combining federal, state and local money, less than one quarter of 1 percent goes to testing.
"States are not putting any more resources into the testing infrastructure, and as a result, we are getting testing on the cheap, and that is working against No Child Left Behind's efforts to produce high-quality assessments that promote higher standards," said Thomas Toch, the co-director of Education Sector, a nonpartisan think tank. "If we're going to make tests the driver of quality in public education, then we need to invest to ensure that we get tests that are up to that task."
On the whole, however, state spending on testing has shot up since George W. Bush's education plan became the law of the land. In early 2001, a year before No Child Left Behind was enacted, states collectively spent almost $423 million on standardized tests, according to a Stateline.org report. During the 2007-08 school year, states will spend almost $1.1 billion on these tests, according to Eduventures Inc., an education industry research firm.
The costs have been driven up by the sheer volume of testing required by the law. In 2005-06, when states had to have math and reading tests in place for all the required grades for the first time, about 45 million tests were administered throughout the country, 11.4 million more than the previous year. This year, states are required to add a science test, which is expected to add another 11 million tests to the total.
One side effect of the greater demand for tests is a shortage of educational professionals with a jawbreaking job title: psychometrician. Their task is to oversee test creation, administration and scoring. Competition for them is fierce among the test companies, and they're often lured away from one company by another.
"A psychometrician has to make sure that algorithm is absolutely correct so that a student who is just barely going to pass, doesn't just barely fail," said Dr. Michael Bunch, a psychologist and senior vice president of research and development at North Carolina-based Measurement Inc., which has contracts with a dozen states.
The increased emphasis on testing has caused other pressures as well. States are now testing later in the academic year to squeeze in more teaching. And states want scores back faster than ever before.
Not surprisingly, this has caused some high-profile errors. In 2004, CTB/McGraw-Hill had to re-score thousands of Connecticut tests after scores came in mysteriously low because of the grading on the writing section. In 2006, the late distribution of tests to Illinois by Harcourt meant a long delay in getting the scores back to the state. Last year, American Institutes for Research had to re-grade 98,000 Hawaii tests after teachers found that some students who submitted blank test books received scores anyway.
Probably the biggest impact of the No Child Left Behind law has been on the kind of tests states are giving, which has changed dramatically. The law, which imposed so much federal intrusion into local classrooms, passed only with a compromise: States would be allowed to create the tests.
The result was that states switched from standardized tests that compare how their students stack up against students across the country - the Stanford Achievement Test is a prominent example - to those based on each state's specific standards. But this system allows the difficulty of the tests to vary widely from state to state, resulting in some states producing easier tests that measure lower-level skills.
A report released in October by the conservative Thomas J. Fordham Foundation studied testing in 26 states by comparing students' proficiency on two tests - their state tests and the Measures of Academic Progress (MAP) test, administered by the Northwest Evaluation Association (NWEA).
The results were stark: Differences in student performance from state to state were significant. According to NWEA's analysis, to be considered proficient readers in Wisconsin, for example, fourth-graders needed to answer questions about as difficult as one that asked them to note a few differences between cats and dogs. But fourth-graders in Massachusetts faced more difficult questions such as those about a written passage by Russian author Leo Tolstoy.
Wisconsin Department of Public Instruction spokesman Patrick J. Gasper called the Fordham study flawed and pointed out that the two tests compared in the report have different goals and measure different skills.
"Their report attempts to draw conclusions by comparing two different types of tests that are scored on two different scales," Gasper said. "The purpose and types of tests being compared are different."
Test difficulty can be measured in a variety of ways. One is by looking at how tough the questions are. Again, a review of various fourth-grade reading tests shows that reading passages on Massachusetts' test are lengthier and appear to be more detailed and use more difficult vocabulary than the passages on Ohio's and Texas' tests. One question on Ohio's test simply asks students to identify who is speaking in the passage.
But that is an imprecise way to determine a test's difficulty, said John Cronin, a research specialist at NWEA and one of the Fordham study's lead authors. Several factors could make a reading question difficult, such as the length of the passage, how straightforward the question is and the quality of the wrong answers in multiple-choice questions. Test creators can determine how difficult a question is only after they see how it has performed in field tests with real students.
And even if the questions themselves are challenging, a test's difficulty can be misleading if the state sets a low cut score, or the number of questions a student must answer correctly to be deemed proficient. The Fordham Foundation report found that states' definition of "proficient" based on the cut score was far from consistent. Cronin said setting low cut scores would be like saying a high school baseball player is proficient only if he can hit against Major League pitcher Roger Clemens - an admittedly high performance standard - but then requiring batters to get only one hit out of 50 pitches to meet that standard.
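The arithmetic behind Cronin's analogy can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the 50-pitch total and one-hit requirement come from the analogy, and the .300 average is a comparison benchmark, not a figure from any state's test.

```python
import math
from fractions import Fraction


def required_correct(total_items: int, cut: Fraction) -> int:
    """Smallest number of correct answers (or hits) that meets the cut score."""
    return math.ceil(cut * total_items)


def is_proficient(correct: int, total_items: int, cut: Fraction) -> bool:
    """True if the share of correct answers is at or above the cut score."""
    return Fraction(correct, total_items) >= cut


# Illustrative values only: 50 pitches, as in the analogy.
PITCHES = 50
LOW_CUT = Fraction(1, 50)    # one hit in 50 pitches, a .020 average
HIGH_CUT = Fraction(3, 10)   # a .300 average, used here as a comparison

if __name__ == "__main__":
    print("Hits needed at the low cut score:", required_correct(PITCHES, LOW_CUT))
    print("Hits needed at the high cut score:", required_correct(PITCHES, HIGH_CUT))
    print("One hit passes the low cut:", is_proficient(1, PITCHES, LOW_CUT))
    print("One hit passes the high cut:", is_proficient(1, PITCHES, HIGH_CUT))
```

The same questions, or the same pitcher, yield very different pass rates depending on where the cut is set, which is why question difficulty alone says little about how demanding a test is.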
"Instead of a .300 (batting average), they're aiming for .020," Cronin said. "It's a very rigorous test, facing a very difficult pitcher, but the standard of performance is very low."
These test disparities and the wide variety of tests have fueled calls for national standards.
"You may be deemed proficient in North Dakota math or Wisconsin math, but that doesn't mean you're proficient, really, in math," said Fordham spokesman Jeffrey Kuhner. "If you want to prevent this varied discrepancy from state to state, rather than dumbing down our standards we believe that the way to do it is to have an across-the-board national test and national standards."
International comparisons show that students in several countries have caught up to - and passed - American students. The results from the 2006 Program for International Student Assessment, which compares how 15-year-olds in 57 countries perform, showed U.S. students were 29th in science and 35th in math, behind countries like Estonia, Slovenia and Latvia.
As Congress works to reauthorize No Child Left Behind, several experts are promoting national standards, but the concept seems unlikely to advance in the face of staunch opposition by local-control advocates. When Fordham released an analysis showing how national standards could come about, it was titled, "To Dream the Impossible Dream."
A bipartisan discussion draft proposing changes to NCLB released by key U.S. House members - most notably Rep. George Miller (D-Calif.), the chairman of the House Committee on Education and Labor and one of the prime movers of the original No Child Left Behind bill - would not require national standards or a national test. The draft, released in August, would give states incentives to work with universities and the business community to develop more rigorous standards and tests based on those standards.
"To be successful, our system of accountability must encourage states to set high standards," Miller said in September at a House hearing. "Lowering the bar so more students can reach it is a sham."
This article was excerpted from "State of the States 2008," Stateline.org's annual report on significant state policy developments and trends released Jan. 16. In parentheses are any news updates since the report was sent to the printer.