The latest humiliation in the world of standardized testing involves significant scoring errors on the October SAT. Most of the errors appear to have been small, but some may have been as large as 300 or 400 points (out of 2400 on what is now a three-part exam, with each part still bearing a maximum score of 800). It is the outrage that interests me, and the potential involvement of courts and legislatures. Why is it that people hate these tests so much? Standardized tests play an important and perhaps increasing role in our society. Ideally they provide a common denominator in a world where different schools have very different curricula and grading patterns. And there is something of a free market (but only something, because in some industries rankings make it difficult for schools to ignore the standardized test scores of their applicants) in that employers, graduate schools, and others need not use or put much weight on these exams.
But the interesting comparison is to "normal" grades. I suspect that even the worst of the widely used standardized exams produces fewer errors, and certainly fewer significant errors, than my own grading of law school exams. Most professors take grading pretty seriously, but it is easy to make mistakes in constructing questions and in grading them. We expect lawsuits against instructors, TAs (whom we do not use in law school but who were wildly variable in my experience as an undergraduate and then as a teacher who employed teaching assistants in economics courses), and universities to fail completely. And schools are quite careful to discourage review of grading patterns and performance. As between high school grade point averages and SATs (and then college grades and LSATs), can anyone doubt where more errors are to be found? And yet the anger, and the potential lawsuits and legislation, are directed at the standardized exam.
An obvious explanation is that most students experience many grades from many sources (teachers), so that the errors might offset one another. Most applicants take a standardized exam once or twice and so an error is more important, or at least seems so. I am not sure this is so. The number of test-takers who experience an error of 50 or more points, however we define error, is very, very low, and a comparable combination of non-offsetting errors in grades may well be easier to obtain. Colleges and high schools may also have an incentive to monitor their own grades, and indeed they have an incentive to give high grades and to see their graduates "defeat" others in the quest for spots in elite colleges and graduate schools. Test makers might seem to have more market power. In reality, the opposite may be true. The assembler of the standardized test will drop a vendor who makes errors, and there is even some competition (as between the ACT and the SAT) among test makers.
There are many other explanations for the quantity and quality of rage. Parents and students think of the College Board as an entity, and the test as a single all-important measure, easily described as unfair when it dooms a test taker to a future that seems less bright than that which grades alone would have pointed towards. Grades received in courses are seen as dispersed, and although there are complaints about individual teachers, to be sure, most people see that bad experiences will tend to average out. But of course many colleges and professional schools are driven by rankings and other reasons to care about applicants' numbers. A GPA is a single number, as is the standardized exam score that usually accompanies it in a single breath.
I doubt there is much of a legal issue here. Most courts will want to stay away from the job of policing the College Board, much as they stay away from reviewing the work of teachers and TAs. Moreover, test-makers are fairly good at public relations. Even as I blog, they are assuring everyone that no one will be hurt by the error; fees will be returned; college admissions offices are reviewing affected applications and in some cases reaching out to the injured and offering them admission or financial aid where none was offered before. No one seems to notice that each piece of generosity or repair disadvantages some other applicant in a world with scarce resources.
In the long run, the way to keep the customers happy might be to increase transaction costs and divide the test into many parts and give but one part at a sitting. The single standardized score would be the sum of five scores, say, each earned at a different sitting. Complaints about testing conditions and unfairnesses would diminish, as they do for the components of GPAs. The testing industry would be happy, test takers' expectations would be adjusted slowly but surely, and many of the complaints would lose their intensity.