Vol. 2, No. 2 - Education Next https://www.educationnext.org/journal/vol-02-no-02/ A Journal of Opinion and Research About Education Policy

The Business Model https://www.educationnext.org/the-business-model/ Fri, 01 Sep 2006 00:00:00 +0000 Value-added analysis is a crucial tool in the accountability toolbox--despite its flaws

The post The Business Model appeared first on Education Next.


Like the makers of hot dogs, psychometricians, economists, and other testing experts know too well what goes into the creation of achievement tests. Their intimate knowledge of the technical difficulties involved in measuring student achievement makes a number of these testing experts some of the most vocal (and persuasive) opponents of testing. But the flaws in techniques like value-added assessment do not automatically lead to the conclusion that those techniques shouldn’t be used to hold educators accountable. Testing may be imperfect, but the alternative–the old system, which allowed us to know very little about the performance of educators–is far, far worse.

To be sure, many of the technical criticisms of value-added testing are correct. It’s true that there is more random error in measuring gains in test scores than in measuring the level of test scores. It’s true that there is some uncertainty as to whether gains in one area of the test scale are equal to gains at another point in the test scale. And it’s true that factors besides the quality of the schools can influence the gains that students achieve. But, on balance, these downsides hardly outweigh the benefits to be reaped from being able to measure and reward productivity in education.

Consider what is likely to continue to happen in education without high-stakes value-added assessment. Unless productivity is measured, however imperfectly, it is not possible to reward teachers, administrators, and schools that contribute most to student learning. If we do not reward productivity, we are unlikely to encourage it. If we do not encourage it, we should not expect more of it.

In fact, this is precisely what has been happening in U.S. education during the past few decades. Between 1961 and 2000, spending on education tripled after adjusting for inflation, from $2,360 to $7,086 per pupil. During that time, student performance, as measured by scores on the National Assessment of Educational Progress (NAEP) and high-school graduation rates, has remained basically unchanged. Whenever spending triples without any significant improvement in outcomes, there is a serious productivity crisis. Yet U.S. public schools just keep chugging along, resisting serious attempts at reform.

Meanwhile, private firms in the United States have been able to achieve steady gains in productivity because the discipline of competition has forced them to adopt systems for measuring and rewarding productivity. Firms that fail to measure and reward productivity lose out to their competitors who do.

Moreover, the systems that private companies use to measure and reward productivity are far from flawless. In fact, the challenge of measuring productivity in the private sector is often as great as or greater than in education. Imagine a soft-drink company that wishes to measure and reward the productivity of its sales force. The company might determine bonuses (and even decisions on layoffs) based on its salespeople’s success at increasing soda sales in their sales area. Like measuring gains in test scores, measuring increases in soda sales is fraught with potential error. Changes in soda sales could be influenced by a variety of factors other than the sales acumen of an employee. Unusually cold weather in an area, a local economic downturn, or exceptional promotional efforts by competitors could all suppress the soda sales of even a very good salesperson. If data on sales are collected using survey techniques, there is also the possibility of random error attributable to the survey method, just as testing has random error. Moreover, if we are comparing sales increases across geographic areas, it is unclear whether it takes more skill to sell soda in an area where the market is already saturated than in an area that initially consumes less soda.

In short, many of the same technical flaws that critics find in value-added testing also exist in the measurement of increases in soda sales. Changes in outcomes may be attributable to factors other than the efforts of the employee. There is random error in collecting the data. And the effort required to produce gains at one level may not be the same as at another level. The only difference is that private firms have rightly not let their inability to achieve the best deter them from pursuing the good.

In the private sector, companies have realized that even flawed evaluation systems nevertheless encourage improvements in productivity. This is because employees cannot be sure that a flawed system will completely obscure the picture of how hard they’re working. Employees therefore act as if their productivity were being measured accurately; the chance that slacking will be detected inspires employees to avoid slacking. In fact, evaluation systems with a fairly large amount of error in measuring productivity can still be effective at motivating improvement–if the errors are mostly random, or at the very least do not create perverse incentives, such as encouraging teachers to focus on improving the achievement of one group of students to the exclusion of others.

None of the technical concerns with value-added testing involve perverse incentives. For the most part, the criticisms have to do with random noise in measuring gain scores. Even the nonrandom errors that worry testing critics, such as unevenness in the testing scale or the possible influence of factors outside the school’s control, do not create perverse incentives because there are no strong theories about the kinds of behaviors those errors would encourage.

If no one knows what is being mistakenly rewarded, no one has an incentive to engage in that perverse behavior. As long as educators are aware of what the value-added system is supposed to be rewarding, and as long as that system rewards the desired outcomes more than it erroneously rewards something else, the system will help to elicit more of the desired outcomes–namely, improvements in student achievement.

The Uses of Data

The development of even an imperfect value-added testing system would revolutionize the systems for hiring, promoting, and compensating teachers. Our current methods provide teachers with little incentive to improve achievement. Promotions and salary increases are based on teachers’ seniority and their acquisition of advanced degrees. These characteristics are, at best, weakly related to student achievement. Excellent teachers who possess a master’s degree and a few years’ experience receive exactly the same salary as lousy teachers with the same formal credentials. Under the current system, we have turned the keys over to educators and trusted that their professionalism will yield improvements in student achievement. Education’s productivity crisis in the past four decades should be evidence enough that simple trust is not sufficient.

The development of value-added assessment would also revolutionize how we govern schools and hold them accountable. We currently have little rational basis for saying that a particular school is a good school or that a particular superintendent is a good superintendent. Value-added testing would at least give voters some idea of whether they are getting their tax money’s worth out of the school system by giving them at least some information on how the schools are doing. The fact that voters would have better information on achievement provides the school board with incentives to hire and retain a superintendent who can elicit improvement in student learning. The superintendent, in turn, has an incentive to hire and retain principals who will use the value-added results to hire and promote the best teachers.

Critics of value-added assessment don’t necessarily object to using value-added assessment. They object to using the data gleaned from it for high-stakes purposes, such as rewarding or punishing individual schools and teachers. Instead, they suggest that value-added results be provided to administrators so that they can make informed decisions about their employees. This is, to some extent, what happens in the private sector; most private firms do not use the crude techniques exemplified by the real-estate company in David Mamet’s Glengarry Glen Ross, where the salesperson with the fewest sales was fired. Most companies use productivity measures to inform the subjective assessments of supervisors, with some companies permitting less subjective judgment than others for fear of bias or favoritism.

But here is where the parallel between measuring productivity in public education and private industry ends. Supervisors in private companies have incentives to use the information provided by productivity measures properly, because their companies face the discipline of competition from other companies. If supervisors fail to put data on costs, sales, and revenue to good use, their companies will lose out to competitors who do.

In public education, by contrast, local decisionmakers have few or no incentives to make good use of data in assessing their employees because public schools face no meaningful competition. There are basically no consequences for principals who disregard the results of value-added assessments in making decisions about employees, and they’re more likely to disregard those results if they consider value-added assessment an unreliable analytical technique. Superintendents will not be able to judge whether principals have used their discretion properly, because they will be told that the value-added test results are not proper grounds for assessing the decisions of principals. And school boards and voters, in turn, will all be stymied in making independent judgments because they will be told that the professional decisions of educators are more reliable than value-added test results.

So there is good reason to fear that principals in public schools, if given discretion to reward teachers as they please, will base their decisions on personal relationships rather than on the results of value-added assessments. In both private industry and public education, a balance must be struck between the mechanical use of productivity measures in assessing employees and relying on the subjective judgments of supervisors. But, in public education, the balance needs to tilt more toward the mechanical application of results, because supervisors have fewer incentives to make appropriate subjective judgments. This means that value-added assessment needs to be high stakes to have the desired positive effect on student learning.

Injustices are unavoidable if value-added assessments are used more mechanically. Some educators will be improperly punished for eliciting what appear to be low gains because of measurement error. Conversely, some educators will be rewarded for improvements for which they were not actually responsible. That said, some of the flaws in value-added assessment have potential technical solutions. For example, if judging a teacher based on his classroom’s test scores contains too much error because the sample is too small, we might decide to rate teachers based on a moving average of multiple years of results, thereby increasing the sample size and reducing the random error. But even with technical fixes, some injustices will still occur.
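
The multiyear averaging idea can be illustrated with a minimal sketch. The function names, the class size of 25, and the student-level spread of 10 points are all hypothetical, and the standard-error formula assumes that each student's measured gain is independent and equally noisy, which real classrooms will only approximate.

```python
from statistics import mean


def moving_average(yearly_gains, window):
    """Average each run of `window` consecutive yearly classroom gains.

    `yearly_gains` is a hypothetical list of one teacher's average
    test-score gain per year, oldest first.
    """
    if window < 1 or window > len(yearly_gains):
        raise ValueError("window must be between 1 and len(yearly_gains)")
    return [
        mean(yearly_gains[i:i + window])
        for i in range(len(yearly_gains) - window + 1)
    ]


def standard_error(student_sd, students_per_year, years):
    """Standard error of a mean gain pooled over `years` classes.

    Assumes independent, equally noisy student gains (an assumption
    made for illustration, not a claim about any real testing system).
    """
    return student_sd / (students_per_year * years) ** 0.5


# Hypothetical single-year gains for one teacher: noisy from year to year.
gains = [2.0, 8.0, 5.0, 3.0, 7.0]
smoothed = moving_average(gains, window=3)

# Pooling four years of 25 students halves the random error of one year.
one_year_se = standard_error(student_sd=10, students_per_year=25, years=1)
four_year_se = standard_error(student_sd=10, students_per_year=25, years=4)
print(smoothed, one_year_se, four_year_se)
```

Under these assumptions the random error falls with the square root of the number of students pooled, so quadrupling the sample halves the error. The trade-off is that a multiyear average responds more slowly to a real change in a teacher's performance.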

This is not a good reason to abandon the idea, however. After all, some educators will be treated unjustly under any evaluation system. Under the current system, excellent, hard-working teachers who put in tons of overtime receive the same salary as mediocre teachers. What’s fair about that? The Buffalo, New York, district recently announced layoffs as a result of a budget deficit. Who were the first to receive pink slips? Not the worst teachers in the district, but the most recently hired. Surely some excellent teachers lost their jobs, while the district retained its burned-out veterans. A peer- or supervisor-review system may reward teachers who are popular among their peers rather than effective with students. Attempting to measure and reward successful educators, with all of its imperfections, is likely to create fewer injustices than any other arrangement. At least using value-added assessments increases the chance that good teachers are rewarded and bad teachers are sanctioned.

Besides, ensuring that every single educator receives justice is at most a secondary concern. An obvious but infrequently recognized truth is that the primary purpose of the education system is to provide a quality education to all students. If high-stakes value-added assessment can help motivate public schools to provide students with a better education, it’s a promising reform even if it has some cost to school employees. In no other industry would we even entertain the notion that the interests of employees trump those of customers. Only the political dominance of teacher unions makes us consider the question. It is true that customers usually receive the best service when employees are treated well and fairly. Happily, high-stakes value-added assessment is likely to achieve both ends.

The productivity crisis in public education has certainly created injustices for students and taxpayers during the past few decades. Students have failed to receive high-quality instruction while taxpayers have been paying more and more in return for stagnant test scores. Again, what’s fair about that?

-Jay P. Greene is a senior fellow at the Manhattan Institute for Policy Research.

The Seeds of Growth https://www.educationnext.org/the-seeds-of-growth/ Mon, 17 Jul 2006 00:00:00 +0000 The United States became the world’s economic superpower over the course of the 20th century. But can today’s education system be counted on to fertilize growth in the future?

The post The Seeds of Growth appeared first on Education Next.

Illustration by Tom Curry.


The early 1990s saw the height of the East Asian miracle. The economies of Japan, South Korea, Thailand, Malaysia, and other countries of the region were expanding at rates that dwarfed those of the United States and the mostly European nations of the Organization for Economic Cooperation and Development (OECD). The so-called “Asian tigers” were projected to surpass the U.S. economy in the not-so-distant future. In the national soul-searching that ensued, new attention was focused on the U.S. education system. Our poor academic performance vis-à-vis the countries of East Asia and indeed most of the developed world became a source of deep concern. The touchstone A Nation at Risk report expressed fear of a deteriorating education system leading to an erosion in our global standing. Thus many educators breathed a sigh of relief when the East Asian countries lost their luster. The U.S. economy entered a record-breaking period of high employment and recession-free growth during the 1990s, while Japan stagnated terribly–thereby validating, supposedly, our educational performance and approach.

Take testing critic Alfie Kohn, who writes, “As proof of the inadequacy of U.S. schools, many writers and public officials pointed to the sputtering condition of the U.S. economy. As far as I know, none of them subsequently apologized for offering a mistaken and unfair attack on our educational system once the economy recovered, nor did anyone credit teachers for the turnaround.” Or consider Gerald Bracey, a prominent defender of the public schools. Criticizing those before and after A Nation at Risk who have urged education reform in the interest of maintaining economic growth, he wrote in a recent Washington Post essay, “None of these fine gentlemen provided any data on the relationship between the economy’s health and the performance of schools. Our long economic boom suggests there isn’t one–or that our schools are better than the critics claim.”

Thus the sterling recent performance of the U.S. economy has become a convenient rhetorical tool for those who maintain that the education system isn’t in need of any serious reform. But these critics have woefully misunderstood the nature of economic growth and its link to educational performance. The confusion owes partly to the language and perspectives of the A Nation at Risk report itself. Written in 1983, on the heels of the stagflation of the 1970s and a recession caused by the Federal Reserve’s attempts to curb inflation, it implied that the current state of the economy could be traced directly to the performance of the education system.

Such a perspective fails to distinguish between economic growth and the business cycle. Economic growth is a long-term concept. It depends on past investments in physical capital, like industrial plants and machinery; human capital, the economist’s term for workers’ education and skills; and the pace of technological innovation. Now, growth may slow in the short term because of a downturn in the business cycle. But these short-term fluctuations in the unemployment rate, inflation, and economic growth from quarter to quarter or even year to year should not be confused with the economy’s ability to grow in the long term. And they certainly bear no relation to the current state of the education system.

That unemployment is lower on any given day in the United States than in Japan or Korea says virtually nothing about the relative quality of their schools. It might instead say something about the quality of current fiscal and monetary policies or about the extent of labor market and trade barriers across countries. It might even bear some relationship to the human capital investments made in past periods–when the current range of workers in the labor force was attending schools and investing in skills. Most workers in the economy were educated years and even decades in the past–and they are the ones that have the greatest impact on current levels of productivity and growth. Concerns about the current performance of U.S. schools reflect concerns about the potential for economic growth in the future–when today’s elementary, middle, and high schoolers become tomorrow’s engineers and scientists.

Skilled labor is becoming more and more valued in today’s economy. This is reflected in the compensation that skilled workers receive and the subsequent distribution of income in the economy. The gap between the skilled and the unskilled continues to grow. In turn, the economy’s long-term health is dependent on having a skilled labor force. The education system is central to the development of skills and human capital, a fact long recognized by parents, policymakers, and educators. During the past century, the United States led the world in the expansion of its education system, contributing to the dominant position of the United States in the world economy. Nonetheless, there is reason to be concerned about the future. The evidence suggests that the American K-12 education system is falling behind those of other developed nations. As a result, it is unclear whether we will be able to count on the education system to fuel future U.S. economic growth. As economic growth is crucial to our well-being, this is a matter we should take very seriously.

Human Capital and Economic Growth

Economic growth determines how much improvement will occur in a society’s overall standard of living. The effect of differences in growth rates on economic well-being is easy to see. If gross domestic product (GDP) per capita were to grow at 1 percent each year for 50 years, it would increase from $34,950 in 2000 to $57,480 in the year 2050–more than a 50 percent increase over the period. However, if it were to grow at 2 percent per year, it would reach $94,000 in 2050! Small differences in growth rates have huge implications for the income and wealth of society.

In turn, a society’s ability to develop human capital is crucial to its ability to grow. Human capital consists of the skills possessed by individuals and, in the aggregate, by the labor force as a whole. It is the result of a variety of investments made by individuals and institutions–in formal schooling, workplace training, life experience, and so on. In other words, formal schooling is not the only way to develop human capital–but it is a critical component. And schooling, as William Easterly emphasizes and I discuss below, is not sufficient to ensure growth, but it certainly plays a large role in a society like that of the United States, where the other preconditions for growth are in place.
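
The growth arithmetic quoted above can be checked with a short sketch. It assumes simple annual compounding from the $34,950 starting figure given in the text; the function and variable names are illustrative only.

```python
def project_income(start, annual_rate, years):
    """Compound `start` forward at `annual_rate` for `years` years."""
    return start * (1 + annual_rate) ** years


START_GDP_PER_CAPITA = 34_950  # per-capita GDP in 2000, as quoted in the text
YEARS = 50

slow = project_income(START_GDP_PER_CAPITA, 0.01, YEARS)  # 1 percent a year
fast = project_income(START_GDP_PER_CAPITA, 0.02, YEARS)  # 2 percent a year

# Share by which the 2 percent path exceeds the 1 percent path in 2050.
advantage = fast / slow - 1
print(round(slow), round(fast), round(advantage, 2))
```

Under this assumption the two paths come out close to the $57,480 and $94,000 cited in the text, and the faster path ends about 64 percent above the slower one.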

Educational Quality

Photograph by Comstock Images.


Much of the early empirical work on human capital concentrated on the role of school attainment–that is, the quantity of schooling. This focus was natural. The revolution in the United States during the 20th century was universal schooling. Moreover, quantity of schooling is easily measured, and long-term data on years attained are readily available. Today, however, policy concerns revolve around quality issues much more than quantity issues. The completion rates for high school and college have been roughly constant for a quarter of a century. In addition, measures like completion rates and universal elementary education don’t say much about the kind of education that is being offered. Developing skills useful in the labor market and for economic growth requires rigorous training in high-quality schools. Schools that amount to no more than free day care or points of religious indoctrination won’t contribute to economic growth.

The current economic position of the United States is largely the result of its strong and steady growth during the 20th century. Strangely, in the period after World War II, economists did not pay as much attention to economic growth as they did to macroeconomic fluctuations in an attempt to tame the business cycle. In the past 15 years, economists have returned to questions of economic growth. A variety of models and ideas have been developed to explain differences in growth rates across countries; the importance of human capital is invariably a component.

The typical study finds that quantity of schooling is highly related to economic growth rates. But, again, quantity of schooling is a very crude measure of the knowledge and cognitive skills of people. It is unlikely that what is learned during the 6th grade in a rural hut in a developing country equals what is learned in an American 6th grade. Yet that is what empirical analyses implicitly assume when they focus exclusively on differences in average years of schooling across countries.

Recently, Dennis Kimko and I have delved into issues of educational quality. We incorporated the information developed during four decades of international testing on the varying mathematics and science performance of nations around the world. Our research has found a solid link between differences in school quality and differences in economic growth.

In 1963 and 1964, the International Association for the Evaluation of Educational Achievement (IEA) administered the first of a series of mathematics tests to a voluntary group of countries. These assessments faced a number of challenges: developing a test that provided a fair comparison across countries with different school structures, curricula, and languages; creating comparable groups of testing participants across countries; and persuading countries to participate. The first tests did not document or even address these issues in any depth. However, these tests did prove that such testing was feasible and set in motion a process to expand and improve on the undertaking. Subsequent testing, sponsored by the IEA and others, has included both math and science and has expanded the group of countries tested. In each, the general model has been to develop a common assessment instrument for different age groups of students and to attempt to obtain a representative group of students taking the tests in each country.

Our analysis was very straightforward. We combined all of the available earlier test scores into a single composite measure of quality and introduced it into statistical models that explain differences in growth rates across nations during the period 1960 to 1990. (We excluded results from the 1995 Third International Math and Science Study and subsequent tests because they were obtained outside the analytical period of interest.) The underlying objective was to obtain a measure of quality for the labor force during the period for which we have measurements of economic growth. The basic statistical models, which include the level of income, the quantity of schooling, and population growth rates, explain a substantial portion of the variation in economic growth. Significantly, the quality of the labor force as measured by math and science scores proved to be extremely important.

Worldwide, we found that a difference in test performance of one standard deviation was related to a difference of 1 percentage point in the annual growth rate of per-capita GDP. The impact of such a difference in growth rates is very large. As we saw earlier, growth that is 1 percentage point higher–say, growth of 2 percent versus 1 percent per year–over a 50-year period yields incomes that are 64 percent higher. Moreover, adjusting the data for other factors that are potentially related to growth, including aspects of international trade, private and public investment, and political instability, leaves the effect of having a quality labor force unchanged.

A common concern in analyses like this one is that schooling might not be the actual cause of growth but may just reflect other attributes of the economy that are beneficial to growth. For example, East Asian countries consistently score high on the international tests, and they also had extraordinarily high growth during the 1960 to 1990 period. It may be that other aspects of these East Asian economies have driven their growth and that the statistical analysis of labor-force quality is simply picking out these countries. But if the East Asian countries are excluded from the analysis, a strong–albeit slightly smaller–relationship is still observed between test performance and economic growth.

Another concern might be that other factors affecting growth, such as efficient market organizations, are also associated with efficient and productive schools–so that again the test measures are really a proxy for other attributes of the country. To investigate this, we concentrated on immigrants to the United States who received their education in their home countries. We found that immigrants who were schooled in countries that have higher scores on the international math and science examinations earn more in the United States. This analysis makes allowance for any differences in school attainment, experience in the labor market, or being a native English-language speaker. In other words, skill differences as measured by the international tests are clearly rewarded in the U.S. labor market, reinforcing the validity of the tests as a measure of individual skills and productivity.

Finally, the observed relationships could simply reflect reverse causality. In other words, countries that are growing rapidly have the resources to improve their schools. In this case, better student performance is the result of growth, not its cause. As a simple test of this, we investigated whether the international math and science test scores were systematically related to the resources devoted to the schools in the years before the tests. They were not. If anything, we found relatively better performance in those countries that spent less on their schools.

In sum, the relationship between math and science skills on the one hand and productivity and growth on the other comes through clearly when investigated in a systematic manner across countries. This finding underscores the importance of high-quality schooling to future well-being.

Explaining the U.S. Economy

Photograph by David Muir/Masterfile.


In this context, the United States presents a difficult conundrum. In the international exams of math and science that have taken place since 1970, the United States has been at best in the middle of the pack, at worst well below average. At the same time it has become the world’s economic superpower. How to reconcile these diverging trends?

The answer is that the quality of the labor force is just one aspect of the economy that contributes to economic growth. Expanding education in a developing economy, as Easterly argues, is unlikely to foster much growth if the economy fails to simultaneously acquire the market structures and legal and governance systems that are necessary for a high-performing economy. The United States has an abundance of these attributes, and they appear to compensate for the shortcomings of its education system.

Almost certainly the most important factor sustaining the growth of the U.S. economy is the openness and fluidity of its markets. The U.S. maintains generally freer labor and product markets than most countries in the world. The government generally lathers less regulation on firms, and trade unions are less powerful than those in many other countries. More broadly, the U.S. government intrudes less in the economy–not only with less regulation, but also with lower tax rates and minimal government production through nationalized industries. These factors encourage investment, permit the rapid development of new products, and allow U.S. workers to adjust to new opportunities. While identifying the precise importance of these factors is difficult, a variety of analyses suggest that such market differences could be very important explanations for differences in growth rates. These favorable institutional conditions have in some ways compensated for the deficits of our education system.

The United States has also been saved by the expansion of opportunities for higher education for its citizens. During the 20th century, the expansion of the education system in the United States outpaced the rest of the world. The United States pushed to open secondary schools to all citizens. With this also came a move to expand higher education with the development of land grant universities, the G.I. bill, and direct grants and loans to students. Compared with other nations of the world, the U.S. labor force has been better educated, even after accounting for the lesser achievement of its graduates. In other words, more schooling with less learning each year has yielded more human capital than found in other nations that have less schooling but where students learn more in each of those years.

This historical approach, however, appears on the verge of reaching its limits. Other nations of the world, both developed and developing, have rapidly expanded their schooling systems, and many now surpass the United States. In a comparison of secondary-school completion rates in 1999, the United States trailed a large number of other countries and fell just slightly below the OECD average completion rate. The United States gains some by having rates of college attendance above the typical OECD country. Nonetheless, U.S. students are not likely to complete more schooling than those in a significant number of other developed and developing countries. Thus, going into the future, the United States appears unlikely to continue dominating others in human capital unless it can improve on the quality dimension.

Still, it is not just college completion, but also the quality of the college experience that has been our saving grace. The analysis of growth rates across countries emphasized the quality of the elementary and secondary schools of the United States. It did not include any measures of the quality of U.S. colleges. By most evaluations, U.S. colleges and universities rank at the very top in the world. While there are no direct measures of the quality of colleges across countries, there is indirect evidence. Foreign students by all accounts are not tempted to emigrate to the United States to attend elementary and secondary schools-except perhaps if they see this as a way of gaining entry into the country. They do emigrate in large numbers to attend U.S. colleges and universities, however. They even tend to pay full, unsubsidized tuition at U.S. colleges, which a much smaller share of American citizens does.

A number of the economic models emphasize the importance of scientists and engineers as a key ingredient of growth. By these views, the technically trained college students who contribute to invention and to the development of new products provide a special element in the growth equation. Here again the United States appears to have the best programs. If this view is correct, U.S. higher education may continue to provide a noticeable advantage over other countries.

But the raw material for U.S. colleges is the graduates of our elementary and secondary schools. As has been noted frequently, the lack of preparation of our students necessitates extensive remedial education at the postsecondary level, detracting from the ability of colleges and universities to be most effective. On this count, there is yet another troubling aspect to U.S. academic performance. In international comparisons, U.S. students start out doing well in elementary grades and then fade by the end of high school. Figure 1 shows the slippage that occurs over time in comparison with other countries participating in the TIMSS math and science testing. To the extent that performance at the end of secondary schooling is most important-because it represents the skills students have as they enter college, because it sets the stage for science and engineering skills, or because it is important in its own right for workers in the labor force-schools in the United States are not keeping up with the preparation of students.


Source: Third International Mathematics and Science Study

Just a Jeremiad?

Observers like Bracey, Kohn, and others, dead set against any fundamental changes in the nation’s schools, usually work hard to dismiss the TIMSS results as evidence of any structural weakness in the U.S. education system. At least one objection they raise is interesting enough to consider. If innovation is key to economic growth, the argument goes, then an education system that encourages creativity and questioning rather than drill and memorization may give a nation a competitive advantage. This argument is usually trotted out as a defense against standardized testing, which is supposed to suppress our students’ natural curiosity. Bracey writes, “We should think more than twice before we tinker too much with an educational system that encourages questioning. We won’t benefit from one that idolizes high test scores. It could put our very competitiveness as a nation at risk.” However, none of these critics has ever produced any evidence that creativity is lessened when students improve their math and science skills. Nor do they speak to the costs placed on those individuals who neither reap rewards for exceptional creativity nor have the skills necessary to perform in the modern economy.

Bracey also suggests that we shouldn’t worry about the TIMSS results because they don’t seem to have any effect on our economic competitiveness. He cites as evidence the fact that a nation’s place on the Current Competitiveness Index developed by the World Economic Forum is not perfectly correlated with the TIMSS results, and the United States ranks high on the index. His explanation for our high ranking? All the reasons suggested above: the higher quantity of education in the United States, greater college attendance, retention of our scientists and engineers (while attracting foreign immigrants), and greater innovative capacity. It’s true that all of these factors have probably compensated for our educational deficiencies. But then just imagine how the U.S. economy might perform if its education system were top-notch! And what happens when we lose our advantage in any one of these factors, for instance when other nations create higher education systems like ours?

In February 1989, in an unprecedented meeting of the nation’s governors with President George H.W. Bush, an ambitious set of goals was set for America’s schools. One goal was that by the year 2000, “U.S. students will be first in the world in mathematics and science achievement.” By 1997, as it became evident that this goal wasn’t going to be met, President Clinton returned in his State of the Union speech to the old model of substituting quantity for quality: “We must make the 13th and 14th years of education-at least two years of college-just as universal in America by the 21st century as a high-school education is today.” It may make a better sound bite simply to offer two more years of schooling, but the best way to cement our competitiveness in the future is to ensure that every student gets a solid education during the first 12 years of school life.

-Eric Hanushek is a senior fellow at the Hoover Institution, Stanford University.

The post The Seeds of Growth appeared first on Education Next.

Surface Wounds https://www.educationnext.org/surface-wounds/ Mon, 17 Jul 2006 00:00:00 +0000 http://www.educationnext.org/surface-wounds/ The post Surface Wounds appeared first on Education Next.



Revolution at the Margins: The Impact of Competition on Urban School Systems

By Frederick M. Hess
Brookings Institution, 2002, $45.95; 268 pages.

As reviewed by Edward B. Fiske

For the most part, the language of economics has informed the public debate over school choice. Free-market economist Milton Friedman was the first to develop the concept of school vouchers, and some of the most enthusiastic supporters of charter schools and vouchers are businessmen who view the delivery of public education through the lens of their own experiences in creating and marketing goods and services. It is thus important to know that Frederick Hess is a political scientist, not an economist. What interests him is the way in which the political and organizational realities of urban schools influence their responses to competition and thus help determine how competition will affect the schools that more than 90 percent of students still attend.

The underlying agendas of choice advocates vary widely. Some view school choice as a social good in and of itself, while others may have indirect objectives, such as funneling public funds to religious schools or privatizing public education. Whatever their agendas, however, most supporters of school choice build their political case on the virtues of competition for public education as a whole. “Perhaps the most commonly advanced argument for school choice,” writes Hess, “is the notion that markets will force the nation’s public schools to improve, particularly in those urban areas where improvement has proved so elusive.” It is this argument-that a rising tide of choice and competition will raise all boats-that Hess explores in this thoughtful volume.

Hess offers a litany of ways in which school systems differ from private enterprises. Whereas a business can focus mainly on serving its customers, public schools serve a multiplicity of stakeholders, of which parents and students are only two. Also, Hess observes that educators tend to be driven by intrinsic motives, such as a “sense of calling,” that lessen the ability of supervisors to force them into a competitive mode. Losing students is not always a threat to public school teachers and managers. In fact, it can be downright attractive when enrollments are rising or when choice relieves them of disgruntled parents or low-performing students. When competition becomes bothersome, it can be handled with political responses, such as running advertising campaigns or creating new high-profile programs, that scarcely relate to educational performance. Moreover, school systems tend to reward the following of rules and procedures, which makes life difficult for those with an entrepreneurial spirit. Finally, education is an “unwieldy market good,” where it is difficult to define quality-or even to know it when you see it.

This discussion is informed by Hess’s case studies of three cities where the conflict of economic theory and political-organizational reality could be observed firsthand. He concedes that these case studies are not intended to be definitive, suggesting that they may be thought of as “theoretically directed journalism” rather than “conventional social-science scholarship.”

Hess devotes two quite thorough chapters to describing the decade-long voucher experiment in Milwaukee. There, choice did not compel the school system to change its routines, but it did lead to a loosening of “bureaucratic procedures and organizational routines” that allowed some educational entrepreneurs to do their thing. Hess calls this the “pickax” response to choice-poking some holes in the system but not making any major changes.

Hess’s second case study focuses on the five-year-old voucher plan in Cleveland, where he finds that the potential benefits of choice and competition were neutralized by multiple factors, including frequent changes in leadership, the state’s move to take over the city’s schools, the modest size of the vouchers (only $2,250), and the existence of strong unions. As a result, he concludes that school officials saw vouchers as “a largely symbolic threat” and felt no need to respond.

Finally, Hess examines the three-year-old Horizon scholarship program in the Edgewood Independent School District of San Antonio. The program offers privately financed scholarships to low-income Hispanic students to attend private schools. Edgewood shows, Hess writes, “how the districts may respond in ways that have little, if anything, to do with educational quality,” including a public-relations campaign, a management study, and the opening of an already planned Fine Arts Academy.

Hess uses these case studies to speculate on how choice might be introduced in ways that both respect the built-in political and organizational constraints of urban school districts and lead to school improvement. “The lesson is not that markets cannot drive more profound change in education,” he writes, “but that such effects will require changing the institutional and organizational context of urban schooling. . . . In short, making competition work as intended will require much more than the simple introduction of market mechanisms.”

Hess questions the pickax approach as a half-measure that cannot compel change or “alter the incentives that drive educational performance.” He likewise questions the workability and desirability of the alternative approach of “unchaining the bulldozer,” or moving to a ruthless competitive system oriented toward coercive accountability and other extrinsic incentives. (I find the metaphor a bit curious, since I have trouble picturing a bulldozer, as opposed to a bulldog, on a chain.)

On the matter of starting new schools, Hess suggests that, given the financial and other obstacles involved, the most likely source of choice entrepreneurs is people “operating out of religious obligation or a desire to reap a profit.” This observation fits the current scene. Channeling public funds to parochial schools was reportedly part of the political deal that established the Cleveland voucher plan (an observation that Hess curiously underplays in his narrative), and two-thirds of charter schools in Michigan are now run by education management organizations such as the National Heritage Academies.

Hess states at the outset that he is “not seeking to provide a definitive account of educational markets, but to launch a more useful conversation on the topic,” and he has achieved this goal. Economists have always understood that a variety of factors can interfere with the smooth functioning of markets, and Hess reminds us that the same principle applies to education markets as well.

Hess succeeds in posing a challenge to those who see choice and competition-the manipulation of incentives, if you will-as a way of improving schools without getting bogged down in the nitty-gritty issues of providing a quality education. “In fact,” Hess concludes, “educational competition cannot be divorced from discussions about testing, teacher certification, school district governance, educational administration, or other frustrating conversations that many school choice proponents have long wished to avoid. In the end, the fate of educational markets, for good or ill, is intertwined with broader issues of educational politics and policy.”

Edward B. Fiske, a former education editor at the New York Times, is co-author, with Helen F. Ladd, of When Schools Compete: A Cautionary Tale (Brookings, 2000).

As reviewed by John Gardner

As a member of the Milwaukee school board, I am one of the officials responsible for the school district’s response to the city’s voucher program. I have helped to manage the changes necessary to make the district’s schools educationally competitive, and I have witnessed the ebb and flow of competitive responses and reactive managerial and political opposition. The Milwaukee school choice program and the response of Milwaukee Public Schools are especially significant in light of Frederick M. Hess’s study of the effects of competition on large urban school districts. One of Hess’s conclusions is that among the three American cities where vouchers have been tried, only Milwaukee has a program with the size, duration, and per-pupil payments to test the hypothesis that market competition can improve schools.

Hess’s other case studies include the state-funded voucher program in Cleveland and a privately funded scholarship program in the Edgewood district of San Antonio, Texas. He argues that the choice programs in Edgewood and Cleveland never grew large enough for the local school districts to really notice any loss of students or funding. The publicly funded Milwaukee Parental Choice Program, by contrast, had grown to almost 7,000 students by the 1998-99 school year, the final year of Hess’s study. This was at least a sign that a substantial number of students and parents were unhappy with the quality of the district’s schools. Yet Hess finds that even in Milwaukee the response of the school district was focused on buffing its image much more than on actually improving the system.

In my opinion, Hess’s only limitation is his stopping point, the end of the 1999 school year. Up until then he has the story right. He argues that the leadership of the Milwaukee schools, the school board and the central office, attempted to respond with systemic and school-based reforms, but were perpetually thwarted by managerial and political constraints that he describes with chilling insight and accuracy. They (we) managed to liberate a handful of schools from managerial, contractual, and bureaucratic constraints that had stifled improvement and reform. Expanding these reforms to all schools was blocked by political and managerial obstacles at all levels. The board was unwilling to take on the powerful teacher union in many instances when it most mattered. There were not enough effective, entrepreneurial principals to turn around more schools. And the central office remained unresponsive and territorial, more threatened by innovative schools than supportive of them.

What the study’s stopping point leaves out are the revolutionary changes that occurred from 1999 to 2001. Despite a partial counterrevolution, these changes have been sustained. Schools now control 95 percent of the district’s funding, select their own teachers, and develop curricula to meet district and state standards. School budgets are based almost entirely on student enrollment, providing vigorous rewards and penalties for success or failure to attract students. The district’s central office is reorganizing to serve, rather than control, Milwaukee’s schools, in large part because the central office’s budget is increasingly based on funds that schools choose to allocate to it.

Standards now include annual testing in basic subjects in grades 3 through 8. Students must now pass proficiency exams in order to enter and graduate from high school, replacing the system of social promotion. Incompetent teachers and administrators are now terminated through procedures instituted in, of all places, collective-bargaining agreements.

Before 1999, Milwaukee authorized one charter school, with only 70 students. Since then, the district has authorized 14, with more than 9,000 students attending. The number of previously oversubscribed specialty magnet schools has more than doubled. The number of popular K-8 schools has more than tripled. Before- and after-school childcare and youth recreation programs, once oddities within the system, are now in place in every elementary- and middle-school attendance area.

As a result of these and related changes, Milwaukee public school enrollment, parental satisfaction, funding, and academic achievement have all improved. Some of these changes may well have happened without the competitive challenge of school choice, but even choice’s most vigorous opponents concede that at least some of the district’s responses were produced by the threat of external competition.

Despite missing the revolution, Hess’s analysis remains sound and moves the voucher debate helpfully away from the rigidities of the state-vs.-market debate. Education markets in large urban settings, he argues, differ so much from market analysis that one must question whether they are, in fact, even markets.

Hess’s dispassionate, balanced analysis will disappoint both opponents and advocates of market-based education reforms. Hess is hardly a defender of urban public education’s failures. His previous book, Spinning Wheels, documented with depressing consistency the cycle by which urban districts adopt reforms, begin implementing them, encounter roadblocks, abandon reforms before their failure becomes apparent, and develop new reform agendas before the deadline to evaluate previous failures. In some districts the cycle has become a ritual procession of superintendent firings and curriculum changes, endlessly deflecting attention from managerial competence, school performance, and public accountability.

At the same time, Revolution at the Margins will offer little comfort to promoters of market-based education reforms who believe that markets, by themselves, are the answer. Milwaukee demonstrates that markets alone did little to change the public schools. It was the combination of a new market environment and effective responses from the public schools that simultaneously expanded choices for poor families and improved both choices and performance within the Milwaukee public schools.

What remains constant, whether urban districts respond effectively or respond at all, are the constraints and skewed semi-market within which they operate. School districts are, after all, governments, not profit-making enterprises. Like all governments, they respond to political pressures and public opinion more than to market signals and competitive incentives. School board members, central administrators, school principals, and teachers rarely have any significant stake in whether their schools lose some students. Even teachers, in some ways the most vulnerable to perceived or actual reductions in funding, are buffered from market pressures because of the national shortage of teachers, especially in major cities.

School officials and employees are, however, intensely interested in the funds that state legislators and municipal governments authorize. Rather than take on the difficult and challenging work of winning loyalty and enrollment from low-income parents, school districts face much more powerful incentives to fight legislative, legal, and public-relations wars. These offer the simultaneous advantages of familiarity, expectation, and shifting the blame for failure elsewhere.

Also diluting the competitive responses of school districts are the limitations placed on the choice programs themselves. Capping the number of students who can participate in choice programs has kept school districts from suffering any severe drops in enrollment. Similarly, in all three cities it was never clear that enough money left the school districts as a result of student defections to generate any significant economic distress among central administrators, principals, or teachers.

Finally, there is the thorny issue of what the goal of competition is. Do schools exist to teach specific skills or to serve a broad range of social functions, including socialization, assimilation, acculturation, and the provision of a panoply of community services? Has a school successfully responded to competition if it drops its arts program in order to raise test scores in math? In this context, it’s hard to tell who won the competition, or if the competition resulted in better outcomes for everyone.

Hess’s most important contribution is clarifying and redefining the debate. School choice will ultimately prevail or disappear based on how it affects entire urban populations, not just the small group of students who benefit directly from being able to attend private schools tuition-free. Like all markets, public education operates within a framework of governmental mandates, regulations, and finance, both direct and indirect. Like all governments, public education ultimately depends on private market choices and very public political decisions. Hess persuasively shows that if markets are to have significant and positive effects on public education in major American cities, two difficult political revolutions must occur. First, markets must be created intentionally, not simply by introducing external competition, but by changing the internal rules and operations of the public district. Second, districts must develop the political will and managerial competence to respond-transformations that will, in most American districts, require their own revolutions.

John Gardner is an at-large member of the Milwaukee school board. He was elected in 1995 and reelected in 1999.

Accountability Gains https://www.educationnext.org/accountability-gains/ Mon, 17 Jul 2006 00:00:00 +0000 http://www.educationnext.org/accountability-gains/ Are we measuring achievement gains accurately enough?

The post Accountability Gains appeared first on Education Next.

Will value-added analysis make performance-based evaluations a viable tool for school improvement?

The Problem

Family background decisively shapes student achievement. As a result, simplistic accountability systems that dole out rewards or sanctions based on test scores run the risk of punishing schools and teachers for problems beyond their control.

The Solution, Perhaps

Attempt to measure the achievement gains that a school or teacher elicits by subtracting students’ previous-year test scores from their latest ones. These gains are less susceptible to home influences than the simple level of achievement at one point in time. Thus, by measuring gains, we can pinpoint the “value” that a school has “added” to its students’ educational experience.

The Debate

Value-added analysis, while a promising innovation, suffers from various statistical shortcomings. Errors in measurement could lead to schools’ and teachers’ being rewarded or sanctioned wrongfully. The question is whether the benefits of value-added analysis outweigh the flaws.

In the Shadow of Terror https://www.educationnext.org/in-the-shadow-of-terror/ Mon, 17 Jul 2006 00:00:00 +0000 http://www.educationnext.org/in-the-shadow-of-terror/ Life returns to not quite normal at Stuyvesant High

The post In the Shadow of Terror appeared first on Education Next.


It’s a bitter-cold morning in New York City as I cross the Stuyvesant Bridge–known to the rest of the world as the TriBeCa Bridge–and hurry toward the school doors, eager to escape the knife-sharp wind. On the threshold, though, I’m stopped mid-step.

“ID card on!” bellows a security guard I’ve known since I was a freshman. I fumble in my bag, acutely conscious of how late I’m going to be for gym and of how much I detest wearing the blue card. A man who has simply nodded at me every morning of my four years at Stuyvesant High School now needs to see identification before letting me in the school.

Things have definitely changed around here.

***

Weeks and months have passed. The nightmare that was the morning of September 11 has faded into the background as much as it possibly could have. Stuyvesant High has, accordingly, returned to normal–in most ways, but certainly not all.

It starts with our morning walk down Chambers Street, when we see, down the avenues, the smoke-cloaked masses of debris. During lunch periods, students now are often chased out of the hallways, the gathering places we’ve always taken for granted. The guards are slowly beginning to relax, but we’re still careful about where to sit. We simmered over the fact that until February 1 we weren’t allowed to leave the building for lunch, a pre-September 11 privilege that we regarded as a birthright.

Those are just the daily, trivial hassles. The important changes show themselves in social studies classes, where it now takes only the barest spark to begin heated arguments over the United States’s forays in the Middle East. Sometimes, a pacifistic argument can be shut down single-handedly with an angry gesture toward the window–in the direction of the still massive mountains of twisted steel and shattered concrete that used to be the World Trade Center. “How can we possibly not retaliate,” a classmate once snapped, “when there are still 3,000 bodies lying in that pile?” The day’s lesson is invariably put on hold whenever the subject of September 11 is brought up. The teachers need the time to teach–even more than usual, as a result of the time that was lost right after the attacks–but they understand that we sometimes need the time to discuss our thoughts and emotions. Whenever anyone hears the noise of a plane nearby, there is a palpable moment of tension. That will be, I think, a hard thing to overcome.

Conflicting reports on air quality come from the Environmental Protection Agency, private air testers, and the city’s newspapers. Air-collecting stations are set up around the school; the machinery emits whirring, gurgling noises at all hours. Air testers pace the hallways. When we ask how the air in our classrooms is, they often don’t reply. The mystery makes the situation considerably more frightening. A classmate remarked caustically during a discussion once, “What exactly doesn’t cause cancer?” Perhaps it is a mark of the times that her comment made everyone grin.

***

Nevertheless, Stuy is still Stuy. Seniors are receiving their acceptance letters; the hallways are a daily scene of impromptu celebrations and rounds of hugs. Juniors are chugging through junior grind and revving up for their AP exams. The Spectator is still printing regularly, sports teams are still playing (even winning sometimes), and friends are still gathering after school in pizzerias. Everyone here was cut to the heart over the events of that terrifying fourth day of school, but we have all helped one another to go on. We share a bond that I believe no other class at Stuyvesant ever has–a bond forged in horror and tragedy, but strengthened by indelible spirit and an immortal sense of togetherness.

-Laura Krug is a senior at Stuyvesant High School in Lower Manhattan and a news editor of the Stuyvesant Spectator.

Sizing Up Test Scores https://www.educationnext.org/sizing-up-test-scores/ Mon, 17 Jul 2006 00:00:00 +0000 http://www.educationnext.org/sizing-up-test-scores/ The latest innovation in measuring the performance of schools and teachers holds great promise, but the idea is still way ahead of our ability to execute it

The post Sizing Up Test Scores appeared first on Education Next.

Illustrations by John Berry.


One of the basic critiques of using test scores for accountability purposes has always been that simple averages, except in rare circumstances, don’t tell us much about the quality of a given school or teacher. The high scores of students in a wealthy suburban New Jersey school will reflect the contributions of well-educated parents, a communal emphasis on academic achievement, a stable learning environment at home, and enriching extracurricular opportunities. Likewise, the low scores of students in an inner-city Newark school will reflect the disadvantages of growing up poor. The urban school might have stronger leadership and a more dedicated teaching staff, yet still score substantially lower than the suburban school. As a result, in the past decade researchers have grown interested in ways of measuring and comparing the gains in academic achievement that a school or teacher elicits–in other words, a school or teacher’s “value added.” Say, for instance, that a school lifts its students from the 35th percentile on national tests to the 50th percentile. An accountability system that uses value-added assessment might judge this school more effective than a school whose students consistently score at the 60th percentile. A value-added system might also identify a school’s best and worst teachers by tracking their students’ gains in the course of a year. The prospect of measuring the contribution made by schools and teachers to their students’ progress is winning a growing number of converts to value-added assessment. However, some practical complications stand in the way.

It is important, first, to distinguish between assessment for diagnostic purposes and assessment as a mechanism of accountability. Value-added assessment has demonstrated its value in the former capacity. Pioneering work in Dallas and in Tennessee has shown that value-added assessment provides information that can be useful when viewed in context by educators who understand local circumstances.

The more serious difficulties arise when value-added assessments are used to hold schools and teachers accountable, with high-stakes personnel decisions to follow. The danger is that such assessments will be used to supplant local decisionmaking, rather than to inform it. Unfortunately, our instruments of assessment are not precise or dependable enough for this purpose. I will discuss three problems: 1) current methods of testing don’t measure gains very accurately; 2) some of the gains may be attributable to factors other than the quality of a given school or teacher; and 3) we lack a firm basis for comparing gains of students of different levels of ability.

Measured gains are noisy and unstable.
Tests are not perfect measures of student ability or achievement. A student’s performance on any given test will be due partly to true ability and partly to random influences (distractions during the test, the student’s emotional state that day, a fortuitous selection of test items, and so on). This test “error” causes problems enough when we attempt to assess a student’s level of achievement. The problems are significantly compounded when we take it a step further, to measuring achievement gains. A gain score is the difference between two test scores, each of which is subject to measurement error. The measurement errors on the two tests, taken months apart from each other, are unlikely to be related (after all, these are random influences). When we subtract one score from another, the measurement errors do not cancel out. However, a student’s true ability does not change that much from one test occasion to another. When we subtract one score from another, a good deal of the portion of the scores that represents true ability will cancel out. The result: the proportion of a gain score that represents measurement error is magnified vis-à-vis the initial scores. In statistical parlance, gain scores are much noisier than level scores.
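The arithmetic behind this claim can be sketched in a few lines of Python. This is only an illustration under textbook assumptions that the article does not spell out: each observed score is true achievement plus an error, the errors on the two tests are unrelated to each other and to true achievement, and both tests have the same error variance. All of the variance figures are hypothetical, chosen only to make the pattern visible.

```python
def level_reliability(var_true, var_error):
    """Share of the variance in a single test score that reflects true achievement."""
    return var_true / (var_true + var_error)


def gain_reliability(var_true_gain, var_error):
    """Share of the variance in a gain score that reflects true gains.

    Independent errors on the two tests do not cancel when the scores are
    subtracted; their variances add, so the gain carries 2 * var_error.
    """
    return var_true_gain / (var_true_gain + 2.0 * var_error)


# Hypothetical variances (in squared test-score points).
VAR_TRUE_LEVEL = 100.0  # students differ a lot in true achievement level
VAR_TRUE_GAIN = 5.0     # but much less in how far they truly move in a year
VAR_ERROR = 10.0        # measurement error on any single test

level_rel = level_reliability(VAR_TRUE_LEVEL, VAR_ERROR)
gain_rel = gain_reliability(VAR_TRUE_GAIN, VAR_ERROR)

print(f"level score reliability: {level_rel:.2f}")
print(f"gain score reliability:  {gain_rel:.2f}")
```

With these made-up numbers, about 91 percent of the variation in a single score reflects true achievement, but only 20 percent of the variation in the gain score reflects true gains. The true component mostly cancels in the subtraction while the error component doubles.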

Statisticians facing this problem have adopted procedures that adjust raw measures of gain to minimize the contribution of statistical noise. The noisier the data, the less weight is placed on measured gains for any one school (or teacher). In extreme cases, the school or teacher in question is simply assigned the average level of effectiveness. Of course, the amount of noise in the data is itself something that must be estimated. As a result, these statistical methods are quite sophisticated. Virtually no one who is evaluated by these methods–teachers or administrators–will understand them. Thus, value-added systems that adjust for the unreliability of raw test scores will fail one of the criteria that educators have deemed important for accountability: that they be transparent. Measured performance (as determined by the statistical models) will not accord with the raw data. It will be impossible to explain to the satisfaction of educators why two schools (or teachers) with similar achievement gains nonetheless received different ratings of their effectiveness.
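
One common form of such an adjustment is a "shrinkage" estimator, sketched below. This is a generic textbook-style illustration and should not be read as the procedure used in Tennessee or Dallas; the function name, the variance figures, and the class sizes are all assumptions made for the example.

```python
def shrunken_effect(raw_gain, n_students, grand_mean, var_between, var_noise):
    """Pull a teacher's raw average gain toward the overall mean.

    The weight is the share of variance in a raw average that reflects real
    differences among teachers rather than noise; it rises with n_students.
    """
    weight = var_between / (var_between + var_noise / n_students)
    return grand_mean + weight * (raw_gain - grand_mean)

# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
GRAND_MEAN = 5.0    # assumed district-wide average gain
VAR_BETWEEN = 1.0   # assumed variance of true teacher effects
VAR_NOISE = 36.0    # assumed student-level noise variance in gain scores

# Two teachers with the same raw average gain but different amounts of data.
veteran = shrunken_effect(8.0, n_students=108, grand_mean=GRAND_MEAN,
                          var_between=VAR_BETWEEN, var_noise=VAR_NOISE)
newcomer = shrunken_effect(8.0, n_students=12, grand_mean=GRAND_MEAN,
                           var_between=VAR_BETWEEN, var_noise=VAR_NOISE)
print(veteran, newcomer)
```

With these inputs the weights work out to 0.75 and 0.25, so identical raw gains of 8 points become ratings of roughly 7.25 and 5.75. This is the kind of discrepancy that is hard to explain to the teachers concerned.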

Moreover, inequities will arise simply because measured gains are more dependable for schools and teachers for whom there are more data. There will not be enough information about teachers who are new to a school system to obtain reliable estimates of their effectiveness based on past performance–they will simply be deemed “average.” Likewise, it will be considerably more difficult for a small school to rank high: it will have to outperform larger schools in order to appear equally effective. (In the same way, a small school’s inferior performance may go undetected.) Discrepancies will also arise across subjects. For reasons probably due to the home environment, more of the variation in student reading performance is independent of school quality than is the case in math performance. As a result, it is harder to detect particularly strong (or weak) performance by reading instructors than by math teachers.

In the end, using sophisticated methods of value-added assessment may not be worth the trouble if the object is to identify and reward high performance. William Sanders, formerly of the University of Tennessee and now at the SAS Institute, has done pioneering work to develop a system of value-added assessment, using the results of annual tests administered to all elementary and middle-school students in Tennessee. The great majority of teachers assessed by this system do not differ from the average at conventional levels of statistical significance. A recent investigation of achievement in one large Tennessee school district (in which I am collaborating with Sanders and Paul Wright of the SAS Institute) has found that 20 percent of math teachers are recognizably better or worse than average by a conventional statistical criterion. By the same criterion, the percentage falls to 10 percent in language arts instruction and to about 5 percent among reading teachers. Those who want to reward teachers on the basis of measured performance should consider whether it is worth the trouble and expense to implement value-added assessment if the only outcome is to reward small numbers of teachers. Of course, it is possible to disregard statistical criteria and reward the top 10 percent of teachers in all subjects willy-nilly. But then many rewards will be made on the basis of random fluctuations in the data.

Gain scores may be influenced by factors other than school quality.
Value-added assessment has one signal merit: it is based on student progress, not on the level of achievement. Schools and teachers are accountable for how much students gain in achievement. They are not given credit for students entering at a high level or penalized when their students start far behind. In effect, value-added assessment “controls for” the influence of family income, ethnicity, and other circumstances on students’ initial level of achievement.

However, this may not be enough. The same factors may influence not just the starting level, but also the rate of progress. Thus, even in value-added assessment, it may be necessary to control explicitly for these factors (or demonstrate that they do not matter).

The socioeconomic and demographic factors that might influence student progress make a long list. In practice it is unlikely that an assessment system will have access to data on student backgrounds beyond what is routinely collected by school systems: the percentage of students with limited English proficiency, the percentage eligible for free and reduced-price lunch, and the ethnic and racial composition of the student population. Clearly other factors also matter. Critics of high-stakes assessments will object that without an exhaustive set of controls, the assessment system will end up penalizing some teachers and schools for circumstances beyond their control. Unless it can be shown that value-added assessment need not account for these other influences, schools that receive low marks will have an obvious excuse: the assessment did not recognize that “our students are harder to educate.”

Some progress has been made in this task. Sanders, Wright, and I have found that introducing controls for students’ race, eligibility for free and reduced-price lunch, and gender (together with the percentage of a teacher’s students eligible for free and reduced-price lunch) usually has only a minor impact on teachers’ measured effectiveness. However, this study was limited to one school district and one series of achievement tests. Whether the results will generalize remains to be seen.

Moreover, even small differences in measured effectiveness can have practical consequences for schools and teachers, depending on how these assessments are used. For example, suppose it is school policy to reward teachers who score in the top 10 percent. Whether a specific teacher falls into this category can be rather sensitive to the inclusion (or omission) of controls for student background. Even relatively modest changes in measured effectiveness, such as our research has found, can have a decisive influence on whether a teacher falls above or below a cut-off point defined in this manner. A teacher who would rate in the top 10 percent on one measure has only to fall slightly below the cut-off on the other measure to drop out of the category of teachers who are recognized for their excellence. Our findings suggest that this will happen with some frequency: more than one-third of the teachers who ranked in the top 10 percent when our assessments included socioeconomic and demographic controls no longer belonged to that category when these controls were omitted from the analysis.
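
Why a cut-off is so sensitive can be seen in a toy simulation. The sketch below does not use or reproduce the Tennessee data; the number of teachers, the size of the perturbation, and the function names are assumptions chosen for illustration.

```python
import random

def top_decile(scores):
    """Return the set of indices whose scores fall in the top 10 percent."""
    k = max(1, len(scores) // 10)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])

def turnover(n_teachers=1000, perturbation_sd=0.2, seed=1):
    """Share of top-decile teachers on one measure who miss the cut on the other."""
    rng = random.Random(seed)
    # Measured effectiveness with background controls (synthetic).
    with_controls = [rng.gauss(0.0, 1.0) for _ in range(n_teachers)]
    # The same teachers without controls: each score nudged by a small amount.
    without_controls = [s + rng.gauss(0.0, perturbation_sd) for s in with_controls]
    top_with = top_decile(with_controls)
    top_without = top_decile(without_controls)
    return len(top_with - top_without) / len(top_with)

print(f"share dropped from the top 10 percent: {turnover():.0%}")
```

Even though the two synthetic measures are almost perfectly correlated, some teachers near the threshold change category; a larger `perturbation_sd` raises the share.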

Similar gain scores are not necessarily comparable.
To practice value-added assessment, we must be able to compare the achievement gains of different students in a meaningful way. We need to be assured that the scale on which we measure achievement is one of equal units: one student’s five-point increase on an achievement test, from 15 to 20, must represent the same gain as another student’s five-point increase from 25 to 30 (see Figure 1). If it does not, we will end up drawing false conclusions about the relative effectiveness of these students’ teachers and schools.

Mathematicians who specialize in measurement in the social sciences, together with experts in the construction and interpretation of tests–psychometricians–have devoted considerable attention to this matter. Their findings are highly unfavorable to value-added assessment. First, it is clear that a simple tally of how many questions a student answered correctly will not have the desired property. Test questions are generally not of equal difficulty. Raising one’s score from 15 to 20 might well represent a different achievement gain than an increase from 25 to 30, depending simply on the difficulty of the additional questions that have been answered. This objection also applies to several popular methods of standardizing raw test scores that fail to account sufficiently for differences in test items–methods like recentering and rescaling to convert scores to a bell-shaped curve, or converting to grade-level equivalents by comparing outcomes with the scores of same-grade students in a nationally representative sample.

In the 1950s, psychometricians began to deal with this issue in a systematic way. The result has been the development of “item response theory,” which is used to score the best-known and most widely administered achievement tests today, such as the CTB/McGraw-Hill Terra Nova series and the National Assessment of Educational Progress. In item-response theory, the probability that a student will answer a given item correctly is assumed to depend on the student’s ability and on the difficulty of the item, as expressed in a mathematical formula. Neither a student’s ability nor the difficulty of the item can be directly observed, but both can be inferred from the pattern of answers given by a particular student as well as by other students taking the same test. For example, as one would expect, the more students who answer a given item correctly, the “easier” the item is judged to be. The estimate of a student’s ability (known as the scaled score) is expressed on the same scale as item difficulty.
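
The discussion here does not specify which formula a given test uses. One widely used member of the family is the one-parameter logistic (Rasch) model, shown below purely as an illustration; the function name and the choice of this particular model are assumptions, and operational tests often use richer variants.

```python
import math

def p_correct(ability, difficulty):
    """Rasch (one-parameter logistic) probability of a correct answer.

    Ability and difficulty sit on the same scale; only their difference matters.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# A student whose ability equals the item's difficulty has an even chance.
print(p_correct(0.0, 0.0))
# Harder items (higher difficulty) lower the probability for the same student.
print(p_correct(0.0, 1.0) < p_correct(0.0, 0.0))
```

Because the formula depends only on the gap between ability and difficulty, the pattern of right and wrong answers pins down where students and items stand relative to one another, which is why both end up on a single scale.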

The critical question, given that neither ability nor item difficulty can be measured directly, is whether the procedures of inference are powerful enough to put the resulting ratings of ability and difficulty on equal-unit scales. Has student A, whose scaled score rose from 500 to 600 (using item-response theory methods), truly learned less than student B, whose scaled score rose from 300 to 450? Does 100 points at one range of the scale really represent less learning than 150 points at another point on the scale? The fact that we express both scores numerically predisposes us to answer affirmatively. The dubiousness of such a response can be appreciated by approaching the question from another angle, taking advantage of the fact that ability is measured on the same scale as item difficulty. Suppose a student has correctly answered test item A, which has a difficulty rating of 500. Item B is harder, with a difficulty of 600. Clearly the student needs to know more to answer question B than question A. But is the extra knowledge required to answer item B rather than item A truly less than the extra knowledge required to answer item C, with a measured difficulty of 450, rather than item D, with a difficulty of 300? Does a numerical difference of 100 between items A and B really mean that the latter item “contains” 100 more units of something called difficulty–and that these are the same units in which the difference between items C and D is judged to be 150? Do we possess such a scale for difficulty, or are we merely able to determine the order of difficulty, assigning higher numbers to items judged to be harder?

Generally speaking, the latter account of the matter is the correct one. And because ability is measured on the same scale as difficulty, the same holds true of it. In practice, psychometricians usually act as if ability scores are on an equal-unit scale (or, in technical terms, an “interval” scale). But this is merely an assumption of convenience. As prominent psychometricians have pointed out, many of the usual procedures for comparing achievement gains yield meaningless results if the ability scales lack this property.

In the previous example, the size of the intervals, 100 and 150, depended on the choice of the mathematical function expressing the probability of a correct response. Different choices for that function will produce different scales. And choice is just what takes place. There is no “correct” choice–or, more precisely, we possess no criteria for determining whether one choice is more correct than another. Statistical properties, such as the “fit” of the data to the model, are of no help here. As noted in a 1986 article in the Journal of Educational Measurement by Wendy Yen, formerly the chief research psychologist with CTB/McGraw-Hill and now with the Educational Testing Service (ETS), there are infinitely many nonlinear transformations of the ability scale that will fit the data equally well, yielding the same probabilities. Yet these transformations will shrink the scale over some ranges and expand it over others, so that student A appears to make more progress than student B using one scale, but less using another.
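
The point can be made concrete with the scores from the earlier example. In the sketch below, the logistic response function, the 100-points-per-unit scaling, and the exponential relabeling are all assumptions chosen for illustration; they are not taken from Yen's article or from any operational test.

```python
import math

SCALE = 100.0  # assumed: scaled-score points per logistic unit

def p_original(theta, b):
    """Probability of a correct answer on the original scaled-score metric."""
    return 1.0 / (1.0 + math.exp(-(theta - b) / SCALE))

def stretch(x):
    """An order-preserving but nonlinear relabeling of the scale."""
    return math.exp(x / SCALE)

def unstretch(y):
    """Inverse of stretch."""
    return SCALE * math.log(y)

def p_transformed(theta_new, b_new):
    """The same model re-expressed on the relabeled scale."""
    return p_original(unstretch(theta_new), unstretch(b_new))

# Student A rises from 500 to 600; student B rises from 300 to 450.
gain_a, gain_b = 600 - 500, 450 - 300
gain_a_new = stretch(600) - stretch(500)
gain_b_new = stretch(450) - stretch(300)

print("original scale:  B gains more than A:", gain_b > gain_a)
print("relabeled scale: A gains more than B:", gain_a_new > gain_b_new)
# Every predicted probability is unchanged, so the data cannot favor either scale.
print(math.isclose(p_original(500, 600),
                   p_transformed(stretch(500), stretch(600))))
```

Both scales rank students and items in the same order and imply identical response probabilities, yet they disagree about which student gained more.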

These conclusions cut the ground out from under value-added assessment. Our efforts to determine which students gain more than others–and thus which teachers and schools are more effective–turn out to depend on conventions (arbitrary choices) that make some educators look better than others. This does not mean that testing is of no value. It is still possible to rank students’ performance and to ask how many students exceed a specified benchmark. But the finer kinds of measurement required to compare the progress of students at different levels of initial ability exceed the capacities of our instruments. As Henry Braun of ETS wrote in a 1988 article for the Journal of Educational Measurement: “We have to be very careful about the questions we ask and very sensitive to the possibilities for obtaining misleading answers to those questions. . . . In particular, we should probably give up trying to compare gains at different places on the scale for a given population.”

Conclusion

There is a simple idea behind value-added assessment: schools and teachers should be evaluated based on student progress. As the foregoing discussion shows, however, successful implementation of this concept is far from simple. It is much harder to measure achievement gains than is commonly supposed.

Notwithstanding these problems, policymakers, educators, and the public will continue to look at indicators of student progress to see how schools are doing. For some purposes, this seems entirely reasonable. Sanders’s assessment system has been a beneficial diagnostic tool in Tennessee. But those who look to value-added assessment as the solution to the problem of educational accountability are likely to be disappointed. There are too many uncertainties and inequities to rely on such measures for high-stakes personnel decisions.

-Dale Ballou is an associate professor of economics at the University of Massachusetts at Amherst.

The post Sizing Up Test Scores appeared first on Education Next.

Expert Measures https://www.educationnext.org/expert-measures/ Mon, 17 Jul 2006 00:00:00 +0000 http://www.educationnext.org/expert-measures/ All the evidence to date shows that value-added techniques are being employed responsibly

The post Expert Measures appeared first on Education Next.

Illustration by John Berry

In one suburban school district, teachers across the system were ranked and evaluated according to the contribution they had made to student learning–based on a value-added analysis of state test results. When they were ranked again the next year, the results were very similar except for one teacher who moved from a top rank to a very low rank. When the school superintendent looked at the results, he immediately identified the teacher–her husband had died during the second year of rankings.

What does this anecdote illustrate? For starters, that value-added assessments can misidentify good and bad teachers–and that the discrepancies can be cleared up by local administrators. More important, however, was the fact that the findings were robust; teachers’ rankings were similar across three years of analysis. Moreover, these rankings were used to grant a standard reward to a top group of teachers, not to make fine distinctions in the amount of compensation that teachers received. In other words, the results of value-added analysis, when examined over a period of two or three years, were stable and were used in a way that respected the margin of error involved in using these statistical techniques.

Critics of value-added assessment tend to embrace the concept but don’t want the results gleaned from such analysis to be used for accountability purposes–and especially don’t want to use the results to reward or sanction teachers. But teachers are the dominant school input, in terms of both spending and impact on student learning. Excluding them essentially leaves the education system without accountability.

The main concern with value-added assessment is that the technique exacerbates the amount of random error involved in measuring student performance. The risk is that teachers and schools may be wrongfully rewarded or punished because value-added techniques either over- or underestimated their students’ learning gains. However, no intelligent users of standardized testing would make policy choices based on a single year’s result or small differences among schools and teachers. Texas, for example, rewards schools based on value-added achievement gains calculated using two-stage regression analysis. The state does not hand out awards based on decimal-point differences among schools; officials reward a previously set percentage of top-ranking schools. The point here is that most of the statistical objections to value-added measurements assume a misuse of the analysis. The statistical “noise” involved in measuring value-added should preclude decisions that are based on small, unreplicated analysis; it should not preclude decisions that are based on gross findings.

Confidence in gross findings can be developed by replication, by averaging results over several time periods, and by using several measures of the development of human capital–not tests alone, but also attendance rates, dropout rates, and promotion rates (a very high-quality assessment will track indicators of human capital such as post-secondary school earnings and higher-education outcomes as well). The richer the measures used, the less weight there is on the psychometric concerns involving test scores. The alternative is to rest teacher compensation on factors that have little to do with student learning. It is now well established, for example, that the number of degrees teachers possess and the number of hours teachers spend in education courses are unrelated to student learning. Put another way, an important criterion in the determination of a teacher’s salary does not have any bearing on the ability of the teacher to develop human capital. We know that because it has been replicated in many studies in many school districts, even though salary schedules have yet to reflect this information.
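
The gain from averaging can be stated in one line. The notation below is introduced here for illustration and rests on a simplifying assumption: that each year's estimate equals the teacher's true effect plus noise that is independent from year to year and has the same variance.

```latex
\[
\hat{e}_t = e + u_t, \qquad \operatorname{Var}(u_t) = \sigma^2
\quad\Longrightarrow\quad
\operatorname{Var}\!\left(\frac{1}{k}\sum_{t=1}^{k}\hat{e}_t - e\right) = \frac{\sigma^2}{k}
\]
```

Under that assumption the typical error of a three-year average is about 58 percent (one over the square root of three) of the error in a single year's estimate. If the noise persists from year to year, the improvement is smaller.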

Critics often cite the difficulty of comparing the results of large and small schools and comparing one subject with another. It is essential in the debate over the usefulness of the value-added assessment approach that the unit of observation and the comparison group be specified. If, as has been the case in a number of places, the comparison group is all the teachers in a given grade in the school district–with, say, the top 15 percent of the 4th grade teachers receiving an award–what is the significance of a big or small school? The class is the unit, and class size tends to be uniform within a school district. At the high-school level, the comparison group is likely to be, for example, all history teachers or all science teachers across the district. The award-receiving group is a percentage of that comparison group, and it is not affected by the test scores in another subject.

If multiple factors are used for assessment, efforts to check on robustness are made, only extreme performances are rewarded or sanctioned, and comparison groups are selected carefully, the technical psychometric points raised by critics should not swamp the incentive and information benefits of performance-based compensation plans for teachers. If we are ready to determine reading programs, language labs, class sizes, and the use of computerized learning on the basis of value-added assessments, we should be ready to reward teachers using the same techniques.

There is already considerable evidence from several places–such as Tennessee and Florida, where value-added analysis has been used for accountability purposes–that low-achieving students are the main beneficiaries of the changes that occur when these techniques are implemented. When low-achieving students are taught the same body of knowledge over and over again, and when they are taught how to work under a time constraint, they benefit. Value-added assessment techniques reveal that information.

The Illusion of Transparency

Critics of value-added assessment say that it is just too complicated for teachers and the public to understand the results these systems will generate. In other words, the results will not be transparent. But if transparency is the criterion that trumps all other criteria, we would be compelled to use an assessment method that we know to be wrong. For example, there is no satisfactory way to make judgments about which method of teaching reading is superior–whole language or phonics–without factoring in the socioeconomic, school, and teacher characteristics of each of the groups of students in the experiment. Statistical controls must be used if the assessments of teachers, schools, or programs are to be accurate, even though very few educators understand the statistical principles and methods involved. I do not require a transparent understanding of the efficacy of the flu shot I take, nor do I require a transparent understanding of the operating characteristics of my car; I trust the experts on the techniques. So must it be in educational evaluation.

The problems with the use of value-added assessments, even for teachers, are greatly exaggerated, and the alternatives are simply untenable.

One proposed alternative is the use of one or more methods of subjective evaluation. Other teachers, students, and/or parents can be surveyed to make the judgments. Most of those surveys focus on whether respondents like the teacher or are happy in the classroom, or on equally soft attributes. They do not really determine what kind of learning gains a certain teacher is eliciting relative to other teachers. Most people look back at their primary and secondary schooling and identify an extremely demanding teacher (whom they disliked at the time) as the one who made the biggest contribution to their educational development.

A second possibility is to set up a standard–some threshold–of student achievement as an absolute hurdle to define adequate performance. The problems with this are legion. How should the threshold be determined? Those who want to be rewarded set lower thresholds than those who watch budgets. The threshold becomes a major bargaining tool, quite divorced from the informed decisionmaking objective. What incentives for improvement are there for those who crossed the threshold?

Basically, subjective evaluation allows the information essential to the rational allocation of educational resources to be derived politically rather than scientifically.

The state of American public education has been deplored by critics for many years. The notion that student learning has remained somewhat stagnant over the past century, though the country has spent steadily more real resources on education, is a matter of profound concern. This surely indicates that we do not know a lot about what works and what doesn’t.

We need to know if we are to change the pattern. We must have calibrated results; we must use sophisticated statistical methods to interpret the data; we need to use multiple measures of performance; and we need to implement the analysis in ways that are appropriate to the quality of the information. And teachers, the most important school-controlled input into the educational process, cannot be exempt from this information-gathering activity. Our current unhappy results are consistent with the subjective and/or unsophisticated tools we use for assessing effectiveness and creating incentives.

-Anita A. Summers is a professor emeritus of public policy and management at the Wharton School at the University of Pennsylvania.

Enemy of the Good https://www.educationnext.org/enemy-of-the-good/ Mon, 17 Jul 2006 00:00:00 +0000 http://www.educationnext.org/enemy-of-the-good/ No standardized test is perfect. But they're useful nonetheless

The post Enemy of the Good appeared first on Education Next.

Value-added assessment is flawed, even seriously flawed. Nevertheless, I remain committed to the use of testing and value-added techniques to improve America’s public schools. Voters and policymakers should hold school districts and their employees accountable for student learning. Value-added high-stakes tests, with all their flaws, are an excellent way to do this.

Of course, tests don’t stand alone. To have value they must be aligned with content standards, performance standards, and accountability. And principals and teachers must have the freedom and flexibility to meet those standards, while school districts provide them with adequate resources and support.

High-performance organizations measure almost everything. Why? Because measuring changes behavior. The best way to focus the attention of a workforce on something important is to measure it. As critics point out, measuring the performance of schools and teachers is difficult, very difficult. Does this mean we shouldn’t do it? Does this mean we leave teachers, in the privacy of their classrooms, to set their own standards, develop their own performance measures, and tell us whether the children are learning? We tried this. The result: too many teachers neglected to teach the curriculum or did not teach effectively, and too many children suffered the consequences.

Yes, we should recognize that a test measures true ability and random influences. Yes, measured gains are noisy and unstable. Yes, socioeconomic and demographic factors have some influence on measured student progress. And indeed, scaling is a problem. No standardized test is perfect, yet we use them all the time, and to good effect. Physicians, lawyers, accountants, financial planners, real-estate brokers, and pilots all take high-stakes tests. These tests ensure that professionals have the knowledge necessary to serve the public well.

Teachers know that standardized tests are not perfect measures of what their students have learned, just as they know that the assessments they develop for their own use are not perfect measures. Yet they still use them to diagnose, motivate, and focus classroom learning. And how often are they surprised by a child’s standardized test score? Usually it is just about where they expected it would be.

Standardized high-stakes tests also don’t measure school improvement perfectly, and they shouldn’t be the only accountability device we use. Nor should they be the sole measure of teacher effectiveness. But imperfect as they are, standardized tests do the job. They enable policymakers and the public to answer much more confidently the question, “Are the children learning?” More important, they change behavior.

The constant measurement of student achievement focuses everyone’s attention on student achievement. Superintendents, principals, and teachers now spend more time trying to link the structure and work of the organization to student learning. Discipline creeps back into the organization. Practices that don’t seem to improve student achievement are dropped, and practices that do work spread throughout the organization. Innovation begins to flourish. Student achievement improves.

As a former 12-year school board member for the Houston Independent School District, I have seen this happen in Houston and in Texas. Demanding accountability for results and measuring achievement with the Texas Assessment of Academic Skills (TAAS), a criterion-referenced assessment–actually, a rather blunt instrument–has spurred significant improvement in student achievement. This improvement has been displayed not only on the TAAS; it has shown up in Houston on the Stanford 9 and statewide on the National Assessment of Educational Progress.

Obviously, test data can be given more weight than they deserve, and there is a danger that administrators might use test data inappropriately to make personnel decisions. But there seems to be little evidence that this has happened and a huge body of evidence that high-stakes tests of all kinds focus school systems on teaching and learning. So let the psychometricians continue to improve techniques for value-added assessment. We need them to do so. And let economists and statisticians continue to point out the flaws in these tests, so that policymakers don’t misuse them. Meanwhile, policymakers should deepen and broaden their commitment to standards-based reform and high-stakes tests. The benefits greatly outweigh the risks.

-Donald R. McAdams is executive director of the Houston-based Center for Reform of School Systems and the author of Fighting to Save Our Urban Schools . . . and Winning! Lessons from Houston (Teachers College Press, 2000).

Vouchers on Trial https://www.educationnext.org/vouchers-on-trial/ Mon, 17 Jul 2006 00:00:00 +0000 http://www.educationnext.org/vouchers-on-trial/ Will the Supreme Court’s decision in Zelman end the debate?

The post Vouchers on Trial appeared first on Education Next.

Legal experts are already drawing analogies between Zelman v. Simmons-Harris and landmark rulings like Brown v. Board of Education and the seminal First Amendment decisions that have shaped American jurisprudence over the past half century. How the U.S. Supreme Court rules on the constitutionality of Cleveland’s school-voucher program will not only define the legal boundary between church and state more clearly; it could also help redefine the meaning of public education and expand the range of opportunities available to poor children.

Enacted by the Ohio legislature in 1995, the Cleveland Scholarship and Tutoring Program allows 4,000 low-income children to attend private religious and secular schools with up to $2,250 in public support. Participating schools must cap their tuition at $2,500 a year; the state pays up to 90 percent of whatever the school charges, depending on family income. Following a high-profile legal battle, the program was upheld by the Ohio Supreme Court in 1999, prompting opponents to take their case into federal court. On the day before school was to open that year, federal district court judge Solomon Oliver struck down the program, ruling that the use of tax dollars to pay for children to attend religious schools offends the First Amendment’s Establishment Clause. Judge Oliver halted the acceptance of new students to the program while the case was being appealed. Soon thereafter, a sharply divided (5-4) U.S. Supreme Court took the unusual step of vacating Judge Oliver’s injunction and allowing the program to continue unaltered while the case is in litigation.

In December 2000, the U.S. Court of Appeals for the 6th Circuit affirmed Judge Oliver’s ruling by a 2-1 vote. The appeals court relied heavily on legal precedents set down by the Supreme Court in 1973 in Committee for Public Education v. Nyquist. At issue in Nyquist was a New York State program that gave low-income parents a partial tuition reimbursement for private-school tuition. The Nyquist Court found that the tuition-grant program had the “impermissible effect of advancing religion.” It concluded that direct or indirect aid to sectarian schools is essentially a government-subsidized incentive to practice religion.

The thinking in Nyquist was remarkable on several counts. Inherent in the incentive concept is the assumption that parochial schools are so superior to public schools that the opportunity to attend the former is irresistible, even to those parents who do not want their children educated in a religious environment. Reasonable people can conclude that the lure of a safe and sound education is an argument for choice rather than against it. Preoccupied with the religious character of parochial schools, the majority also presumed that the court is capable of looking into the minds of legislators to determine their motivations. Using effect to derive intent, the court concluded that incidental aid to religious institutions in the form of tuition relief to parents is tantamount to a purposeful government act to promote religion.

There were specific facts pertinent to the Zelman case that the appellate panel deemed relevant to the incentive argument. Challengers pointed out that most of the schools involved in the voucher program (46 of 56, accounting for 96 percent of the students) were religious, leaving few secular options available for participating families. The appeals court accepted this argument even though the Supreme Court had acknowledged similar circumstances in 1983 (Mueller v. Allen) when it upheld a Minnesota program that gave a tax deduction to parents for tuition and other education expenses. While recognizing that most of the deductions were used for parochial school tuition, the Mueller Court found that because parents could deduct expenses for public, private, or religious schools, the deduction was neutral toward religion.

In Zelman, the Ohio attorney general further pointed out that schools participating in the Cleveland voucher program represent only a small portion of the range of choices available outside the regular public schools. In 1999 Cleveland had 23 magnet schools with 13,000 students in attendance and eight charter schools with 1,600 students in attendance, compared with the 3,800 in the voucher program. The two-person majority refused to accept the range-of-choice argument, however, because the magnet and the charter programs were not enacted under the auspices of the voucher law that was being reviewed. Legally speaking, these other choices did not exist. Under the rules of evidence defined by these judges, the same court that could peek into the minds of legislators to determine intent could not recognize hard evidence crucial to determining whether dissatisfied parents at regular public schools had choices beyond religious schools.

Both sides introduced evidence concerning the amount of the voucher. The state of Ohio, in an effort to rebut the incentive argument, explained that the amount ($2,250 maximum) was small in comparison with the per-pupil spending in regular public ($7,746), magnet ($7,746), and charter schools ($4,518). Looking at things strictly from a resource perspective, parents had a disincentive to send their children to schools participating in the voucher program. Opponents argued that since parochial schools were the only nonpublic schools with tuition rates low enough to be covered by the voucher, the program was indeed an incentive to attend these schools. To the extent that the latter argument has merit, the remedy seems obvious: amend the voucher law to make the amount higher, let’s say equal to the per capita amount spent in regular public schools. This would have to be done by the Ohio legislature.

There would be a delicious irony to such a resolution. It would certainly appear equitable, in light of the tortured history of school-finance litigation in Ohio. Carried to its logical conclusion, such a resolution might also add charter schools to the mix of institutions eligible for equal funding. Opponents of choice who raise the funding issue as a means of striking down vouchers would not welcome such a remedy, however. It is not financial equity they seek, but the defeat of the voucher law. And school-finance reformers who have spent oodles of time and money in litigation are not likely to receive such a remedy kindly either. For the most part, their sense of fairness applies only to children in public schools.

What Will the Court Do?

Of course, predicting what the Supreme Court is going to do in a particular case can be more difficult than calling the World Series in the middle of May. We begin with certain general expectations based on past performance, all the while knowing that anything is possible. At least baseball has winners and losers; legal contests are more complicated. To say that the Supreme Court will rule one way or another oversimplifies the process of judicial decisionmaking. Crafted to accommodate the philosophies and styles of the individual justices needed to assemble a majority, legal opinions are written with great nuance. Their outcome depends on what question or questions the judges agree to address, and with what level of specificity. As a rule, the Court tends toward more narrow rulings, with deference paid to precedent. But precedents are rarely consistent; and First Amendment case law is among the most inconsonant.

The First Amendment jurisprudence that has unfolded over the past two decades, however, seems to favor the program in question. In the aforementioned Mueller case, the Court approved a tuition-tax deduction program in Minnesota. More recently, the Rehnquist Court has overturned longstanding precedents in order to allow public school teachers to provide remedial services to children on the premises of religious schools (Agostini v. Felton, 1997) and parochial schools to receive direct aid in the form of computers and other instructional equipment (Mitchell v. Helms, 2000). In 1998, it refused to hear a challenge to a voucher program in Milwaukee that was approved by the Wisconsin Supreme Court.

Still, there is a distinct, though improbable, possibility that the Supreme Court will rule against the voucher program on fundamental First Amendment grounds. This is improbable because since 1986 (Witters v. Washington) the Court has adopted guidelines that allow indirect aid to parochial schools as long as the aid is appropriated neutrally and results from independent decisions by parents who select those schools. These more permissive guidelines were drafted in a concurring opinion by Justice Lewis F. Powell Jr., the author of Nyquist. The original plaintiffs in Zelman argued that parental independence is compromised by an administrative process that sends the voucher check directly to a religious school to be signed by the parent. Again the solution to this problem, if it is really a problem, is rather easy: just send the check to the parent, and let the parent pay the school. The existing procedure was implemented for the sake of administrative expediency. Either way, the money reaches the school because a parent chose that school for her child. But in some courts procedure trumps principle. Legal reasoning would require that the voucher program, operating as it does, be struck down.

Let’s imagine, for a moment, what would happen if it were. In Ohio the issue would be turned over to the state legislature, and once again political irony would be the order of the day. The same people who raised procedure as a point of contention in court would do everything possible to preserve the procedure in the law in order to maintain that the program is void. If choice supporters succeeded in changing the procedure, the issue of constitutionality would be resolved. Realistically speaking, however, the fate of the Cleveland program is unlikely to turn on the payment question. There is a bigger question.

While unlikely, it is conceivable that a majority of the justices hearing Zelman could agree with the argument that providing unrestricted aid to children attending sectarian schools allows the state to endorse, subsidize, and advance religion. The impact of this ruling would be substantial. Not only would it terminate the voucher program for 4,000 children in Cleveland; it would open to challenge the Milwaukee program through which 10,000 low-income students receive up to $5,553 in tuition relief for private and religious schools. Also likely to fall would be the Florida A+ program, which provides up to $3,472 for children who attend chronically failing public schools. It enrolls only a few dozen students in Pensacola, but has the potential to expand statewide. One might say that for all practical purposes, vouchers would be dead. But the same is not necessarily true of school choice.

If the Court strikes down the Ohio law, and by implication those in Wisconsin and Florida, choice supporters will probably pin their hopes on funding schemes involving tax relief, like the tax-deduction plan in Minnesota and less ambitious programs in Illinois and Iowa. By providing a more direct benefit to families, tax-relief programs add a level of separation between the state and the school and are generally deemed to be less vulnerable to legal challenge. Activists will also focus their energies on tax-relief programs for third parties that provide scholarships for poor children to attend religious and private schools. Such programs already exist in Arizona, Florida, and Pennsylvania. In fact, more children (60,000) participate in privately funded voucher programs than in publicly funded programs. Private initiatives such as the Children’s Scholarship Fund and Children First America are likely to grow no matter what the outcome of the Ohio case.

Federally funded Pell grants may be used to pay tuition at religious colleges. Are they a model for school vouchers?


It’s Constitutional, but . . .

Suppose the Supreme Court upholds the Cleveland program. Again the impact of the decision depends on the specific questions the majority chooses to address. Will it deem the distinction between direct and indirect aid significant? Will it finally dispose of the legal quibbling over form versus substance regarding the payment procedure? Will it adopt the broader standard of neutrality suggested by Justice Clarence Thomas in Mitchell v. Helms? Under this criterion, aid is permissible when it “is offered to a broad range of groups or persons without regard to religion” and “results from the genuinely independent and private choices of individual parents.” Or will it decide the case on more narrow grounds? It is quite possible for the case to result in a split majority, as in Helms, where four justices accepted the neutrality standard, while Justices Sandra Day O’Connor and Stephen Breyer permitted the aid on the basis of more narrowly defined criteria. Either way there is bound to be another round of litigation.

One possible site for a new legal battle is Maine. Maine has a 130-year-old voucher law that once allowed children living in towns without high schools to attend private or parochial schools with state support. In 1981 the law was changed to exclude religious schools. The shift in policy was upheld by the Maine Supreme Court on First Amendment grounds in 1999 and subsequently confirmed by a federal appellate court, with the Supreme Court refusing to hear an appeal. If the U.S. Supreme Court approves the Cleveland program, choice supporters in Maine could have their case reheard to reinstate the eligibility of religious schools.

It is no accident that most of the recent legal challenges to existing voucher laws began in the state courts. Opponents have based their litigation strategies on “Blaine Amendment” provisions in state constitutions that set a more rigid standard for church-state separation than that enshrined in the First Amendment–at least as the U.S. Supreme Court has interpreted it over the past two decades. Blaine amendments, a remnant of the 19th-century battles over public aid to parochial schools, are named for James G. Blaine, a presidential aspirant and congressman from Maine, who in 1875 tried unsuccessfully to enact a federal constitutional amendment prohibiting such aid. Although Blaine failed to assemble the supermajority of votes needed to pass a federal constitutional amendment, his proposal became a model for state legislators who shared his separationist and anti-Catholic sentiments, then widespread in the nation. By the end of the 19th century, 29 states had written similar amendments into their constitutions. These provisions, as well as others that were added later, could have a major impact on the future viability of voucher programs–but not without opening the door to yet another wave of litigation. Although opponents lost their challenges in the state courts of Ohio and Wisconsin, they prevailed in Vermont, which, like Maine, has a century-old voucher law that disqualified religious schools from participation in 1995. On appeal the Supreme Court refused to review that case also.

The specific exclusion of religious schools from state voucher programs, as in Vermont, raises federal questions beyond the Establishment Clause. Choice supporters claim that such discriminatory exclusion violates the Fourteenth Amendment’s equal protection clause and the First Amendment’s free exercise clause. Since the state constitutional issue has already been addressed by the Ohio Supreme Court, there is no reason to expect the question to arise in Zelman. But if the U.S. Supreme Court determines that vouchers are allowed under the Establishment Clause, it is only a matter of time before the Court will be asked to settle these larger questions. The same Court that set guidelines for permissible aid in Witters also left the door open for states to set their own standards for church-state separation. While adopting a more accommodating approach to the First Amendment than its predecessor, the Rehnquist Court has also shown a strong sympathy for state prerogatives on matters of federalism. Inevitably, voucher proponents will insist that the constitutional rights secured by a victory in Zelman would prove hollow if the states were permitted to undermine choice on their own. What the Court does to resolve the inevitable clash between these claims and its notion of federalism remains to be seen.

Back to Politics

Even if the Supreme Court were to resolve the federal and state legal questions in favor of vouchers, it would only be setting the stage for the next arena of conflict. Courts only review laws; they do not make them. The most generous judicial interpretation of the voucher question could at most require that states not exclude religious schools from choice programs that are open to other private schools. States would be allowed to continue restricting public funding to government-run public schools, as most do now. Battles over school vouchers have already taken place in more than half the state legislatures, and they will go on.

No doubt a ruling in their favor from the Supreme Court would reinvigorate voucher proponents. It might motivate President George W. Bush–whose solicitor general gave oral argument in Zelman in support of the Cleveland program–to revisit his controversial proposal for federal vouchers. (He has already endorsed tax credits in his current budget proposal.) But the more significant political battles will be fought in the state legislatures, where most education policy is made. Once again the structure of the alliances that form will be filled with political irony. As was so in Cleveland and Milwaukee, the most consistent advocates for school vouchers in America are low-income black and Hispanic parents who live in central cities where the public schools have a history of poor performance. Some black leaders–such as the Reverend Floyd Flake, a former congressman from New York; city councilman (and mayoral candidate) Cory Booker of Newark; and Howard Fuller of the Black Alliance for Educational Options–see choice as a civil-rights issue, a mechanism to provide poor families with the same opportunities enjoyed by the middle class–indeed, as a fulfillment of the promise articulated in Brown v. Board of Education: to make education available to all “on equal terms.”

However, the majority of black and Hispanic political leaders oppose vouchers. Their position is supported by mainline organizations like the NAACP, the American Civil Liberties Union, and the National Urban League, all of which have a long history of advocacy on behalf of disadvantaged populations. While reaching out to fellow Democrats, choice proponents in the minority community have sought to build alliances with Republicans and with libertarian organizations like the Institute for Justice, which has represented poor parents in every voucher case that has come before the courts in the past dozen years, including Zelman. These are not always easy partnerships. People on the left side of the political spectrum favor targeted choice aimed at the poor, while those on the right prefer universal vouchers made available to all parents. Nonetheless, these alliances have managed to move choice along in places like Wisconsin, Ohio, and Florida.

The Democratic Party has its own tensions to resolve under an ideological tent that tries to accommodate both old-line labor unions, which instinctively oppose school choice, and a younger generation of black and Hispanic activists who demand it. Party leaders have failed to respond adequately to the question of why poor minority parents should be required to send their children to failing public schools when luminaries like Bill Clinton, Al Gore, and Ted Kennedy saw fit to send their own children to private schools.

To a large extent, the choice cat is already out of the political bag. The development of voucher programs in Wisconsin, Ohio, and Florida has fostered a serious national debate over a question that once could be discussed only on the outer margins of politics. More important, the existence of private voucher programs in nearly every state has introduced poor parents to the idea that there is an alternative to failing inner-city schools, and it is winning more converts every day. That being said, polls indicate that the nation as a whole is at best ambivalent about using tax money to send children to religious schools. And voucher proposals are consistently rejected in popular referendums, as in Michigan and California during the 2000 election, where vouchers were defeated by a 2-1 margin. Furthermore, teacher unions that vehemently oppose vouchers are a powerful force within most state legislatures, almost assuring rejection in most places. Yes, the political debate is very much alive, but it remains tilted against choice.

When the Court Speaks

What the Supreme Court says in Zelman could have a marked effect in structuring the terms of the political debate–not just in determining who wins the legal argument, but in explaining its broader implications in a way that only the Supreme Court can. In Zelman, the Court is being asked to weigh two competing political values: strict church-state separation on the one hand and the right of poor families to choose the education their children receive on the other. If the majority settles on a strict interpretation of the Establishment Clause, it will add an air of legitimacy to an already dominant political coalition that opposes school vouchers and other forms of private-school choice. It will raise the wall of separation between church and state to a level it has not seen since the Burger Court.

The political impact of the court’s decision could be even greater if it approves the Cleveland program. At a minimum it would lift the constitutional cover from those political actors who hide behind the First Amendment as a reason to oppose choice. It would lay bare a fundamental struggle over who controls the education of children, the parents or the providers. It also might help to reverse the prevailing political dynamic. Depending on the wording of the opinion, the decision could add a moral dimension to the pleas of poor parents who want educational choices similar to those enjoyed by the middle class. It is difficult to overestimate the power of moral argument in American politics. History has shown it to be an essential ingredient for reversing dominant political patterns in response to demands by weaker parties in pursuit of social justice. The Court provided such a platform in the Brown decision in 1954. While the immediate impact of the ruling was to prohibit de jure school segregation, Brown breathed life into the political struggle that, against all political odds, brought about a revolution in public policy, affecting every branch of government at the federal, state, and local levels.

Whether the Supreme Court perceives school choice as a fulfillment of the promise articulated in Brown remains to be seen. Judicial majorities do tend toward more narrow rulings, except of course when they have something more significant to say beyond the particular legal questions set before them. This may be one of those extraordinary times. If so, such an opinion would be especially compelling coming from the pen of Justice Thomas, the lone black member of the Court, who has written for the majority in a number of important cases involving religion and education. It would remind the nation that there is more than one voice in the black community, each driven by a vision of educational equality, following different paths to get there.

But even a more narrow decision may prove to be pathbreaking. After all, Brown itself was a cautious decision, declaring unconstitutional only segregation in schools, not segregation at train stations, parks, or other public facilities. And Brown only required the states to implement school desegregation “with all deliberate speed,” something less than a clarion call for immediately rectifying the effects of racial injustice. Yet 50 years later, the spirit of Brown is vastly more important than its wording. So it may be with Zelman.

Joseph P. Viteritti is a research professor of public policy at New York University and the author of Choosing Equality: School Choice, the Constitution, and Civil Society (Brookings).

The post Vouchers on Trial appeared first on Education Next.

The Supreme School Board
https://www.educationnext.org/the-supreme-school-board/
Mon, 17 Jul 2006 00:00:00 +0000
Vouchers on Trial: A view from inside the courtroom

The post The Supreme School Board appeared first on Education Next.

The waiting line to hear oral argument before the U.S. Supreme Court formed the night before February 20. Anyone joining after 5 A.M. never got in–except those given special seating, including such notables as Senator Edward Kennedy, Health and Human Services secretary Tommy Thompson, and former White House counsel C. Boyden Gray. It was well worth the wait. Persistent questioning, passionate debate, direct self-contradictions, an electric atmosphere–all were there. As the 80-minute conversation came to an end, a pro-voucher resolution seemed to have just barely emerged, the outcome turning as much on educational facts as constitutional questions.

The Court seemed as much a national school board as an interpreter of the Constitution’s Establishment Clause. Questions seldom focused on past jurisprudence–probably because earlier decisions have constructed a wall of separation between church and state as serpentine as the one Thomas Jefferson designed for the University of Virginia’s campus. Instead, the day’s focus was on vouchers, charter schools, and the woeful state of public education in Cleveland. The justices seemed to realize that they were discussing the future of low-income, inner-city children, not just fine points of legal doctrine.

It was Justice David Souter who first posed the central question to Ohio assistant attorney general Judith French: “Isn’t it true that something like 99 percent of the students who were receiving these vouchers are in religious schools?” Such restricted choice was very different from the “choice from [among] the great universe of colleges and universities,” where federal aid to religious institutions has been generally regarded as constitutional.

To some of the justices, the choices in Cleveland appeared even more restrictive than the 99 percent figure suggests. In their eyes, the high performance of parochial schools relative to the public schools was damning, at least from a constitutional perspective. “The better the parochial school,” said Justice Stephen Breyer, “the less the freedom of choice. . . . If it were my children and I saw these comparisons, I’d say, send them to the parochial school . . . That’s not my religion, but it’s very important my child get the best education, and therefore I would be feeling I had to send them there, if that’s what I want.”

Yet just as it seemed the Court was about to conclude that parochial schools are so good that school choice in Cleveland is meaningless, charter schools–known as community schools in Ohio–were called on to save the pro-voucher argument. In the words of Justice Antonin Scalia, “I assume Justice Breyer could send his child to one of the community schools, which [are] entirely nonsectarian. . . . [These] schools get more money than the sectarian schools.” Scalia also noted that the more established and better funded voucher program in Milwaukee has been attracting secular schools in steadily increasing numbers.

When it came time for the anti-voucher forces to make their defense, Robert Chanin of the National Education Association stepped forward. (Curiously, when veteran reporter Linda Greenhouse’s story appeared in the New York Times the next day, it failed to state Chanin’s NEA connections, identifying him only as the attorney for the Cleveland residents who had challenged the program.)

Chanin’s most difficult task was to show that the community schools in Cleveland were irrelevant because, in Chanin’s view, the justices were legally required to look not at the entire situation in Cleveland but only at the specific statute creating the voucher program. When the attorney restricted his legal vision in this way, he was able to argue, “It is a mathematical certainty that almost all of the [voucher] students end up going to religious schools.”

No sooner were these words enunciated than the most dramatic moment of the morning arrived–a clear, decisive intervention by Justice Sandra Day O’Connor. It was not just what she said–though this was powerful enough–but the fact that O’Connor is expected to cast the decisive vote in this case, as in so many others. For months, even years, it has been evident that the outcome could easily turn on her vote–and the scope of the opinion might well depend on her views. Justices Souter, Ruth Bader Ginsburg, and John Paul Stevens were unlikely to find the law constitutional, and Breyer may have tipped his hand when he observed, rather whimsically, that parochial schools may become increasingly unconstitutional the more they outshine their public-school counterparts. Meanwhile, pro-voucher groups are taking heart from the comments–as well as previous opinions–of Justices Scalia, Clarence Thomas, Anthony Kennedy, and Chief Justice William Rehnquist.

The justices themselves were keenly aware of O’Connor’s decisive position. Both sides made subtle appeals to her. Said Souter: “What’s bothering me . . . and, I suspect, O’Connor, too,” is that the law must be not only neutral on its face but also in its effect, and “at the end of the day, the effect is a massive amount of money [going] into religious schools. . . . That is the sticking point here.”

O’Connor had indicated that the voucher program might resemble a New York State tuition-reimbursement program struck down in the 1973 Nyquist case. Though this observation must have given hope to the NEA attorney, when he began insisting on mathematical certainties, he encountered tough, if patient, resistance: “Well, wait just a minute,” said O’Connor. “Do we not have to look at all of the choices open to the students, the community schools, the magnet schools, et cetera?” Chanin did his best: One must legally ignore all the other schools in Cleveland because “this court has always been program-specific in its financial-aid cases.”

“But I’m not sure that’s proper,” replied O’Connor. “That’s what I’m asking you. Why should we not look at all of the options open to the parents?” Doing so, Chanin argued, “mixed together programs that are quite qualitatively different in both function and purpose.” But, persisted O’Connor, “is it not true that parents can choose to have their children educated in a community school and, if they do, that school gets more money from the State than if they had chosen the religious school? If anything, it’s skewed against the religious schools.” When Chanin iterated, “We now have this year 99.4 percent of the students going to religious schools,” Justice Kennedy, making his own appeal to O’Connor, dropped the acid observation: “So far, you’re doing a very good job of not answering Justice O’Connor’s question.” Laughter temporarily broke the tension in the air.

But within minutes, Chanin was in trouble again: “Supposing there are 10 schools out there, 10 private schools, nine of which are nonreligious and one of which is religious,” imagined Chief Justice Rehnquist. “Is that . . . consistent with the Establishment Clause?” Replied the NEA attorney: “Oh, that’s clearly unconstitutional, Your Honor.” Pressing hard, Rehnquist observed: “The interesting thing . . . your view is, if any one [religious] school gets the money, it’s unconstitutional?” “No, no, your honor,” replied the hapless attorney. “Oh, I thought you said yes,” said the Chief Justice. “No, I’m sorry if I-I did not,” Chanin replied. “Or, I may have, but I didn’t mean to.”

Though the drama then began to subside, the Supreme School Board still had educational points to make. Late in the morning, Justice Kennedy asked: “Is it unconstitutional for [the State of Ohio] to . . . have a structure in which different school systems, different curriculums, curriculums that do not inflict terminal boredom on students, can begin to flourish? And . . . you say they cannot do it?”

Chanin could only reply: “There is no evidence that competition improves the lot for the 96 percent of the students who remain in the troubled Cleveland public school system with less resources and even worse problems.” Even this did not remain unchallenged. “The studies that I’m familiar with say that the inner-city parochial schools, which spend much less per child on education, do a better job than the public schools that spend much more,” said Scalia, adding “so I just don’t think it follows that . . . more money [will] solve the difficulty that the people of Cleveland found with their public schools.”

When told that the only evidence about parochial schools was anecdotal, Scalia added, “Oh, I don’t think it’s anecdotal at all. I mean, there are extensive studies that show the parochial schools do a better job.”

Still, questions posed in oral argument do not necessarily translate into opinions on judgment day. But if the tenor of the discussion on February 20 is an indicator of what is to come, then the future of school choice will take a new twist. If Cleveland’s charter schools are the key to making vouchers constitutional, must future voucher schemes also include a charter component? And must the vouchers be large enough that, as in Milwaukee, they invite increasing participation by secular schools? If vouchers are found constitutional only if charters are available and secular private schools open themselves to voucher recipients, the result could profoundly affect the future of school choice in ways neither side anticipated. The Court may turn out to be the Supreme School Board in deed as well as in the words it voiced in the oral argument.

Paul E. Peterson is the editor-in-chief of Education Next and co-author of The Education Gap (Brookings, 2002).
