Standards, Testing, and Accountability - Education Next

Tackling “Our Worst Subject” Requires New Approaches—and Better Data
https://www.educationnext.org/tackling-our-worst-subject-requires-new-approaches-and-better-data-history-civics/
Tue, 18 Jun 2024 09:00:19 +0000

Infrequent national testing in history and civics, limited state results hamper progress

The post Tackling “Our Worst Subject” Requires New Approaches—and Better Data appeared first on Education Next.


Image of an American flag on a pole with frayed ends

Chester Finn, president emeritus of the Thomas B. Fordham Institute and a frequent Education Next contributor, likes to recount a story from his time working as a senior official at the U.S. Department of Education under education secretary William Bennett. In 1987, after telling a Chicago journalist that the city’s schools were the worst in the nation, Bennett summoned Finn to his office and asked if he was right. “Well, Chicago has some competition from Newark and St. Louis and Detroit,” Finn replied. “But you weren’t wrong.” Coming well before the advent of widespread statewide testing, much less state- and district-level participation in the National Assessment of Educational Progress, or NAEP, Bennett’s claim seems to have survived contemporaneous efforts at fact-checking.

I often reflected on that exchange during my time working for Senator Lamar Alexander, who was then ranking member of the Senate education committee. In speeches, Alexander had a habit of referring to U.S. history and civics as “our worst subject.”

“Is that right?” he’d occasionally ask when preparing his remarks. Well, I couldn’t say that it was wrong.

According to NAEP, only 13 percent of 8th graders nationwide scored proficient in U.S. history in 2022, while just 22 percent reached that benchmark in civics—both notably lower than the 26 percent and 31 percent who demonstrated proficiency in math and reading, respectively. One might fairly wonder whether the National Assessment Governing Board has set expectations too high in U.S. history and civics, but a glance at item-level results gives ample cause for concern. Just one in three students, for example, could correctly match each of our three branches of government to its core function—a task one in six would get right by answering at random. Whether or not these are our worst subjects, we clearly have a problem.

In this issue, Yale law professor Justin Driver proposes a new way to teach civics that he calls “student-centered civics education” (see “Building Better Citizens Begins in the Classroom,” features). The approach “foregrounds the major Supreme Court decisions that have shaped the everyday lives of students across the nation”—decisions concerning student speech, corporal punishment, religious expression, and more. Its adoption, he argues, would frame students as “active participants in shaping our constitutional order” while also providing a jumping-off point to explore “more-abstract concepts that undergird civic knowledge.”

Driver’s proposal may not appeal to all readers. Some may find it too centered on judicially defined rights, perhaps at the expense of the concomitant responsibilities inherent in citizenship. Others may find its emphasis on student activism too resonant of so-called “action civics,” an approach that often downplays the importance of basic knowledge of how our government operates.

Driver, for his part, would “welcome such disagreements . . . because their existence would indicate that civic education is being actively debated in venues where such debates remain all too rare.” So would I—and I hope his piece provokes ample conversation.

Still, improving civic education will take more than curricular reform. It will also require more and better data on the results produced by competing approaches.

Since Secretary Bennett opined on Chicago’s national standing, our ability to compare student achievement in math and reading across states and school districts has been transformed. Every two years, the NAEP program provides a new set of results for all 50 states and 26 urban school districts—a monitoring system that, though imperfect, enables us to broadly gauge their success (or lack thereof) in developing student literacy and numeracy skills.

In U.S. history and civics, by contrast, NAEP provides a single national data point about every four years. While the program will in 2030 permit states to test enough students in civics to produce state-level results, recent history suggests that fewer than a dozen will embrace that opportunity. Requiring all of them to do so would take Congressional action.

The first record I can find of Senator Alexander using the phrase “our worst subject” is in the title of a 2005 subcommittee hearing on a bill requiring states to participate separately in the NAEP U.S. history and civics tests. Nearly two decades later, we have little reason to believe that his judgment was incorrect. Now would be an apt time for Congress to give civics assessment another look.

— Martin R. West

This article appeared in the Summer 2024 issue of Education Next. Suggested citation format:

West, M.R. (2024). Tackling “Our Worst Subject” Requires New Approaches—and Better Data. Education Next, 24(3), 5.

Building Better Citizens Begins in the Classroom
https://www.educationnext.org/building-better-citizens-begins-in-the-classroom/
Tue, 28 May 2024 09:01:42 +0000

For civics to matter again, students must actively engage with their own constitutional rights

The post Building Better Citizens Begins in the Classroom appeared first on Education Next.



Every December, in a practice that dates back decades, the chief justice of the United States releases a year-end report on the federal judiciary. Despite the New Year’s Eve timing of these reports, they typically elicit less celebration than somnolence. As one veteran journalist who covers the Supreme Court noted with considerable understatement, “The year-end report is usually devoid of anything controversial.”

In 2019, however, with the United States deep in the grip of political polarization, Chief Justice John G. Roberts Jr. issued a year-end report that proved arresting. That unusual document explored the judiciary’s myriad connections to civic education. “By virtue of their judicial responsibilities, judges are necessarily engaged in civic education,” Roberts wrote. “When judges render their judgments through written opinions that explain their reasoning, they advance public understanding of the law.” The Supreme Court’s iconic decision invalidating school segregation in Brown v. Board of Education, Roberts noted, could be viewed through this prism. Chief Justice Earl Warren saw to it that the 1954 opinion would be concise enough—at just 11 pages—to be reprinted in newspapers around the nation. Brown, Roberts wrote, exemplifies “the power of a judicial decision as a teaching tool,” as it provided “every citizen [an opportunity to] understand the Court’s rationale.” Roberts delivered a sobering assessment of the nation’s disregard for democratic ideals and the attendant decline of civic education. “[W]e have come to take democracy for granted,” Roberts lamented, “and civic education has fallen by the wayside.”

Since Roberts issued this cri de coeur in 2019, concerns about democracy and civic education have only intensified. Most prominently, the atrocities committed at the Capitol on January 6, 2021, represented the starkest repudiation of democracy on American soil in decades. One scholar termed that day “a Sputnik moment for an ambitious revival of civics instruction.” As divisions over race, gender, and immigration have deepened, disputes over civic education have become a salient, persistent topic of national controversy. Five years ago, the New York Times released its 1619 Project, which emphasized the nation’s deep connections to race-based chattel slavery and the ongoing legacy of that odious institution. In response, President Donald Trump formed the 1776 Commission with an eye toward attacking and displacing the 1619 Project’s slavery-based narrative.

These competing projects have been amply debated, and I have no interest in rehearsing those discussions here. I do, however, want to press two observations. First, the 1619 Project and the 1776 Report both portrayed themselves as tools of civic education. Each contemplated how schools could implement the animating ideas of the respective projects, and various educators across the nation have done just that. Second, the competing reports, which dispute the nation’s true origins, embody the profound polarization that afflicts American society. Our two dominant political tribes appear perilously close to singing in unison: “You say 1619. I say 1776. Let’s call the whole thing off.”

It sometimes seems that agreeing to disagree (often angrily) is the only thing that Blue America and Red America can settle on. Yet the nation would be well served by attempting to identify some common ground on the question of civic education. Rather than fighting exclusively about what should not be taught in the nation’s public schools, why not contemplate approaches to civic education that might garner widespread support?

Even in our intensely divided era, there is broad, bipartisan agreement that the current state of civic education is lacking. Not long ago, Senator Chris Coons, a Democrat from Delaware, and Senator John Cornyn, a Republican from Texas, co-sponsored a bill called the “Civics Secures Democracy Act.” That measure, if enacted, would appropriate roughly $6 billion over the course of six years to foster education in civics and history. Supreme Court justices from across the ideological spectrum have also joined forces on this cause. Justices Neil Gorsuch and Sonia Sotomayor, who often disagree in high-profile cases, have made joint appearances touting the need to deepen student comprehension of our basic civic structures. On such occasions, Gorsuch has asserted that the state of civic education poses a national security crisis and noted that political and cultural polarization forms an important part of the crisis: “How can the democracy function if we can’t talk to one another, and if we can’t disagree, kindly, with respect for one another’s differences and different points of view?” For her part, Sotomayor has also dedicated significant time to promoting iCivics, an organization founded and formerly chaired by Justice Sandra Day O’Connor, which seeks to capitalize on youngsters’ fascination with video games to spark their interest in learning about government.

Concerns regarding civic education are well founded; the state of civic comprehension in the United States is—in a word—grim. National Assessment of Educational Progress civics exams conducted in 2022 revealed that less than 25 percent of American 8th graders demonstrated proficiency in the subject. Fewer than one-third of the students could identify why the Founders adopted the Declaration of Independence. The civic knowledge of adults is also lacking. In 2016, one survey determined that only about one in four Americans could name all three branches of government.

In this essay, I aim to amplify and expand on Chief Justice Roberts’s call to connect the judiciary to civic education. I seek to promote an approach that I label “student-centered civic education”—an approach that could find bipartisan support. This method places the historic struggles for students’ constitutional rights front and center in the curriculum. It foregrounds the major Supreme Court decisions that have shaped the everyday lives of students across the nation, but it also uses these decisions as a springboard for discussing the broader issues, arguments, and student activism that fueled those controversies. It is simultaneously retrospective and prospective—teaching students about the hard-fought constitutional struggles that young people waged yesteryear and encouraging them to evaluate critically the contours of their rights in the context of tomorrow’s civic society. A student-centered approach to civic education thus frames students as active participants in shaping our constitutional order and positions them to become engaged stewards of our democracy.

The storming of the U.S. Capitol on January 6, 2021, has been called “a Sputnik moment for an ambitious revival of civics instruction.”

Scintillating Questions

The student-centered approach examines the relationships between the people and their government in a way that is tangibly connected to the daily lives of adolescents. High school students tend to view abstract constitutional concepts—such as federalism or the separation of powers—as disconnected from the things that matter most to them. But highlighting constitutional conflicts involving students and the limitations that judicial opinions have placed on school authority hits home for young people. The nation’s 50 million public school students, like most people, will gravitate toward subject matter that immediately informs their own lives.

Cases involving the constitutional rights of students will captivate them as no other civic-education topic can. Should schools be able to force students who participate in extracurricular activities to provide urine samples for drug testing? Should school officials be able to punish students by striking them repeatedly with a two-foot-long wooden paddle? Should they be able to strip-search students in an effort to locate contraband ibuprofen tablets? Should schools be able to exclude unauthorized immigrants? Should schools be able to suspend a cheerleader from the junior-varsity squad for an entire year because she posted a vulgarity on social media—off-campus on a weekend afternoon—to vent her frustration about failing to make varsity? Should high school football coaches be allowed to kneel down in prayer at midfield following games, or do such rituals religiously coerce players? These are among the scintillating questions presented by actual Supreme Court opinions involving constitutional rights in schools. These questions, I submit, would engage even the most jaded of students.

The student-centered approach also drives home the point that young people have made invaluable contributions to our current constitutional order. Sometimes students perceive civic affairs as the exclusive domain of adults. But when students today read about teenagers John Tinker and Mary Beth Tinker wearing black armbands to school in the 1960s over the objections of school authorities in Des Moines, Iowa, they understand that constitutional rights do not materialize out of thin air. The Tinkers dared to protest the Vietnam War on school grounds, endured suspensions, and waged a four-year court battle to make students’ First Amendment rights a reality. Tinker v. Des Moines Independent Community School District demonstrates that young people of prior generations have successfully stood up for constitutional rights and played a pivotal role in creating modern American society. And today’s students may realize that they, too, have an indispensable role to play in bequeathing a constitutional tradition to subsequent generations.

Student-centered civic education also helps young Americans gain a deeper understanding of, and respect for, constitutional values at a time when some of those values have come under assault. It is no secret, for example, that many young people today harbor grave skepticism about the First Amendment’s utility. Free expression, critics maintain, is used as either a shield to protect the powerful or a cudgel to bash the powerless. But if students learned early on how young people have harnessed the power of free speech in schools—including not just Tinker’s protection of antiwar speech but other judicial precedents such as one vindicating the ability of civil rights activists in Mississippi to promote racial equality—they would see how the First Amendment often protects minority opinion and protest.

The nation’s universities have in recent years witnessed numerous high-profile conflagrations where students have evinced precious little respect for free speech. Commentators have expressed alarm that our institutions of higher education—where intellectual exchange on contentious topics is supposed to be prized—appear to hold free speech in such low esteem. Too few of those commentators have noted, though, that college students may well disregard freedom of expression partly because they did not meaningfully encounter the concept in elementary or secondary school. Cultivating respect for free-speech values should not be delayed until college. That process needs to start long before then, something that a student-centered civic education would prioritize.

The topics presented in a student-centered civic-education curriculum lend themselves to active debate among students about their constitutional rights in school. After students learn the basics of, say, free speech in schools, teachers should offer novel factual scenarios in mock hearings designed to test the limits of permissible student speech, assigning half of the class to act as lawyers for the student and the other half to act as lawyers for the school board. These mock disputes would encourage students to disagree with each other’s constitutional views respectfully and thereby aid our ailing democratic experiment. If students do not begin learning how to disagree with their peers in the relatively safe school context, disagreements in non-school settings will increasingly escalate into the ad hominem attacks that have become a disconcerting staple of both our politics and our broader culture. Teachers could take this exercise a step further by assigning students to defend a legal position that runs counter to the students’ own viewpoints, requiring them to articulate the most compelling arguments on the other side and helping them to develop empathy for people who disagree with them.

Some of the most significant Supreme Court opinions assessing students’ constitutional rights have emphasized the role of public schools in developing citizens. Students could explore this theme in their coursework. In Brown, for instance, Warren declared that “education is perhaps the most important function of state and local governments. . . . It is the very foundation of good citizenship.” In 1972, when assessing an objection to a compulsory education law, the court wrote that “education is necessary to prepare citizens to participate effectively and intelligently in our open political system if we are to preserve freedom and independence.” In 2021, Justice Stephen Breyer’s opinion for the court in Mahanoy Area School District v. B.L., a case involving off-campus student speech, noted that public schools themselves have an interest in protecting students’ free expression because doing so preserves our democratic order. “America’s public schools are the nurseries of democracy,” Breyer contended. “Our representative democracy only works if we protect the marketplace of ideas.”

The Supreme Court has also embraced a special responsibility for safeguarding constitutional rights in the school context, lest students draw baleful lessons about citizenship. Justice Robert Jackson powerfully expressed this point in 1943, when he led the court’s invalidation of a state measure that required students to salute the American flag in West Virginia State Board of Education v. Barnette. “That [public schools] are educating the young for citizenship is reason for scrupulous protection of Constitutional freedoms of the individual,” Jackson wrote, “if we are not to strangle the free mind at its source and teach youth to discount important principles of our government as mere platitudes.”

In exploring the court’s conceptualization of public schools as institutions that form citizens, students should understand that justices hold divergent views on what citizenship entails, particularly for young people in school settings. Some justices have embraced a robust conception of citizenship for students, suggesting that schools ought to permit wide-ranging, spirited debates on contentious questions. Writing for the court in Tinker, Justice Abe Fortas espoused this robust notion of citizenship. “Any word spoken, in class, in the lunchroom, or on the campus, that deviates from the views of another person may start an argument or cause a disturbance,” Fortas stated. “But our Constitution says we must take this risk, and our history says that it is . . . this kind of openness . . . that is the basis of our national strength and of the independence and vigor of Americans who grow up and live in this relatively permissive, often disputatious, society.”

Other Supreme Court justices have offered a far thinner conception of citizenship for students. They hold that schools should not host freewheeling debates but should instead concentrate on imposing order and discipline on students. Call this competing notion “Report Card Citizenship,” with a nod toward the grade for behavior that some elementary schools once meted out. Justice Hugo Black, dissenting in Tinker, wrote that “school discipline . . . is an integral and important part of training our children to be good citizens—to be better citizens.”

The thin conception of citizenship has seen its stock fluctuate dramatically in Supreme Court opinions since Black’s dissent in Tinker. During the 1980s, the court at times seemed to endorse Report Card Citizenship. In assessing a school district’s ability to punish a high school student for a lewd speech at a school assembly, the court emphasized the school’s duty to “inculcate the habits and manners of civility” and to “teach by example the shared values of a civilized social order.” But the court’s most recent decision involving student speech rebuked Report Card Citizenship. Breyer’s opinion for the court in Mahanoy, like Fortas’s in Tinker, reasoned that schools cannot, without harming our democracy, act as roving censors who punish students for dissident speech. Pupils in student-centered civic-education courses should be encouraged to evaluate critically these competing conceptions of citizenship.

Former Bremerton High School assistant football coach Joe Kennedy takes a knee in front of the U.S. Supreme Court after his legal case, Kennedy vs. Bremerton School District, was argued before the court on April 25, 2022 in Washington, DC.
Joseph Kennedy, a high school football coach who lost his job for repeatedly praying at midfield following games, kneels in prayer in front of the United States Supreme Court building in Washington, D.C. The court found in favor of Kennedy’s free-exercise rights in 2022.

Additional Benefits

As teachers and students together learn about students’ constitutional rights, their awareness will likely help prevent schools from committing certain violations of those rights. A teacher who leads a classroom discussion on Barnette, for instance, will be unlikely to suspend students for refusing to salute the American flag. Such conflicts are distressingly common in American schools, even though Barnette repudiated mandatory flag salutes more than eight decades ago.

Teachers of a student-centered civic curriculum would, moreover, not only help to honor constitutional rights within their own classrooms, but they could also become invaluable resources for an entire school. It seems improbable that busy math and science teachers are going to educate themselves on the minutiae of the Supreme Court’s doctrine governing schools. Yet, when algebra and chemistry teachers confront scenarios touching upon students’ constitutional rights, civics instructors could provide guidance to their colleagues about constitutional protections. These same “in-house experts” could also serve as sounding boards for school administrators contemplating thorny constitutional questions, as it is often impractical to seek advice from school-board attorneys during a hectic school day. These informal consultations could well help increase respect for students’ constitutional rights within the school.

If schools commit fewer violations of students’ rights, they will also mitigate a significant source of political polarization. The nation’s public schools have become a battleground of the modern culture wars, and the media often highlight instances where school authorities have overstepped their constitutional authority. But media organizations have differing views on which violations to highlight, depending on whether these outlets lean left or right. The consumers of these varied, highly clickable reports are left to conclude that the nation’s public schools are systemically attacking their most cherished values, thereby intensifying the partisan divide.

Consider two recent high-profile constitutional controversies that arose when public schools erroneously abridged students’ First Amendment rights—the first involving speech associated with liberals and the second involving speech associated with conservatives. In 2021, two Black elementary school students in Ardmore, Oklahoma, wore T-shirts reading: “Black Lives Matter.” For this seemingly innocuous action, the students were ejected from their classrooms and forced to sit in an administrative office until the end of the day. One school official justified these disciplinary actions by stating that political statements would no longer be permitted at school. The district superintendent suggested that the policy pertained to statements from across the political spectrum: “I don’t want my kids wearing MAGA hats or Trump shirts to school either, because it just creates, in this emotionally charged environment, anxiety and issues that I don’t want our kids to deal with.” After this controversy appeared in the New York Times, the school district updated its policy to prohibit clothing “items [displaying] social or political content.”

The second controversy arose when a high school senior in Franklinton, Louisiana, had his school parking space painted with a portrait of Trump. School policy permitted seniors, for a modest fee, to decorate their spaces, and although the policy prohibited designs that included vulgar language or another student’s name, it did not forbid political statements. Nevertheless, school officials painted over the image, deeming it excessively political. A federal district court judge overrode the school’s decision, holding that it violated Tinker’s foundational protection for student speech. As one might predict, the case received no mention in the New York Times but was trumpeted by Fox News.

These dueling episodes and their attendant coverage—played to quite distinct, but nonetheless equally outraged audiences—further political polarization.

Siblings Mary Beth Tinker and John Tinker protested the Vietnam War in 1965 by wearing black armbands at their Iowa school, a free-speech challenge that went to the Supreme Court.

Going Further

Studying judicial opinions involving students’ constitutional rights would ideally lay the groundwork for exploring more-abstract concepts that undergird civic knowledge. For example, classroom discussion of Barnette’s prohibition on compulsory flag salutes in school sets up debate on the government’s ability to instill patriotism and to prohibit speech that is regarded as antipatriotic. Students could then consider state and federal legislative efforts to prohibit burning the American flag and the two Supreme Court decisions that invalidated such efforts. Teachers could use that discussion to illustrate concepts such as federalism, separation of powers, congressional authority, and executive authority. Similarly, a classroom discussion about Hazelwood School District v. Kuhlmeier—which held that educators can typically regulate articles appearing in school newspapers without violating the First Amendment—invites a conversation about the media’s central role in maintaining democracy. In addition, analyzing San Antonio Independent School District v. Rodriguez—which declined to invalidate dramatically unequal school-financing schemes—could spur reflection on how well a nation that extols equal opportunity for all lives up to that lofty ideal. Relatedly, Zelman v. Simmons-Harris—which upheld the constitutionality of governments offering students vouchers to attend private, religious schools—opens up a discussion about the Establishment Clause, economic theory, and the desirability of public-private partnerships.

A Presidential Commission?

How can proponents of robust civic education initiate the kind of widespread reform that I have sketched here? One vehicle of change could be a presidential commission on civic education. Many readers may counter that the road to inaction is paved with presidential commissions, and sometimes such criticisms are merited. Yet presidential commissions and their ilk can on occasion crystallize the public’s attention. For example, the renowned report A Nation at Risk served as a significant focal point for education reformers throughout much of the 1980s.

When three brothers from Ardmore, Oklahoma, wore Black Lives Matter shirts to school in 2021, two were disciplined for displaying “political statements.”

Numerous private, public, and philanthropic organizations have examined civic education over the years, but these pursuits too often happen in intellectual silos. While these efforts have value on their own, we need—especially today—to find a way to bring them together. A presidential commission examining civic education could provide an excellent occasion for such an assemblage, enabling communities to understand better which approaches work well and which do not. A commission that embraces student-centered civic education should include model lesson plans in an appendix to its report, distilling relevant Supreme Court opinions into portions that are easily digestible for students, offering hypothetical scenarios involving students that are designed to test the limits of those opinions, and providing concrete advice to teachers on how they might spur students to engage with those topics. The commission’s resource materials would ideally provide one-stop shopping for teachers focusing on civic education. Of course, the commission would in no sense aim to mandate that public schools adopt a particular approach. Instead, building on the abundant existing resources in this domain, the commission would devise a model that teachers and local school districts could adopt and adapt. The hope is that school districts and teachers from very different parts of the country would want to implement the framework because it would focus on the relevant topic of students’ constitutional rights and encourage students to actively and critically evaluate the content of those rights.

Forming a commission on civic education could be a sound political idea for a second term of President Joe Biden. In one of his first official moves in January 2021, Biden swiftly rescinded the 1776 Commission Report. The historian Michael Kazin then argued in the New York Times: “Now that the 1776 Commission is deprived of federal authority, its influence will wane more quickly than that of the president who established it.” But just as Trump continues to cast a long shadow over American politics and culture, the 1776 Commission’s Report has not vanished, as its content can easily be accessed via the Internet. Closing our eyes will not, moreover, magically make it disappear. Instead, Biden should assemble a civically minded group from a range of ideological perspectives to offer an affirmative vision of civic education—one that highlights the struggle for students’ constitutional rights. If the president seeks to dislodge the 1776 Report from our intellectual landscape, he must offer his own conception of civic education, and he should frame it, as Gorsuch did, as promoting a vital national security interest.

Prominent Republicans have not shied away from discussing civic education. In May 2020, Steve Bannon, former adviser to President Trump, offered a remarkable statement about future political struggles: “The path to save the nation is very simple—it’s going to go through the school boards.” In the aftermath of the 2020 election, it seems that some right-wing Republicans have embraced what might be termed the “Bannon Playbook” by focusing on education issues. Perhaps the foremost tactic in this political strategy has sought to transform and distort Critical Race Theory into an intellectual bogeyman. Leading figures in the Democratic Party have too often remained silent on these high-profile cultural questions. But it is incumbent upon Democrats, I believe, to provide their own notions of civic education. As the old adage runs, “If you don’t define yourself, someone else will do it for you.”

President Biden has emphasized his desire to locate common ground with Republicans when possible—without sacrificing his core principles. Focusing on students’ constitutional rights as articulated by the Supreme Court—a struggle that dates back to the first half of the 20th century—would enable Biden’s commission to minimize some of the polarizing disputes that have proved insoluble during recent debates. Many Americans understand the profound need to address missing, limited, or ineffective civic education as a way of bolstering our nation’s foundational commitments. In 2018, for instance, one national survey found that the most popular approach to fortifying American democracy was a policy aimed at “ensur[ing] that schools make civic education a bigger part of the curriculum.” To underscore that the commission is truly dedicated to locating commonality on civic education for Americans of different political stripes, Biden should make sure to tap high-profile people associated with the Republican Party to serve. Indeed, he could even consider selecting Chief Justice Roberts to chair, or co-chair, the civic-education commission. If the chief justice should decline, Biden could nonetheless identify Roberts’s year-end report from 2019 as an important inspiration for the group and even title the commission after a passage that Roberts wrote. Near the very end of his report, Roberts stated: “Civic education, like all education, is a continuing enterprise and conversation.” Biden’s Presidential Commission on the Civic Enterprise has a nice ring to it, suggesting that civic education is a collaborative, difficult undertaking that demands considerable effort.

The ideas that I have outlined here are sure to generate disagreement. Some readers may contend that “students’ constitutional rights” is a contradiction in terms. Justice Clarence Thomas has espoused precisely that view regarding student speech, and teachers adopting the student-centered model of civic education should have their own students confront it. Other readers may maintain that the president ought not tread on ground that rightly belongs to states and localities. Still others may find that student-centered civic education places too much attention on judges, courts, and rights at the expense of other material. For my own part, I welcome such disagreements—and many others besides—because their existence would indicate that civic education is being actively debated in venues where such debates remain all too rare.

Chief Justice John Roberts’s 2019 report on the federal judiciary noted judges’ unique role in promoting civic education but lamented how citizens now “take democracy for granted.”

Firsthand Experience

My interest in promoting the student-centered model of civic education is not purely theoretical; it is informed by my own experience. On graduating from college in 1997, long before I dreamed of becoming a law professor, I enrolled in a one-year teacher-certification program at Duke University. As part of that program, I had the privilege of teaching a civic-education class to 9th graders at a public school in Durham, North Carolina. I recall witnessing the students—some of whom had displayed minimal interest in analyzing the differences among the three branches of government—come alive when we turned our attention to Tinker. I believe that the students engaged with Tinker deeply because they viewed themselves—at long last—as having some skin in the game. They felt they had genuine expertise about the regulation of students in schools.

Some two decades later, after I joined the faculty at Yale Law School in 2019, I became the faculty adviser for a long-standing program that places law-school students in New Haven’s public schools to teach a student-centered civic-education course. In a small but meaningful way, this program helps bridge the wide chasm that all too often separates elite, cloistered Yale from gritty, under-resourced New Haven. The redoubtable, committed Yale Law students who participate in the program do virtually all of the work, including preparing their students for a citywide oral-argument competition that occurs on Yale’s campus.

I find that visiting those classrooms and seeing student-centered civic education in action is always an inspiring experience. During my first year at Yale, I remember driving early one morning across town to a New Haven public school—one with a virtually all Black and Latino student population, a majority of whom are eligible for free lunch. After passing through the school’s metal detectors, I found my way to the correct classroom, where I witnessed students diligently preparing for their upcoming oral arguments. The students sounded very much like young lawyers, using shorthand for case names to claim that the Supreme Court’s precedents either required (or foreclosed) finding that a hypothetical principal violated a hypothetical student’s First Amendment rights. These students plainly viewed themselves as the subjects of law, not the objects of law, and felt legally and civically empowered. As the students began filing out after class, I overheard one young Black woman say quietly to a classmate, “I want to be a judge when I grow up.” It is my fervent hope that expanding the student-centered model in our schools will inspire more young people around the country to embrace such civically minded ambitions.

Justin Driver is the Robert R. Slaughter Professor of Law at Yale Law School and the author of The Schoolhouse Gate. This essay is drawn from an article that will appear in a NOMOS volume titled Civic Education in Polarized Times, to be published by New York University Press.

This article appeared in the Summer 2024 issue of Education Next. Suggested citation format:

Driver, J. (2024). Building Better Citizens Begins in the Classroom: For civics to matter again, students must actively engage with their own constitutional rights. Education Next, 24(3), 22-31.

The post Building Better Citizens Begins in the Classroom appeared first on Education Next.

]]>
49718099
Two-Sigma Tutoring: Separating Science Fiction from Science Fact https://www.educationnext.org/two-sigma-tutoring-separating-science-fiction-from-science-fact/ Thu, 07 Mar 2024 10:30:14 +0000 https://www.educationnext.org/?p=49717814 An experimental intervention in the 1980s raised certain test scores by two standard deviations. It wasn’t just tutoring, and it’s never been replicated, but it continues to inspire.

The post Two-Sigma Tutoring: Separating Science Fiction from Science Fact appeared first on Education Next.

]]>
Benjamin Bloom’s essay “The 2 Sigma Problem,” featuring his famous hand-drawn Figure 1 showing the supposed immense benefit from one-to-one tutoring, has created believers and skeptics for 40 years. Now with the emergence of generative artificial intelligence, education innovators like Sal Khan of Khan Academy see the potential for AI tutors to fulfill the promise of Bloom’s claim.

In the fall of 1945, when my father was not quite eight years old, his teacher told my grandmother that he was failing 2nd grade. My father doesn’t remember her reasons, or maybe my grandmother never told him, but the teacher felt he wasn’t ready for 2nd-grade work.

“If he’s not succeeding in 2nd grade,” my grandmother suggested, “why not try him in 3rd?” And she found a tutor, a retired teacher from a different school.

For seven weeks, my father met for an hour a day with the tutor, who gave him homework after each session. The tutor’s charge was to make sure my father mastered the curriculum, not just for 2nd grade but for enough of 3rd grade that he could slip into a 3rd-grade classroom in January 1946, a year early, without needing further help.

But the tutor overdid it. Not only did my father encounter nothing in 3rd grade she hadn’t taught him, but he coasted through 4th and 5th grade as well.

Around 1960, while shopping at Filene’s Basement in downtown Boston, my grandmother ran into an old neighbor—a mom who’d moved away when my grandmother was seeking a tutor to help her son escape from 2nd grade. After bragging about her own family, the neighbor asked if my father was all right.

“He’s fine!” said my grandmother triumphantly. “He’s at Oxford, on a Rhodes Scholarship.”

Stories like this give the impression that tutors can work miracles. For centuries after Aristotle tutored Alexander the Great, certain fortunate individuals—including Albert Einstein, Felix Mendelssohn, Agatha Christie, and practically every British monarch before Charles III—were educated partly or entirely by private tutors and family members. While no scholar regrets the spread of mass schooling, many suspect that the instruction students receive from a teacher in a large classroom can never match the personalized instruction that comes from a tutor focused only on their individual needs.

In a 1984 essay, Benjamin Bloom, an educational psychologist at the University of Chicago, asserted that tutoring offered “the best learning conditions we can devise.” Tutors, Bloom claimed, could raise student achievement by two full standard deviations—or, in statistical parlance, two “sigmas.” In Bloom’s view, this extraordinary effect proved that most students were capable of much greater learning than they typically achieved, but most of their potential went untapped because it was impractical to assign an individual tutor to every student. The major challenge facing education, Bloom argued, was to devise more economical interventions that could approach the benefits of tutoring.

Bloom’s article, “The 2 Sigma Problem,” quickly became a classic. Within two years of its publication, other scholars were citing it weekly—50 times a year—and it has only grown in influence over the decades. In the past 10 years, the article has been cited more than 2,000 times (see Figure 1).

Figure 1: Citations to Bloom’s “The 2 Sigma Problem”

The influence of Bloom’s two-sigma essay reached well beyond the scholarly literature. As the computing and telecommunication revolutions advanced, visionaries repeatedly highlighted the potential of technology to answer Bloom’s challenge. Starting in the 1980s, researchers and technologists developed and eventually brought to market “cognitive computer tutors,” which Albert Corbett at Carnegie Mellon University claimed in 2001 were “solving the two sigma problem.” In the 2010s, improvements in two-way video conferencing let students see human tutors at off hours and remote locations, bringing the dream of universal access closer—though there were still simply not enough tutors to go around.

Then, in late 2022, startling improvements in artificial intelligence offered students a way to converse with software in flexible, informal language, without requiring a human tutor on the other end of a phone or video connection. Sal Khan, founder of Khan Academy, highlighted this promise in a May 2023 TEDx talk, “The Two Sigma Solution,” which promoted the launch of his AI-driven Khanmigo tutoring software.

Enthusiasm for tutoring has burgeoned since the Covid-19 pandemic. More than two years after schools reopened, average reading scores are still 0.1 standard deviations lower, and average math scores 0.2 standard deviations lower, than they would be if schools had never closed. The persistence of pandemic learning loss can make it look like an insurmountable problem, yet the losses are just a fraction of the two-sigma effect that Bloom claimed tutoring could produce. Could just a little bit of tutoring catch kids up, or even help them get ahead?

Are Two-Sigma Effects Realistic?

But how realistic is it to expect any kind of tutoring—human or AI—to improve student achievement by two standard deviations?

Benjamin Bloom is known not only for his tutoring experiment but also for his “Bloom’s Taxonomy” learning rubric.

Two sigmas is an enormous effect size. As Bloom explained, a two-sigma improvement would take a student from the 50th to the 98th percentile of the achievement distribution. If a tutor could raise, say, SAT scores by that amount, they could turn an average student into a potential Rhodes Scholar.

Two sigmas is more than twice the average test score gap between children who are poor enough to get free school lunches and children who pay full price. If tutors could raise poor children’s test scores by that much, they could not only close the achievement gap but reverse it—taking poor children from lagging far behind their better-off peers to jumping far ahead.

Two sigmas also represents an enormous amount of learning, especially for older students. It represents more than a year’s learning in early elementary school—and something like five years’ learning in middle and high school.

It all sounds great, but if it also sounds a little farfetched to you, you’re not alone. In 2020, Matthew Kraft at Brown University suggested that Bloom’s claim “helped to anchor education researchers’ expectations for unrealistically large effect sizes.” Kraft’s review found that most educational interventions produce effects of 0.1 standard deviations or less. Tutoring can be much more effective than that but rarely approaches two standard deviations.

A 1982 meta-analysis by Peter Cohen, James Kulik, and Chen-Lin Kulik—published two years before Bloom’s essay but cited only half as often—reported that the average effect of tutoring was about 0.33 standard deviations, or 13 percentile points. Among 65 tutoring studies reviewed by the authors, only one (a randomized 1972 dissertation study that tutored 32 students) reported a two-sigma effect. More recently, a 2020 meta-analysis of randomized studies by Andre Nickow, Philip Oreopoulos, and Vincent Quan found that the average effect of tutoring was 0.37 standard deviations, or 14 percentile points—“impressive,” as the authors wrote, but far from two sigmas. Among 96 tutoring studies the authors reviewed, none produced a two-sigma effect.
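The percentile figures quoted in this section can be checked with a few lines of Python. This is only a sketch: it assumes test scores follow a normal distribution and that the student starts at the 50th percentile, and the function name `percentile_after_gain` is made up for illustration.

```python
from statistics import NormalDist


def percentile_after_gain(effect_sd: float, start_percentile: float = 50.0) -> float:
    """Percentile reached after gaining `effect_sd` standard deviations.

    Assumes scores are normally distributed (an assumption of this sketch).
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    start_z = std_normal.inv_cdf(start_percentile / 100.0)
    return 100.0 * std_normal.cdf(start_z + effect_sd)


# Effect sizes mentioned in the text, in standard deviations.
for effect in (2.0, 0.37, 0.33):
    new_percentile = percentile_after_gain(effect)
    gain = new_percentile - 50.0
    print(f"{effect:.2f} SD: 50th -> {new_percentile:.1f} percentile (+{gain:.1f} points)")
```

Under these assumptions, a two-sigma gain rounds to the 98th percentile, while gains of 0.33 and 0.37 standard deviations round to 13 and 14 percentile points.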

So where did Bloom get the idea that the characteristic benefit of tutoring was two standard deviations? Was there anything behind Bloom’s two-sigma claim in 1984? Why are we still repeating it 40 years later?

What evidence did Bloom have?

Bloom’s Figure 1—reproduced in Khan’s TEDx talk, among many other places—ostensibly showed the distribution of post-test scores for students who received tutoring, comparing them to students who received conventional whole-group instruction and to students who received a version of what Bloom called “mastery learning,” which combined whole-group instruction with individualized feedback. But the graph was only illustrative—hand-drawn in a smooth, stylized fashion to show what a two-sigma effect might look like. It wasn’t fit to actual data.

Later in the essay, Bloom’s Table 1 compared the effects of different educational interventions. Tutoring appeared at the top of the list, with an effect of 2.00 standard deviations. Below tutoring, the table listed reinforcement learning (1.20 standard deviations), mastery learning (1.00 standard deviation), and a variety of other effects that seem startlingly large by modern standards.

Where did Bloom get these large, curiously round estimates? He claimed that he had adapted them from a paper summarizing early meta-analyses published a month earlier by Herb Walberg, a professor at the University of Illinois at Chicago. But Walberg’s and Bloom’s tables do not entirely agree (see Table 1). Although several of Bloom’s estimates lined up with Walberg’s, at least when rounded, most of the effects in Bloom’s table did not appear in Walberg’s, and most of the effects in Walberg’s table did not appear in Bloom’s. And the two professors definitely did not agree on the effect of tutoring.

Walberg didn’t put tutoring at the top of his list, and he estimated tutoring’s effect to be 0.40 standard deviations—close to the average effects reported in meta-analyses. Bloom did repeat Walberg’s estimate of 0.40 standard deviations, but he described it somewhat narrowly as the effect of “peer and cross-age remedial tutoring.” Walberg’s estimate wasn’t so circumscribed; he described it simply as the effect of tutoring.

Table 1: Bloom's claims on tutoring differ from his key source

Bloom relied on two students

Why did Bloom relabel Walberg’s tutoring effect of 0.40, and where did Bloom get his own estimate of 2.00? It seems Bloom was placing his faith in the dissertation studies of two of his PhD students, Joanne Anania and Arthur J. Burke. Both Anania and Burke reported two-sigma effects when comparing tutoring to whole-group classroom instruction—and substantial effects, though not as large, from mastery learning.

Because Anania and Burke provided essentially all the empirical evidence that backed Bloom’s claim of two-sigma tutoring, it’s a little shocking that Bloom didn’t credit them as coauthors. Bloom did cite his students’ dissertations, but if Burke and Anania had been coauthors on an instant classic like “The 2 Sigma Problem,” they might have gotten jobs that provided the resources to conduct further research on tutoring and mastery learning. Instead, Anania published a journal version of her dissertation research, which has been cited just 77 times to date. She taught at three universities in the Chicago area, where she specialized in reading, children’s literature, and adult literacy. Her 2012 obituary doesn’t mention her work on tutoring. Burke never published his dissertation research—or anything else on tutoring. Years later, he published half a dozen reports for the Northwest Regional Laboratory on suspension, expulsion, and graduation—not tutoring.

Bloom also did little work on tutoring after 1984. His next and last major project was an edited book titled Developing Talent in Young People. Published in 1985, the book relied on interviews with accomplished adults to reconstruct how they had developed their talents for music, sculpture, athletics, mathematics, or science. Bloom, who wrote only the introduction, summarized his two-sigma claim in a single paragraph that did not mention Anania or Burke. Bloom retired in 1991 and died in 1999.

It’s a little odd, isn’t it? If these three individuals—two of them just starting their research careers—really discovered a way to raise students’ test scores by two standard deviations, why didn’t they do more with it? Why didn’t they conduct more research? Why didn’t they start a tutoring company?

The two-sigma effect wasn’t just from tutoring

Did Anania and Burke really find two-sigma effects of tutoring? I must admit I was feeling skeptical when I printed out their dissertations. Few 40-year-old education findings hold up well, and student work, half of it unpublished, whose effects have never been replicated, seemed especially unpromising.

Book cover of Developing Talent in Young People
Bloom mentions his two-sigma claim in his last book project.

To my surprise, though, I found a lot to like in Anania’s and Burke’s dissertations. Both students ran small but nicely designed experiments to test the effect of a thoughtful educational intervention. They randomly assigned 4th, 5th, and 8th graders to receive whole-class instruction, mastery learning, or tutoring. The 4th and 5th graders learned probability; the 8th graders learned cartography. On a post-test given at the end of the three-week experiment, the tutored group really did outscore the whole-class group by two standard deviations on average.

But the tests that students took were very specific. And the tutoring intervention involved a lot more than just tutoring.

Students took a narrow test. Burke and Anania chose the topics of probability and cartography for a specific reason—because those topics were unfamiliar to participating students. There is nothing wrong with choosing an unfamiliar topic; experiments in the science of learning commonly do so. But it’s easier to produce a large effect when students are starting from zero. Cohen, Kulik, and Kulik’s 1982 meta-analysis reported that tutoring effects averaged 0.84 standard deviations when measured on narrow tests developed by the study authors, versus just 0.27 standard deviations when measured on broader standardized tests. In 2020, Matt Kraft reported that effects of educational interventions generally—not just tutoring—are about twice as large when they are evaluated based on narrow as opposed to broad tests.

While Anania’s and Burke’s intervention did achieve two-sigma effects on tests of the material covered in their three-week experiment, it is doubtful that they could achieve similar effects on a broad test like the SAT, which measures years of accumulated skills and knowledge, or on the state math and reading tests that so many parents and teachers have worried about since the pandemic.

Certainly not in three weeks.

Tutored students received extra testing and feedback. Burke’s and Anania’s two-sigma intervention did involve tutoring, but it also had other features. Perhaps the most important was that tutored students received extra testing and feedback. At the end of each unit, all students took a quiz, but any tutored student who scored below 80 percent (in Anania’s study) or 90 percent (in Burke’s) received feedback and correction on concepts that they had missed. Then the tutored students took a second quiz with new questions—a quiz that students in the whole-class condition never received. If the tutored students still scored below 80 or 90 percent, they got more feedback and another quiz.
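The quiz, feedback, and retest cycle described above can be sketched as a simple loop. This is an illustration only, not code or procedure taken from Anania's or Burke's dissertations: the function name, the `take_quiz` and `give_feedback` callbacks, and the `max_attempts` cap are all hypothetical (the article does not say whether retests were capped), and the scores in the usage example are invented.

```python
from typing import Callable


def feedback_retest_cycle(
    take_quiz: Callable[[int], float],
    give_feedback: Callable[[int], None],
    mastery_threshold: float = 0.80,  # 0.80 in Anania's study, 0.90 in Burke's
    max_attempts: int = 5,            # hypothetical cap, not from the article
) -> list[float]:
    """Quiz a student, giving feedback and a fresh quiz after each score
    below the mastery threshold. Returns every quiz score in order."""
    scores: list[float] = []
    for attempt in range(1, max_attempts + 1):
        score = take_quiz(attempt)
        scores.append(score)
        if score >= mastery_threshold:
            break  # mastery reached, no further correction needed
        give_feedback(attempt)  # correct the concepts the student missed
    return scores


# Usage with invented scores: two rounds of feedback before reaching 80 percent.
scripted_scores = iter([0.60, 0.75, 0.85])
feedback_log: list[int] = []
scores = feedback_retest_cycle(
    take_quiz=lambda attempt: next(scripted_scores),
    give_feedback=feedback_log.append,
    mastery_threshold=0.80,
)
print(scores)
print(feedback_log)
```

The point of the sketch is that students in the whole-class condition would exit after the first quiz regardless of score, while tutored students kept cycling, which adds both practice and instructional time.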

Bloom acknowledged that his students’ experiments included extra quizzes and feedback, but he asserted that “the need for corrective work under tutoring is very small.” That assertion was incorrect. Clearly the tutored students benefited substantially from feedback and retesting (see Figure 2). For example, in week one of Anania’s experiment, tutored students scored 11 percentage points higher on the retest than they did on the initial test. In week two, tutored students scored 20 percentage points higher on the retest than on the initial test, and in week three, they scored 30 percentage points higher on the retest than on the initial test.

Figure 2: A PhD Student’s Experiment on Tutoring

These boosts to performance, and their benefits for longer-term learning, are examples of the testing effect—an effect that, though widely appreciated in cognitive psychology today, was less appreciated in the 1980s. Students learn from testing and retesting, especially if they receive corrective feedback that focuses on processes and concepts instead of simply being told whether they are right or wrong. Burke’s and Anania’s tutors were trained on how to provide effective feedback. Indeed, Burke wrote, “perhaps the most important part of the tutors’ training was learning to manage feedback and correction effectively.” The feedback and retesting also provided tutored students with more instructional time than the students receiving whole-class instruction—about an hour more per week, according to Burke.

How much of the two-sigma effect did the extra testing and feedback explain? About half. You can tell because, in addition to the tutored and whole-class groups, there was a third group of students who engaged in “mastery learning,” which did not include tutoring but did include feedback and testing after whole-class instruction. On a post-test given at the end of the three-week experiment, the mastery-learning students scored about 1.1 standard deviations higher than the students who received whole-class instruction. That’s just a bit larger than the effects of 0.73 to 0.96 standard deviations reported by meta-analyses that have estimated the effects of testing and feedback on narrow tests.

If feedback and retesting accounted for 1.1 of Bloom’s two sigmas, that leaves 0.9 sigmas that we can chalk up to tutoring. That’s not too far from the 0.84 sigmas that the Cohen, Kulik, and Kulik meta-analysis reports for tutoring’s effect on narrow tests.
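The subtraction behind this estimate can be written out explicitly. The sketch below uses only the figures quoted in the text; it assumes the two components simply add up with no interaction between tutoring and feedback, which is a simplification, and the variable names are invented for illustration.

```python
# Effects in standard deviations on narrow tests, as quoted in the text.
FULL_PACKAGE_VS_WHOLE_CLASS = 2.0  # tutoring plus feedback and retesting
MASTERY_VS_WHOLE_CLASS = 1.1       # feedback and retesting without tutoring
META_ANALYSIS_NARROW_TESTS = 0.84  # Cohen, Kulik, and Kulik (1982)

# Assumption of this sketch: effects are additive, so the remainder after
# removing the feedback-and-retesting effect is attributed to tutoring.
tutoring_residual = round(FULL_PACKAGE_VS_WHOLE_CLASS - MASTERY_VS_WHOLE_CLASS, 2)
gap_from_meta_analysis = round(tutoring_residual - META_ANALYSIS_NARROW_TESTS, 2)

print(f"Effect attributable to tutoring: {tutoring_residual} SD")
print(f"Difference from meta-analytic estimate: {gap_from_meta_analysis} SD")
```

If tutoring and feedback reinforce each other rather than simply adding, the split between them would be less clean than this arithmetic suggests.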

Tutors received extra training. Extra testing and feedback might have been the most important extra in Anania’s and Burke’s tutoring intervention, but it wasn’t the only extra.

Anania’s and Burke’s tutors also received training, coaching, and practice that other instructors in their experiments did not receive. Burke mentioned training tutors to provide effective feedback, but tutors were also trained “to develop skill in providing instructional cues . . . to summarize frequently, to take a step-by-step approach, and to provide sufficient examples for each new concept. . . . To encourage each student’s active participation, tutors were trained to ask leading questions, to elicit additional responses from the students, and to ask students for alternative examples or answers”—all examples of active, inquiry-based learning and retrieval practice. Finally, “tutors were urged to be appropriately generous with praise and encouragement whenever a student made progress. The purpose of this training was to help the tutor make learning a rewarding experience for each student.”

Although previous tutoring studies had not found larger effects if tutors were trained, the training these tutors received may have been exceptional. Anania and Burke could have isolated the effect of training if they had offered it to some of the instructors in the whole-class or mastery-learning group. Unfortunately, they didn’t do that, so we can’t tell how much of their tutoring effect was due to tutor training.

Tutoring was comprehensive. Many public and private programs offer tutoring as a supplement to classroom instruction. Students attend class with everyone else and then follow up with a tutor afterwards. But the tutoring in Burke’s and Anania’s experiments wasn’t like that. Tutoring didn’t supplement classroom instruction; tutoring replaced classroom instruction. Tutored students received all instruction from their tutors; they didn’t attend class at all. That’s important because, according to Cohen, Kulik, and Kulik’s meta-analysis, tutoring is about 50 percent more effective when it replaces rather than supplements classroom instruction.

It’s great, of course, that Burke’s and Anania’s students received the most effective form of tutoring. But it also means that it wasn’t the kind of tutoring that students commonly receive in an after-school or pull-out program.

All That Glitters

My father may have had a two-sigma tutor in 1945. His tutor couldn’t foresee Anania’s and Burke’s experiments, 40 years in the future, but her approach had several components in common with theirs. She met with her student frequently. She was goal-oriented, striving to ensure that my father mastered the 2nd- and 3rd-grade curricula rather than just putting in time. She didn’t yoke herself to the pace of classroom instruction but moved ahead as quickly as she thought my father could handle. And she checked his comprehension regularly—not with quizzes but with short homework assignments, which she checked and corrected to explain his mistakes.

But not all tutoring is like that, and some of what passes for tutoring today is much worse than what my father received in 1945.

In the fall of 2020, I learned that my 5th grader’s math scores had declined during the pandemic. I knew that they hadn’t been learning much math, but the fact that their skills had gone backward was a bit of a shock.

To prepare them for what would come next, I told them the story about my father’s 2nd-grade tutor.

“Grandpa got tutored every day for seven weeks?” they asked me. “That seems excessive.”

“You think so?” I asked.

“Yeah—it’s 47 hours!”

“Come again?” I asked.

They reached for a calculator.

Once a week I drove them to a for-profit tutoring center at a nearby strip mall. It was a great time to be in the tutoring business, but this center wasn’t doing great things with the opportunity. My child sat with four other children, filling out worksheets while a lone tutor sat nearby—available for questions, but mostly doing her own college homework and exchanging text messages with her friends. One day my child told me that they had spent the whole hour just multiplying different numbers by eight. They received no homework. From a cognitive-science perspective, I was pretty sure that practicing a single micro-skill for an hour once a week was not optimal. The whole system seemed designed not to catch kids up, but to keep parents coming back and paying for sessions.

Unfortunately, overpriced and perfunctory tutoring is common. In an evaluation of private tutoring services purchased for disadvantaged students by four large school districts in 2008–2012, Carolyn Heinrich and her colleagues found that, even though districts paid $1,100 to $2,000 per eligible student (40 percent more in current dollars), students got only half an hour each week with a tutor, on average. Because districts were paying per student instead of per tutor, most tutors worked with several children at once, providing little individualized instruction, even for children with special needs or limited English. Students met with tutors outside of regular school hours, and student engagement and attendance were patchy.

Only one district—Chicago—saw positive impacts of tutoring, and those impacts averaged just 0.06 standard deviations, or 2 percentile points.

My grandmother would never have stood for that.

After these results were published, some of Chicago’s most disadvantaged high schools started working with a new provider, Saga Education. Compared to the tutoring services that Heinrich and her colleagues evaluated, Saga’s approach was much more structured and intense. Tutors were trained for 100 hours before starting the school year. They worked with just two students at a time. Tutoring was scheduled like a regular class, so that students met with their tutor for 45 minutes a day, and the way the tutor handled that time was highly regimented. Each tutoring session began with warmup problems, continued with tutoring tailored to each student’s needs, and ended with a short quiz.

The cost of Saga tutoring ($3,500 to $4,300 per student per year) was higher than that of the programs that Heinrich and her colleagues had evaluated, but the results were much better. According to a 2021 evaluation by Jonathan Guryan and his colleagues, Saga tutoring raised math scores by 0.16 to 0.37 standard deviations. The effect was “sizable,” the authors concluded. It wasn’t two sigmas, but it doubled or even tripled students’ annual gains in math.

Is Two-Sigma Tutoring Real?

The idea that tutoring consistently raises achievement by two standard deviations is exaggerated and oversimplified. The benefits of tutoring depend on how much individualized instruction and feedback students get, how much they practice the tutored skills, and the type of test used to measure tutoring’s effects. Tutoring effects, as estimated by rigorous evaluations, have ranged from two full standard deviations down to zero or worse. About one-third of a standard deviation seems to be the typical effect of an intense, well-designed program evaluated against broad tests.
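To make effect sizes like these easier to picture, they can be converted into percentile terms. The sketch below is illustrative only: it assumes test scores are normally distributed and that the student starts at the median, and the helper name `percentile_after_effect` is hypothetical rather than anything drawn from the studies discussed here.

```python
from statistics import NormalDist


def percentile_after_effect(effect_sd: float, start_percentile: float = 50.0) -> float:
    """Percentile reached after a gain of `effect_sd` standard deviations.

    Assumes scores follow a normal distribution (an assumption, not a
    claim made by the evaluations cited in the article).
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    start_z = std_normal.inv_cdf(start_percentile / 100.0)
    return 100.0 * std_normal.cdf(start_z + effect_sd)


if __name__ == "__main__":
    # Effect sizes mentioned in the article, in standard deviations.
    for effect in (0.06, 0.16, 1 / 3, 0.37, 2.0):
        print(f"{effect:.2f} SD -> percentile {percentile_after_effect(effect):.1f}")
```

Under these assumptions, an effect of 0.06 standard deviations moves a median student up by roughly 2 percentile points, in line with the Chicago figure, while a full two standard deviations would move the same student to about the 98th percentile.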

The two-sigma effects obtained in the 1980s by Anania and Burke were real and remarkable, but they were obtained on a narrow, specialized test, and they weren’t obtained by tutoring alone. Instead, Anania and Burke mixed a potent cocktail of interventions that included tutoring; training and coaching in effective instructional practices; extra time; and frequent testing, feedback, and retesting.

In short, Bloom’s two-sigma claim had some basis in fact, but it also contained elements of fiction.

Like some science fiction, though, Bloom’s claim has inspired a great deal of real progress in research and technology. Modern cognitive tutoring software, such as ASSISTments or MATHia, was inspired in part by Bloom’s challenge, although what tutoring software exploits even more is the feedback and retesting required for mastery learning. Video tutoring makes human tutors more accessible, and new chatbots have the potential to make AI tutoring almost as personal, engaging, and responsive. Chatbots are also far more available and less expensive than human tutors. Khanmigo, for example, costs $9 a month, or $99 per year.

My own experience suggests that the large language models that undergird AI tutoring, by themselves, quickly get lost when trying to teach common math concepts like the Pythagorean theorem. But combining chatbots’ natural language capabilities with a reliable formal knowledge base—such as a cognitive tutor, a math engine, or an open-source textbook—offers substantial promise.

There is also the question of how well students will engage with a chatbot. Since chatbots aren’t human, it is easy to imagine that students won’t take them seriously, that they won’t feel as accountable to them as my father felt to his tutor and his mother. Yet students do engage with and even open up to chatbots, perhaps because they know they won’t be judged. The most popular chatbots among young people are ones that simulate psychotherapy. How different is tutoring, really?

It seems rash, though, to promise two-sigma effects from AI when human tutoring has rarely produced such large effects, and no evidence on the effects of chatbot tutoring has yet been published. Over-promising can lead to disappointment, and reaching for impossible goals can breed questionable educational practices. There are already both human and AI services that will do students’ homework for them, as well as more well-intentioned but still “overly helpful” tutors who help students complete assignments without fully understanding what they’re doing. Such tutors may raise students’ grades in the short term, but in the long run they cheat students of the benefits of learning for themselves.

In the early going, it would be sensible simply to aim for effects that approximate the benefits of well-designed human tutoring. Producing benefits of one-third of a standard deviation would be a huge triumph if it could be done at low cost, on a large scale, and on a broad test—all without requiring an army of human tutors, some of whom may not be that invested in the job. Effects of one-third of a standard deviation probably won’t be achieved just by setting chatbots loose in the classroom but might be within reach if we skillfully integrate the new chatbots with resources and strategies from the science of learning. Once effects of one-third of a standard deviation have been produced and verified, we should be able to improve on them through continuous, incremental A/B testing—slowly turning science fiction into science fact.
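For readers who want to see what verifying an effect in an A/B test might involve, here is a minimal sketch of one common way to express the difference between two groups in standard deviation units (a pooled-standard-deviation effect size, often called Cohen's d). The function name and the sample scores are hypothetical, and a real evaluation would also need random assignment, adequate sample sizes, and a measure of uncertainty.

```python
from math import sqrt
from statistics import mean, variance


def standardized_effect(treatment_scores, control_scores):
    """Difference in group means divided by the pooled standard deviation."""
    n_t, n_c = len(treatment_scores), len(control_scores)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_variance = (
        (n_t - 1) * variance(treatment_scores)
        + (n_c - 1) * variance(control_scores)
    ) / (n_t + n_c - 2)
    return (mean(treatment_scores) - mean(control_scores)) / sqrt(pooled_variance)


if __name__ == "__main__":
    # Hypothetical test scores, invented purely for illustration.
    with_chatbot = [72, 80, 68, 77, 85, 74]
    without_chatbot = [70, 75, 66, 73, 81, 71]
    effect = standardized_effect(with_chatbot, without_chatbot)
    print(f"Estimated effect: {effect:.2f} standard deviations")
```

An estimate near one-third would match the benchmark suggested above, though a single small comparison like this one would be far too noisy to count as verification.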

Paul von Hippel is a professor and associate dean for research in the LBJ School of Public Affairs at the University of Texas, Austin.

This article appeared in the Spring 2024 issue of Education Next. Suggested citation format:

von Hippel, P.T. (2024). Two-Sigma Tutoring: Separating Science Fiction from Science Fact. Education Next, 24(2), 22-31.

Lessons from Newark https://www.educationnext.org/lessons-from-newark-lineage-of-modern-school-reform-where-we-go-next/ Thu, 29 Feb 2024 10:00:41 +0000 https://www.educationnext.org/?p=49717949 The lineage of modern school reform and where we go next


Democratic Newark Mayor and senate candidate Cory Booker, center left and Republican New Jersey Gov. Chris Christie, center right, joins others in Newark, N.J., Wednesday, Sept. 25, 2013, as they cut a ribbon during an opening ceremony for Newark charter schools.

Since the release of A Nation at Risk in 1983, the school reform movement has generated significant insights and promising practices for improving schools for children in poverty and students of color. The work of trying to radically improve student outcomes also produced glaring missteps and tough lessons. Few efforts demonstrate the complexity of attempting to provide a bold citywide plan to ensure educational excellence for all children better than the experiences in Newark, New Jersey. Much has been written about the political drama during my tenure as superintendent from 2011 to 2014. However, very little has been written about the actual playbook, results, and implications for educational policymakers and leaders.

I was appointed superintendent of Newark Public Schools (NPS) in 2011 by then governor Chris Christie and the state’s education commissioner at the time, Chris Cerf. While most school districts have a local board charged with hiring a superintendent, NPS had lost that authority back in 1995, when the state took control of the district.

When I arrived in Newark, 39 percent of students who entered the system failed to graduate, and only 40 percent of third-graders could read and write at grade level. Enrollment was plummeting. The district’s nearly forty thousand students and one hundred schools still made it the largest in the state, with the majority of students living below the poverty level.

Local politicians and families had grown impatient. For the five years prior to my arrival as superintendent, many elected leaders had become early adopters of a growing national charter school movement that aimed to free schools from government red tape and allow them autonomy to innovate. These supporters included Cory Booker (then a young councilman), school board member Shavar Jeffries (who now heads the charter school behemoth KIPP Foundation), and state senator Teresa Ruiz, among other notable local leaders. Charters weren’t the only new option—other school models, such as magnet high schools (often with entrance requirements) and partner-run small high schools, had gained momentum too.

Some of these schools had notable evidence of improving achievement for Newark students, and it was understandable that they were gaining strong support from local leaders, influential funders, and certainly the families of the nearly 5,500 students who attended them.

But it was clear that the most impactful efforts at improving schools in Newark were working around the very system they were trying to improve. And in New Jersey, these new schools were funded on a per-pupil basis; in other words, the money followed the child out of the traditional system and into the public charter system. Logically, this made sense. But in practice, this proliferation of competitors to district-run schools was creating unintended consequences that few wanted to discuss.

Cami Anderson was tapped as superintendent of Newark Public Schools in May 2011.

Building a “System of Great Schools”

Given the perilous state of the city’s schools, the unrealistic expectations around quick achievement gains, and the pressure from ideologues on all sides, many speculated that the superintendent role wasn’t doable. But I was inspired by the scale of the challenge and the ferocious commitment of many leaders in the community.

We started with the theory that the unit of change was the school itself and embraced the idea that what we were building was what my former boss, then New York City Schools chancellor Joel Klein, called “a system of great schools,” not a “great school system.” This was a subtle but profound distinction, because it meant we were seeking to ensure that there were one hundred excellent schools serving every child in every neighborhood—regardless of governance structure.

First, we needed to set a unifying goal for the district: every child would be college ready. That’s right, college, not just career—because we believed that choice of higher education should be up to the student, not simply determined by the inadequacy of their preparation, and because Newark families were demanding this.

In poll after poll, focus group after focus group, they told us very clearly: they wanted their children to graduate college ready. Moreover, they believed that “career ready” was a euphemism for low expectations. Families felt that academic excellence was a passport out of poverty.

Most parents were with us from day one. The challenge was the well-meaning funders and other influencers who wanted to muddy the waters and talk about everything except whether students could read, write, and do math at grade level.

When we started sharing actual data about proficiency rates and the number of young people earning diplomas indicative of their mastery of hard content, we started to encounter real pushback, both within and outside the school system. This was a theme I became increasingly familiar with: often what families say they want can be quite different from what those who speak for them are willing to stand for.

Ensuring “Four-Ingredient” Schools

With our North Star established, we rolled up our sleeves to improve the district, school by school. There was a large and growing body of research and evidence about high-performing schools in high-poverty neighborhoods. Combined with our team’s years of on-the-ground school transformation experience, we zeroed in on four basic ingredients that every high-quality school possessed: people, content, culture, and conditions.

Our aim: ensure that every NPS school was a four-ingredient school so that we could make steady progress toward college readiness for all. Our philosophy: focus on what works regardless of ideology, which often led to “third-way” solutions—combining the best of seemingly disparate views or forging a new path to transcend old, binary thinking. Our mantra: implementation matters.

People. It’s critical to have the right people in the right seats, from the leadership team to the teachers to mental health professionals to custodial staff.

We know intuitively the power that a great teacher has, and a growing body of research reinforced this belief, showing us that teachers are the most significant in-school factor determining a child’s level of achievement. Further, the most significant factor in getting great teachers in every classroom is the quality of the principal.

We focused on leadership from day one in Newark. I’ve never been to a great school with a mediocre principal, and I have never been to a failing school with a terrific principal (except perhaps at the very beginning of a turnaround). Within two years, we had replaced nearly one-quarter of our principals through aggressive recruiting and selection, giving preference to Newarkers and leaders who not only knew instruction but thought of themselves as community organizers and change agents.

Many states at this time were starting to use quantitative test score data in teacher evaluations, and New Jersey was eager to follow suit. However, my team and I felt that the science for such “value-added” approaches didn’t hold up when it came to determining the effectiveness of individual teachers. Not only did we feel that using the value-added approach in teacher evaluations would be unfair to teachers, we also knew that including such a poison pill in our new evaluation plan would create a backlash that could sabotage the entire effort. We took a lot of flak from hardline education reformers, who had become fixated on using test scores as a shortcut to accountability and who worried that our questioning the use of test scores in teacher evaluations would water down reform.

To help non-charter schools accelerate the “people” ingredient, we negotiated what was widely considered an ambitious contract with Newark teachers. Despite agreeing to key labor reforms after more than two hundred hours at the bargaining table, some in the Newark Teachers Union and their national affiliate, the American Federation of Teachers, vociferously advocated against them within weeks of the contract being ratified by an overwhelming majority of teachers. Both groups had a long track record of preserving some of the sacred cows of teacher labor negotiations: seniority-based placement, infallibility of teachers with tenure (regardless of what they do), and resistance to any form of accountability—no matter how nuanced. Meanwhile, we found many of our own ideas to be popular among everyday teachers, who told us the quality of the teacher in the classroom next door is a factor in whether or not they want to stay at a school. I was pushing largely because I believed then—and still believe now—that teachers unions need to evolve to become part of the solution or they will become obsolete.

We also had to completely restructure and reimagine the central office to be in service to schools and families. This required breaking senior leaders into new teams and inviting them to clearly articulate how they would enable the four school-level ingredients. It also meant crafting clear plans with goals aligned with good management and coaching—not simply doing what had always been done.

Content. A high-quality school needs high-quality and culturally competent curricula. It also needs frameworks, protocols, and data that drive great instruction and continuous improvement.

I started in Newark about a year after the Common Core State Standards had become a force nationally and the same month that New Jersey adopted a version of them. Common Core gave us an unambiguous and evidence-based target. It also served as a catalyst to scrutinize our curricula with a more rigorous lens.

The research here is undeniable; high-quality, culturally competent instructional materials are critical to ensuring that students are truly internalizing difficult content. Historically, though, we had all underinvested in this area in the early reforms after A Nation at Risk.

High-quality instructional materials are an ingredient that is hard to get right when you are working only at the school or small-network level. Scale is your friend. These decisions are better made at a system level, where content experts can dedicate the necessary time to addressing academic needs and cultural contexts, as well as coherence and alignment between the plethora of different curricula and assessments. It is also the area that, at the time in Newark, brought the most consensus. We did “teach-ins” for administrators, educators, influencers, and families who all really seemed to get and support the mandate for good, rigorous content that was consistent across the city.

Culture. Schools with intentionally curated environments characterized by high standards alongside high support produce better student outcomes.

From day one in Newark, we focused on the seminal research work and promising practices that had emerged, connecting how kids feel, how adults feel, and student outcomes. After years of comparing student achievement results to staff, student, and family survey responses, researchers Tony Bryk and Barbara Schneider found that the schools with high levels of trust were far more likely to get beat-the-odds results than their counterparts. Economists like Ron Ferguson and social policy experts like Christopher Jencks found a direct correlation between adult expectations, student surveys, and student outcomes.

Relatedly, an area where I have seen some of the greatest challenges for adults in establishing and preserving culture is in response to conflict and disruptive incidents. How we handle student discipline, struggle, and conflict is where adult biases show up the most. This is a problem not only from an equity and justice lens but also from a student achievement standpoint. Often students who need the most support and time on task are being excluded the most. Students can’t learn when they feel shame and helplessness. So it is no surprise to me that data shows that the relationship between the discipline gap and achievement is more than correlative—it is also causal.

For these reasons, we hired administrators who showed skill in building culture and partnering with families. We created an entire central-office team focused on student well-being and discipline.

We made progress, but admittedly, the playbook on culture is harder to run for many reasons. Too often, discussions about what student culture should feel like are preachy, ideological, or theoretical—devoid of practical, research-based, promising practices. Building culture is far from a paint-by-numbers task. Effective cultures don’t feel the same in every school, but they do share key components. This is nuanced and hard to teach to administrators. The culture work requires us to surface and address adult biases about what kids can accomplish and what is considered “dangerous” behavior, and this can cause real discomfort and resistance.

Conditions. This ingredient is all about strong operations and infrastructure.

It is important to address the physical environment and the day-to-day operations. None of the other ingredients of a strong school or system can succeed if we don’t address the conditions in which our children learn and our teachers teach. In Newark, we had a lot of work to do on this ingredient.

When I started, Malcolm X Shabazz High School had a river running through its fourth floor on rainy days. Many schools didn’t have air conditioning, in a city where average temperatures reach above a humid ninety degrees for months. Some schools weren’t even wired for internet access, and only a few had laptops to check out to students for the day.

Local leaders openly talked about a “rolling start” at the beginning of the school year, which referred to the fact that it took weeks to sort out the basics: enrollment, special education schedules and services, buses, and even books. Honestly, I had never heard of a system where instruction didn’t start on day one.

Some of these intolerable conditions were due to bad public policy and some were because of poor management. My team and I would say we could tell if a school was getting results by how visitors were greeted at the door (if at all) and how quickly families could get the answer to whatever they were asking. We created school operations managers to attend to the operational needs of the school. At the time, this got me in trouble with the administrators’ union (because I was seen as encroaching on district administrator roles and jobs). Even today our approach to operations is considered innovative, which just shows how little we prioritize the conditions in our schools.

The One Newark Plan

While establishing a focus on college readiness and building four-ingredient schools was our primary focus right out of the gate, we knew we had to make progress on a citywide plan that addressed the schools beyond our purview. Looking at the full picture in Newark, you saw that everyone was doing their own thing, and the unintended consequences of this lack of coordination were becoming more evident and unsustainable every day.

From our earliest school visits, we could see that the poorest neighborhood schools were emptying out and becoming concentrated with the highest-need students and the lowest-quality staff. The diversity and variety of school models wasn’t materializing; with all the new schools, we weren’t actually providing a lot of choice, just more flavors of “no excuses” ice cream at the elementary level and a bunch of run-of-the-mill high schools.

Meanwhile, every year, including my first, our district had to cut about $50 million. While there was certainly a lot of bloated bureaucracy to streamline, more than 80 percent of that money was wrapped up in people. Newark Public Schools employs many Newarkers in a city with double the national poverty rate.

As a city, we had to ask ourselves: “Is it even possible for every child in Newark to have access to a school that meets their needs? Even those children facing the longest odds?”

Our team had no choice but to stare down these questions, which led us to some unconventional and controversial answers. The first thing we had to do: try to rise above political arguments rooted in ideology and self-interest about what type of school models should exist. There were about a hundred schools in Newark. We knew we would get to excellence more quickly if we had a variety of governance structures: traditional, charter, magnet, partner run, and hybrid. But we also knew we couldn’t simply let a thousand flowers bloom and allow others to die, especially when those vulnerable schools were serving our students with the highest need. We also knew that the community deserved excellence citywide.

We pored over our own data: student enrollment trends across governance models, overall city population trends, facilities assessments, and (of course) student outcomes. We fanned out and hosted more than a hundred community-based meetings with faith-based leaders, nonprofit executives, families in struggling schools, families in high-performing schools, charter advocates, charter operators, private schools, local funders, elected officials, union leaders, and early childhood providers. We began to socialize the idea that we needed one citywide plan across governance structures, as well as the harsh reality that the district’s footprint had to shrink. We wanted to find a way to preserve the best of the new-schools movement while also addressing some of the unacceptable consequences of its uncoordinated growth.

This process, over the course of about a year, led to a comprehensive plan we called One Newark. The plan opened with three core values to drive our collective decision-making (excellence, equity, and efficiency):

  • Excellence: We must ensure that every child in every neighborhood has access to a “four-ingredient” school as quickly as possible and that no kid is in a failing school.
  • Equity: We must ensure that all students—including those who are facing the longest odds—are on the pathway to college and a twenty-first-century career.
  • Efficiency: We must ensure that every possible dollar is invested in staff and priorities that make a positive difference for all students.

We launched headlong into implementation in the winter of 2013–14.

We started publishing “family-friendly” snapshots, across both district and charter schools, so that community members could see how their schools were doing in comparison to schools with similar student populations. We looked at overall proficiency but also at growth, which is critical in a city like Newark with low proficiency rates across the board.

We created a simple red, yellow, and green system so that the community could see the landscape clearly. “Red schools” were low-proficiency, low-growth schools. Green were high proficiency and high growth. Yellow schools were “on the move” (low proficiency, high growth) or “to watch” (high proficiency, low growth). The color-coding was clear and intuitive, and many in the community started talking about “no red schools.”
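The color-coding described above amounts to a two-by-two grid of proficiency and growth. A minimal sketch of that logic follows; the names `Rating` and `rate_school` are hypothetical, and the sketch takes "high" or "low" as given because the cutoffs that separated them are not stated here.

```python
from enum import Enum


class Rating(Enum):
    RED = "red"
    YELLOW_ON_THE_MOVE = "yellow (on the move)"
    YELLOW_TO_WATCH = "yellow (to watch)"
    GREEN = "green"


def rate_school(high_proficiency: bool, high_growth: bool) -> Rating:
    """Map a school's proficiency and growth levels to a color rating."""
    if high_proficiency and high_growth:
        return Rating.GREEN
    if high_growth:
        # Low proficiency, high growth.
        return Rating.YELLOW_ON_THE_MOVE
    if high_proficiency:
        # High proficiency, low growth.
        return Rating.YELLOW_TO_WATCH
    # Low proficiency, low growth.
    return Rating.RED


if __name__ == "__main__":
    print(rate_school(high_proficiency=False, high_growth=True).value)
```

Keeping growth as its own dimension is what lets a low-proficiency school that is improving quickly show up as yellow rather than red.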

We placed an emphasis on transparent data about how schools were doing with students in poverty, students with disabilities, and English learners. We created standard measures—across district and charter schools—to report on student retention. People from all sides fought us on this level of transparency—the unions, some charter schools (which weren’t obligated to share their data with us), and some funders who worried we were reducing children to numbers. But many families and policymakers embraced the information. There’s no perfect system, but there was no way to make a citywide plan without a decent measure of school quality.

We performed detailed enrollment analysis and established the need for a common definition of a “minimum viable school.” From a funding standpoint, schools with fewer than 500 students are hard to sustain with a staffing model that ensures things like appropriate class size, electives, teacher preparation times, and staff to attend to running operations. Newark had a lot of “red” schools that were also not financially viable, and many of them were in the poorest neighborhoods.

We also looked at demand data—who was applying to charters and from what neighborhoods, who was seeking new small high schools and from what neighborhoods, and which neighborhoods were growing and which were shrinking.

The picture was becoming increasingly clear: the need for a course correction was long overdue. We had traditional schools where 80 percent of families were on charter school waiting lists, but the district’s resistance to collaboration and the charters’ insistence on growing only one grade level each year meant large-scale closures and consolidations were inevitable.

The district had too many elementary schools overall, due to a population decrease, neighborhood shifts, and charter growth. We didn’t have enough early learning centers to meet the increased demand. We had too many selective high schools. Most of the new small high schools being incubated downtown were serving families from other wards, while iconic and historic high schools were emptying out. The picture was bleak. We had to make some hard decisions.

We decided to be radically transparent about our findings and the implications in a proposed ward-by-ward plan. Some charters should take over existing schools with high demand, keep families who opted in, and keep the buildings and the school name, instead of simply continuing to build new schools one grade at a time. Some elementary schools needed to convert to early learning centers. Some small high schools that were performing well needed to move into our comprehensive high schools, and some underperforming partner-run high schools needed to close. Magnets had to change their enrollment process. And some buildings had to be shut—some condemned, some repurposed, and some sold, potentially to charters.

The charter school KIPP Thrive Academy opened in the closed district Eighteenth Avenue School in 2015, one example of the public education reform efforts of the One Newark plan.

Another anchor of the One Newark plan was ensuring that every family had equal access to choice. Both psychologically and practically, it didn’t make sense for one-third of families to get what they wanted and the rest to get what was left over. For starters, this dynamic was creating an almost civil war–like atmosphere, with charter and non-charter families pitted against each other and magnet and nonmagnet families screaming at each other in meetings. Also, one goal of establishing high-performing schools in high-poverty neighborhoods is to feed the groundswell of belief that kids can achieve. Newark’s choice system was helping create a self-fulfilling prophecy of failure in the non-charter schools.

This is where universal enrollment came into play. All families could access the system and apply to all schools. An algorithm gave preference to kids in the neighborhood, followed by kids in poverty, then kids with disabilities, and then everyone else at random.
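The preference order just described can be illustrated with a simplified, single-school sketch. This is not the actual Newark algorithm, which had to match families across many schools at once. The `Applicant` fields, the function names, and the use of a random lottery number to break ties within each preference group are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Applicant:
    name: str
    in_neighborhood: bool = False
    in_poverty: bool = False
    has_disability: bool = False


def priority_tier(applicant: Applicant) -> int:
    """Lower numbers are served first, following the order in the text."""
    if applicant.in_neighborhood:
        return 0
    if applicant.in_poverty:
        return 1
    if applicant.has_disability:
        return 2
    return 3  # everyone else


def offer_seats(applicants, seats: int, rng: random.Random):
    """Rank applicants by tier, with a random lottery number breaking ties."""
    ranked = sorted(applicants, key=lambda a: (priority_tier(a), rng.random()))
    return ranked[:seats]


if __name__ == "__main__":
    pool = [
        Applicant("A"),
        Applicant("B", in_poverty=True),
        Applicant("C", in_neighborhood=True),
        Applicant("D", has_disability=True),
    ]
    for applicant in offer_seats(pool, seats=2, rng=random.Random()):
        print(applicant.name)
```

Because the lottery number matters only within a tier, no applicant can move ahead of a higher-priority group by luck or by connections.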

Book cover of "The Prize"
Dale Russakoff’s award-winning account of education reform in Newark revealed the challenges in turning around the city’s public school system.

It was a game changer. Now all schools were required to think about how to market themselves and own their quality, or lack thereof. By year two, more than three-quarters of the families of kindergartners and ninth-graders were using the system. At one point, we opened a family support center to help families exercise choice. We had planned for a soft launch, but word got out and more than a thousand families showed up on the first day, and the situation almost devolved into chaos. While our critics crowed about our operational failure (and it was indeed a failure), it also showed how much family demand there was for choice and quality. This is one of the hundreds of examples I’ve had throughout my career that defy the ridiculous stereotype that poor families don’t care about education.

The universal enrollment system may have been hardest on some members of Newark’s political elite who were used to the benefits afforded to them in an unfair, transactional system. I recall one meeting in which a prominent official—previously a supporter of mine—yelled, “You made a liar out of me! I told my cousin I could get her kid into this school!”

Our team knew that the tenets of the plan were bold, unconventional, and controversial and that the politics were going to be tough to navigate. Choice, charters, labor reforms, and teacher excellence polled well. Laying off Newarkers and teachers and “closing” traditional schools or turning them over to highly successful charters were wildly unpopular. But to have the plan succeed citywide, you couldn’t have one without the other.

To add a deeper degree of difficulty, while the plan was emerging and leading up to the official launch, we suffered a series of seismic political blows. In September 2013, the Bridgegate scandal broke and increasingly sidelined Governor Christie. Shortly thereafter, then senator Frank Lautenberg tragically passed away. Mayor Booker, who had also been an active and strong supporter of the plan and was working hard to build momentum around it, announced he was running for that U.S. Senate seat. His announcement also spurred the need for an earlier-than-expected mayoral election where the leading candidates spent considerable time spewing hatred about charters and about me personally (although backstage and publicly, they had previously supported both). Shortly thereafter, Commissioner Cerf resigned. To use a sports analogy: the entire offensive line left the field.

The overall approach was comprehensive, and it had to be to ensure that none of our kids were trapped in failing schools, the district didn’t go bankrupt, communities weren’t living with vacant buildings, and the city was on a path to success. I described the plan to author Dale Russakoff as “three-dimensional chess” in an effort to convey why all the pieces had to happen at one time and couldn’t be phased. There were too many interdependent parts to a very complex system, and the stakes couldn’t be higher. Unfortunately, in her 2015 book about Newark, The Prize, which went on to become a bestseller, this quote fed an inaccurate portrayal of me as a top-down, cold technocrat—a narrative that was taking shape across much of the media coverage about our work in Newark. It couldn’t have been further from the truth—the emotional pieces of what needed to happen were not lost on me or the team. I lived with my husband and baby son in Newark and had conversations with neighbors in grocery stores and local watering holes on a daily basis. It all felt so heavy, but also necessary.

Results and Lessons

During my tenure and the subsequent years under Cerf, our district teams improved outcomes for students in every neighborhood and every age group—from early childhood to high school.

In early childhood, we secured a $7 million Head Start grant (becoming only the second district in the country to do so) to add more than one thousand early childhood seats. We brought early childhood standards to life and sounded the alarm to focus on the importance of high-quality early learning. Newark went from enrolling fewer than half of the residents eligible for free early childhood programs (which was most families) to enrolling nearly 90 percent.

In 2015, the Center on Reinventing Public Education named Newark as the top district in the country based on its share of high-poverty, high-performance elementary schools. By 2019, more than one-third of Black students attended schools that exceeded the state average, compared with 10 percent in 2011. The number of good schools and schools “on the move” grew every year due to our district-run turnaround approach, charter conversion schools, and some outright closures and consolidations. Newark was among the top four cities in the country for student outcomes of Black students living in poverty.

The citywide graduation rate rose 14 points, closing the gap with the state average by 7 percentage points—with almost double the percentage of students graduating having passed the state exit exam. About 87 percent of Newark graduates who enrolled in college returned for a second term, far exceeding national averages despite high poverty rates.

And we saw signs that the overall community—despite the political rancor we encountered—was starting to believe in the “system of great schools.” For the first time in decades, student enrollment was increasing overall in Newark, as was the population of the city.

Because we felt responsible for every child in Newark, we engaged all families, charter and district, with equal vigor. This was a good and mission-aligned approach, but it was almost impossible to execute, given the tensions (both perceived and very real) inherent in growing the charter footprint. The conundrum is perfectly exemplified by the mother who called in to ask me a question on-air during a local NPR show. She had just dropped off her kids at North Star Academy Charter School, she said, because she needed them to have access to excellence. At the same time, she was on her way to my office to picket against me on behalf of her nephew, who had lost his job as a school aide due to the smaller footprint of the district.

Our strategy all along was to be up front about failure and embrace accountability. Again, while our radical transparency seemed like a good idea on its face, it turned out that a lot of people don’t want to hear their school is failing—no matter how carefully crafted the message. We prioritized students who were at the back of the line. Our universal enrollment system gave preference to students from the poorest neighborhoods and those with disabilities. We revamped the magnet school admissions process to look at multiple factors for student admissions at the central office. These were good decisions for children, families, and equity, but they also put us in the crosshairs of power brokers who were used to getting what they wanted and considered coveted seats theirs to give out. They also had access to the biggest microphones and would use them to mobilize the community against our efforts.

Some charter school operators and their supporters mobilized their constituents in opposition to these citywide efforts as well. They wanted to grow where they wanted to grow, not necessarily in alignment with supply-and-demand patterns or the overall plan.

Charters weren’t the only group stuck in their own goals and plans—and at least most of their concerns were in service of building quality schools. School-based partners and vendors, local nonprofits, funders, and other leaders all had their individual projects, schools, and pet issues. The incentives to keep doing one’s own thing were profound. I was stuck in a daily loop of explicit and often threatening demands to support individual agendas—many of them having nothing to do with what was best for individual neighborhoods and schools, let alone the collective.

We had to find a way for the idea of choice to lift all boats, but it wasn’t happening—and it can’t happen without good public policy and collective action. I’ve had many school choice advocates dispute this. Some will have you believe that the mere presence of competition somehow magically raises everyone’s game. It certainly didn’t happen that way in Newark, nor in the dozens of systems I have worked in since. The One Newark plan should have been envisioned before the unintended consequences were at our doorstep. Maybe that would have given us more time.

I also made mistakes. My messages were not straightforward and sticky enough. This work, as you can see, is complex and multifaceted, and I could have paid more attention to how to ensure good, proactive, community-friendly communication.

More critically, I needed to develop a more sophisticated understanding of how to see the community in relation to the system of schools. In Figure 1, the center is the school, and the next level out is the families and students. The next ring is influencers—folks connected to the school who have direct influence on that specific school. The next ring is community-wide partners—community-based agencies and other city agencies like police and child welfare. And the outermost ring is elected officials and power brokers—for instance, pastors of large congregations, thought leaders, and community-based organizations serving the city.

Figure 1: The community in relation to the system of schools

We knew it was critical to focus on our families and students, and we knew it was a tremendous amount of our work to build collective action focused on them. I give us high marks for our dogged and strategic work on the red ring. But in retrospect, we spent far too much time with folks in the outermost ring—the political and power class—and not enough with those in the orange. It wasn’t until nearer the end of my tenure that we started to create a database for each individual school’s orange ring. I came to realize a hard lesson—that while the politicians and power brokers confidently spoke for the community, they were often after a political win: a contract, a coveted spot in a school, a policy, or a job for a family member or friend. I wish I could take precious minutes I spent with those in the green ring and reinvest them in the orange ring.

The painful but informative experiences I had in Newark, along with a long career since then of working with systems leaders across the country, have convinced me that collective action is the missing link for change at the systems and community levels. Too often, we interchange concepts of true grassroots organizing and community engagement and sidestep the obvious truth that power brokers and special interest groups have an organized, well-resourced, and often outsized influence on speaking for the community.

Among the lessons Anderson learned as superintendent in Newark was the value of engaging community and systems leaders alike in collective action.

Conclusion

The insights I’ve shared above are not based on any specific ideology. They were developed out of necessity and refined through years of application and practice across a wide variety of settings—from New York to California and many places in between, in both districts and charter networks, in small school communities, and in the largest cities and states.

It may seem like a lot to tackle, and indeed it is. But if we are to truly transform our systems at scale, we can’t simply cling to one specific ingredient or hew to a single governance ideology. The surest way to avoid bias and ensure a holistic strategy is to zoom out to the community-level goal. Make the community—not just one school, network, neighborhood, or district—the unit of change.

The story of Newark should push all of us to define the role of the “system” and why it is so critical and yet so difficult to fulfill that mandate for an entire community. In short: the system should manage the incentives, policies, guardrails, and resources to ensure that every child has access to a high-quality school by doing four things.

Enable “Four-Ingredient” schools. As discussed above, we know the value of a game-changing principal in every school and an excellent educator in every classroom; the impact of high-quality instructional materials that are culturally competent; the research on school culture and handling discipline; and what conditions have to be in place to enable achievement. Systems leaders should set direction and advocate; procure best-in-class materials; set policy to incentivize districts, schools, and charter management organizations to implement what we know works; and sanction practices antithetical to student progress.

Ensure quality and equity. The paradigm of districts versus charters sadly guarantees that many kids—particularly those with the most challenges—are left behind. Policymakers and community leaders should be held accountable when they allow kids and families to fall through the cracks. Leaders need to be accountable for ensuring all kids access high-quality schools. Our new accountability systems should correct for mistakes we made before, from focusing only on proficiency and meaningless graduation rates to treating growth, college-readiness, and retention as critical outcome measures.

Break bureaucracy. A fundamental way to clear a runway for accelerated school improvement is to actively tear down past practices and federal, state, and local policies that block individual schools from innovating. We need more of a “whiteboard” approach than one that tweaks decades of dysfunction. Policymakers and community leaders need to wake up every day wondering what they can do to ensure that people running schools have the time to do the right thing as opposed to managing byzantine policies and procedures from competing departments.

Create cross-system and community-based solutions. The students who face the most challenges have generally been failed by multiple systems. Statistically, they are likely to be students of color. Too often they are labeled “special populations” and further marginalized out of classrooms and into separate and unequal programs. To truly reverse patterns for students that systems have failed the most, we need cross-agency and community-based solutions with school success at the core: more out-of-the-box ideas to aggregate services and help students who are the most vulnerable succeed.

I share these ideas and epiphanies humbly and with tremendous gratitude to the countless friends, colleagues, and mentors in this sector who helped shape my beliefs about this work. It’s been more than a decade since I arrived in Newark and forty years since A Nation at Risk. My hope is that we’ve all gained a bit of useful perspective and are ready to roll up our sleeves and put the lessons we’ve learned into action.

Cami Anderson was superintendent of Newark Public Schools from 2011 to 2014. She is the Founder and CEO of ThirdWay Solutions.

Excerpted from a chapter of A Nation At Risk +40: A Review of Progress in US Public Education, a collection of essays published by the Hoover Institution that reflects on education reform in the four decades since the landmark 1983 report.

The post Lessons from Newark appeared first on Education Next.

Are Student Surveys the Right Tools for Evaluating Teacher Performance? https://www.educationnext.org/are-student-surveys-right-tools-evaluating-teacher-performance/ Wed, 07 Feb 2024 10:00:13 +0000 https://www.educationnext.org/?p=49717756 Yes. No. Maybe.

The post Are Student Surveys the Right Tools for Evaluating Teacher Performance? appeared first on Education Next.

In the 1990s standardized tests became entrenched in American K–12 schools as nearly every state, and later the federal government, adopted policies that mandated annual testing and held schools accountable for the results. In the ensuing decades, however, educators and policymakers began to recognize that high-stakes testing was not living up to its promise and that the single-minded focus on test scores had produced unintended (although, in retrospect, entirely predictable) consequences.

Increasingly, school districts across the country are now turning to an alternative evaluation tool—surveys that ask students to rate their teachers and their schools on various metrics of quality and effectiveness. This growing use of evaluative surveys in K–12 reflects a rare consensus among education policy wonks and activists, bringing together strange ideological bedfellows who all believe surveys can help achieve their goals and priorities.

Unfortunately, there is a risk that education leaders will make the same mistakes with surveys that they did with standardized tests—overpromising and not thinking through perverse incentives. Fortunately, it’s not too late to consider carefully both the promise and the likely pitfalls of using student surveys as a measure of teacher and school performance.

Judging Teachers

Education research has established that teachers are the most important in-school factor influencing student academic achievement. The same research, however, documents considerable variation in the effectiveness of public school teachers, suggesting that improving the workforce—by providing professional development for existing educators, recruiting better teachers through nontraditional pathways, and dismissing the poorest performers—offers a promising policy lever for raising student outcomes. Many states reformed their teacher-evaluation policies during the 2010s, after the Obama administration launched its Race to the Top grant competition, which incentivized states to adopt rigorous evaluation systems designed to measure and reward teacher contributions to student learning.

This effort did not work out as hoped. With a few notable exceptions, such as the highly regarded IMPACT system in Washington, D.C., it seems that efforts to improve teacher rating systems have largely been a bust. One recent analysis of state-level teacher-evaluation reforms found “precisely estimated null effects.” Commentators have offered many hypotheses as to why these initiatives fell short, but one probable explanation is that the metric of teacher quality preferred by reformers—“value added” to student test scores—can only be calculated for a minority of teachers, since most do not teach grade levels and subjects where standardized tests are administered annually. The ensuing push for one-size-fits-all evaluation systems resulted in considerable weight being put on other, more easily gameable or subjective measures of performance that could be applied to more teachers.

That is one reason why some accountability hawks are now pinning their hopes on student surveys, which can be administered in every subject and to students as young as grade 3. The innovative teacher evaluation system in Dallas, identified as one contributor to recent improvements recorded by the city’s lowest-performing schools and described as a national model by some reformers, relies heavily on student surveys. The Dallas survey of students in grades 6–12 asks them to evaluate factors such as the teacher’s expectations of students, the positive or negative “energy” in the classroom, the fairness of the teacher’s rules, the depth of a teacher’s subject knowledge, the frequency of helpful feedback, the clarity of instruction, and more.

Critics of standardized testing have also written favorably about student surveys, arguing that they help move education leaders beyond the obsessive focus on test scores by identifying other aspects of teacher and school quality valued by students, parents, and policymakers. One of the most influential researchers in this area is Northwestern University economist Kirabo Jackson (an Education Next contributor). In pathbreaking work, Jackson showed that measures of teacher quality based narrowly on contributions to test-score improvement missed many other ways teachers affect long-run student outcomes. More recently, Jackson used data from Chicago high schools to show that student surveys can help quantify important dimensions of school quality, including school climate, that affect not just student achievement but also outcomes such as high school graduation rates and criminal-justice involvement. Jackson’s recent appointment to President Biden’s Council of Economic Advisers suggests that survey-based measures are likely to play a bigger role in federal school-improvement efforts in the future.

Student surveys also play a central role in policies promoted by many other political entrepreneurs. For example, on the political left, increasing interest in social and emotional learning will also mean greater reliance on student surveys, since they represent one of the few ways in which such skills can be measured and quantified. At the same time, conservatives have embraced surveys in their efforts to promote free speech and protect ideological diversity in schools. Proposed legislation in Ohio, based on model bills developed by high-profile conservative think tanks, would require that public university professors have their teaching evaluated in large part through student surveys, including a specific question asking, “Does the faculty member create a classroom atmosphere free of political, racial, gender, and religious bias?”

Sample of a student survey
The Student Experience Survey for students in grades 6 to 12 in the Dallas Independent School District asks them how they feel about their class and the teacher. Such teacher evaluation systems are credited with helping to improve the city’s lowest-performing schools.

Too Much Too Fast?

Promising as these developments may seem, it is concerning that the hype surrounding student surveys has gotten well ahead of the evidence. Researchers have devoted too little attention to validating survey-based measurements to confirm that they assess the things policymakers hope to measure. Nor have decisionmakers sufficiently considered the potential consequences of attaching high stakes to student survey responses. (Jackson’s work in Chicago sheds little light on this question, as it was conducted at a time when surveys were not part of the city’s school accountability system.)

One cautionary piece of evidence comes from the Gates Foundation–funded Measures of Effective Teaching project. As part of this effort, researchers compared three distinct ways of assessing teacher quality—test-score value-added, classroom observations, and student surveys. While early data did find some evidence that survey-based measures predicted test-score growth, these results were not confirmed in the more rigorous part of the study in which students were randomly assigned to different teachers. The final results found no relationship between student survey scores and improvements in academic achievement, prompting researchers to suggest “practitioners should proceed with caution when considering student survey measures for teacher evaluation.”

Photo of Kirabo Jackson
Kirabo Jackson’s research showed that student surveys helped quantify how schools affected graduation rates and subsequent criminal justice involvement.

Other potential problems also need scrutiny. For example, one recent study examined the association of survey-based measures of student conscientiousness, self-control, and grit with outcomes such as school attendance, disciplinary infractions, and gains in test scores over time. While researchers found a positive relationship between attitudes and behavioral outcomes among students attending the same schools, these correlations disappeared when the same data were aggregated up to the school level and compared across campuses. Most worrying, the authors also found that high-performing charter schools, shown through randomized lotteries to improve both student attendance and academic achievement, recorded the lowest scores on the student surveys. One possible explanation is that the school environment may have affected survey responses in unexpected ways—with students in classes made up of higher-performing peers rating their own attributes more critically, through a form of negative social comparison.

Such results are unlikely to surprise political pollsters, who have long understood the importance of both priming and framing effects in shaping survey responses. That is, even modest changes in the survey-taking context—such as changing the order of the questions—can have a significant impact on the responses. Designing survey questions that actually measure what their authors intend to measure requires considerable skill. Small variations in question wording—for example, describing a protest as an exercise in free speech as opposed to a threat to public safety—can yield sharply different results. Unfortunately, too few education practitioners working with student survey data have any rigorous training in survey research methods.

Finally, although many now appreciate the ways in which high-stakes accountability policies can encourage “teaching to the test,” few have considered the problem of “teaching to the survey.” Letting students weigh in on teacher evaluations, as is done under the Dallas model, is a great way to encourage teachers to do more of what students want. But whether those changes lead to improvements in instructional quality is another matter, and there are many reasons to expect that they won’t.

Lessons from Other Fields

Fields outside of primary and secondary education that have used evaluative surveys for decades provide disturbing examples of undesirable and problematic gaming behaviors that such surveys can incentivize. At the college level, student evaluations have long served as the primary method for evaluating teaching, and considerable evidence indicates that this practice has contributed to grade inflation. Regardless of the specific questions included in the survey, student responses appear to reflect their satisfaction with grades (higher is better!) and the effort required in the course (less is better!). Some professors have even resorted to bringing sweets to class on days when students complete their surveys, as such treats seem to significantly boost evaluation scores.

As Doug Lemov has argued, grading reforms implemented during the pandemic in hopes of reducing stress and supporting teenage mental health have contributed to grade compression and diluted the returns to student effort (see “Your Neighborhood School Is a National Security Risk,” features, Winter 2024). The experience from higher education suggests that incorporating student surveys into formal teacher evaluations will only exacerbate these dynamics.

Although some equity advocates have reacted with alarm to recent research finding racial gaps in principals’ evaluations of teachers, systemic bias—against women, nonwhite professors, and nonnative English speakers—has long been documented in student-survey evaluations of college instructors. Ironically, growing interest in inherently subjective surveys coincides with technological changes, including using AI to classify and score recorded lesson videos, that promise to remove much of the personal discretion from teaching observations.

Even more concerning evidence comes from the field of medicine, where patient satisfaction surveys are required for hospital accreditation and, since the passage of the Affordable Care Act, linked to Medicare reimbursements. For example, some studies suggest that patients rate doctors more favorably when they prescribe antibiotics on demand, including for viral colds for which this treatment is inappropriate because it may contribute to the rise of antibiotic resistance in the population. One journalist has argued that, because a number of the patient-satisfaction questions ask about pain management, the use of high-stakes surveys has also contributed to America’s opioid epidemic by creating pressure on doctors to overprescribe pain pills in order to achieve higher ratings.

If there is one lesson that the past four decades of education reform have taught us, it’s that well-meaning policies rarely work as their proponents expect and hope. Sometimes they even backfire, producing the opposite of what was intended. Both practitioners and policymakers should remember these lessons as they think about how to incorporate student surveys into education-accountability systems or use such data to shape policy.

Vladimir Kogan is a professor in The Ohio State University’s Department of Political Science and (by courtesy) the John Glenn College of Public Affairs.

This article appeared in the Spring 2024 issue of Education Next. Suggested citation format:

Kogan, V. (2024). Are Student Surveys the Right Tools for Evaluating Teacher Performance? Education Next, 24(2), 32-37.

Anxiety, Depression, Less Sleep … and Poor Academic Performance? https://www.educationnext.org/anxiety-depression-less-sleep-poor-academic-performance-decade-smartphone-dominance-negative-naep-trends/ Thu, 26 Oct 2023 09:00:34 +0000 https://www.educationnext.org/?p=49717239 A decade of smartphone dominance and negative NAEP trends

The post Anxiety, Depression, Less Sleep … and Poor Academic Performance? appeared first on Education Next.

Smartphones are nearly universal among U.S. teenagers, who are also experiencing record levels of anxiety and sleeplessness.

It’s understandable. The education world is awash in articles trying to figure out what artificial intelligence is going to mean for schools and students (see “AI in Education,” features, Fall 2023). But before we get too focused on the latest technological breakthrough, let’s not pretend that we have figured out how to cope with the previous one. Over the last decade, smartphones have become commonplace. Today, 95 percent of American teenagers have a supercomputer in their pocket.

Jonathan Haidt, Jean Twenge, and others have brought necessary attention to the likelihood that smartphones and social media are partly to blame for the teenage mental health epidemic gripping our nation. It’s not a watertight case, because it’s nearly impossible to prove a causal relationship with a phenomenon as ubiquitous as this one.

What scholars can say is that the sudden rise in teenage anxiety and depression, suicidal ideation, and suicide all happened at the same time that teenagers’ adoption of smartphones passed the 50 percent mark—around 2012 or 2013. They can also show that the children most likely to engage in heavy use of smartphones and social media—girls, especially liberal girls—also experienced the greatest increase in mental health challenges. And they can point to other countries that show similar patterns.

My purpose here is not to evaluate this evidence, though I generally agree with Haidt that we should adopt the precautionary principle and assume that phones and social media are likely doing real damage to our kids. Then we should act accordingly.

My immediate question, however, is whether phones and social media might also be behind the plateauing and decline of student achievement that we’ve seen in America, also starting around 2013, long before pandemic-era shutdowns sent test scores over a cliff.

I don’t believe this was the only cause of our achievement woes in the 2010s. As I’ve argued before, I believe the Great Recession was also to blame, both because of its impact on families’ home circumstances, and because of the sudden and significant budget cuts that followed in 2013 and 2014, especially in high-poverty schools. Kirabo Jackson has been particularly persuasive that these spending cuts had a measurable negative impact on achievement (see “The Costs of Cutting School Spending,” research, Fall 2020). Another potential factor was a shift away from school accountability; in 2012 the Obama administration softened the consequences for low test scores targeted by the No Child Left Behind Act. Then in 2015, Congress replaced it with the Every Student Succeeds Act.

But I do think we need to take the smartphone hypothesis seriously. Especially because, unlike the Great Recession or the pandemic, these trends are not receding in the rearview mirror. Indeed, adolescent phone use continues to rise. If it is one reason that students aren’t learning as much as they did in the pre-smartphone era, that’s a problem we need to grapple with.

Figure 1: Explosive Growth in Adolescents with Smartphones

So what’s the evidence? First and foremost, as mentioned above, the timing lines up (see Figures 1 and 2). We see smartphone ownership really taking off among adolescents in middle and high school around 2013. That’s also when median achievement on the 8th-grade math test in the National Assessment of Educational Progress (NAEP) peaked. It’s fallen modestly ever since. For our lowest-performing students—those at the 10th and 25th percentiles—the declines were more dramatic.

Figure 2: Declines in Math Performance

Another piece of evidence comes from Catholic schools, which serve as a plausible control group for the smartphone hypothesis (see Figure 3). Catholic-school students also take NAEP math and reading tests. But they are not directly impacted by changes in education policy such as the shifts in federal school-accountability rules or cuts in public-school spending. So if Catholic schoolkids also saw achievement declines around 2013, which in fact happened, especially in reading, that could be an indication that something outside education policy is to blame.

Figure 3: Similar Trends in Catholic Schools

But there is also some conflicting evidence. The drops in achievement in the 2010s tended to be for our lowest-achieving students, who are disproportionately poor, Black, Hispanic, and male. And yet, as we know from the studies that Haidt and others point to, phone and social media use was most concentrated among middle-class girls (at least initially). So that doesn’t match up.

Before I conclude with the obligatory call for more research, it’s worth pondering what mechanisms could link smartphone and social media use to lower student achievement. Most obvious are problems around attention, as students’ brains adapt to the rush from “likes,” YouTube videos, TikToks, and other platforms, and then struggle to listen to (much less read) slower-moving and less-vivid presentations, such as the ones they are likely to encounter in class and homework. (Our poor teachers!) Or it could be phones’ impact on mental health; it’s hard to learn when you’re anxious or depressed.

There’s also the issue of sleep (see Figure 4). This is cited in the mental health literature, too, as we know that kids sleep less today than before phones and social media entered the scene, and we also know that there’s a relationship between less sleep and poor mental health.

Figure 4: Teens Sleeping Less

But so too is there a relationship between less sleep and less student learning. After all, sleep is when the brain works much of its magic, forming connections and cementing ideas in long-term memory. Plus, it’s hard to learn when you’re tired, and it’s really hard to learn when you stay home from school because you have been up much of the night. So there is an angle here that also connects with our chronic absenteeism crisis.

What to make of all of this? If we return to the precautionary principle, the least we can do is try to encourage parents to curb their tweens’ and teens’ phone and social media use. Educators can do their part by setting and enforcing classroom rules that phones be turned off or at least stowed away, unless there’s a compelling instructional reason to use them—though that is admittedly an uphill battle (see “Take Away Their Cellphones,” features, Fall 2022). Abolition is likely impossible, though some legislative proposals to make it harder for kids to access social media apps until they are 16 might help. But schools could certainly encourage parents to limit screen time to a reasonable number of hours per day, be much tougher about earlier bedtimes, and require kids to dock their phones outside their bedroom during sleeping hours. There’s a strong foundation of research to back up any effort to protect and promote students’ sleep, which may help ease some uncomfortable conversations (see “Rise and Shine,” research, Summer 2019).

Indeed, more sleep might be the killer app that could make a huge difference—both for students’ academic achievement and mental health. It’s a good reminder that as we contemplate the future impact of AI on schools and society, what likely matters most isn’t the machines we use but the attention we give to our children’s timeless human needs.

Michael J. Petrilli is president of the Thomas B. Fordham Institute, visiting fellow at Stanford University’s Hoover Institution, and an executive editor of Education Next.

This article appeared in the Winter 2024 issue of Education Next. Suggested citation format:

Petrilli, M.J. (2024). Anxiety, Depression, Less Sleep… and Poor Academic Performance? A decade of smartphone dominance and negative NAEP trends. Education Next, 24(1), 76-79.

The post Anxiety, Depression, Less Sleep … and Poor Academic Performance? appeared first on Education Next.

]]>
49717239
Your Neighborhood School Is a National Security Risk https://www.educationnext.org/your-neighborhood-school-national-security-risk-student-achievement-merit-losing-prospects-era-everybody-wins/ Tue, 24 Oct 2023 09:00:05 +0000 https://www.educationnext.org/?p=49717199 Student achievement and merit are losing prospects in the era of “everybody wins”

The post Your Neighborhood School Is a National Security Risk appeared first on Education Next.

]]>

Recently, I spoke with a student I’ll call Ella. She’s a biochemistry major at a college in the Northeast now, but she went to high school in a town outside a medium-sized coastal city, the sort of town that families move to for the public schools.

Ella didn’t squander the opportunity. She took seven AP classes; she took AP Calculus BC as a junior and a college-level class in linear algebra her senior year. She racked up a 96 average. Several teachers wrote her notes telling her they appreciated having her in class and encouraging her to continue in the STEM field.

So you might be surprised to find that, thinking back, Ella considers a lot of what she did a mistake. “I was so stupid. Every party I skipped, I should have gone,” she reflected. The kids who went to the parties didn’t do as well on tests and papers as she did but, she observed, “nobody knows that but me.”

She was motivated and liked learning, but she was also competitive. She assumed that she would work a little harder, delay some gratification, and her extra effort and accomplishment would be valued and acknowledged—rewarded, even. But everywhere she turned, the signal—this is a student who has done more—was diluted. She resented it.

Grade inflation was one way she felt her hard work had been undervalued at her high school. You got a 95 or a 96 if you did exceptional work, but pretty much everyone who did a credible job got a 93. A 90 definitely put you in the bottom half.

And the grade inflation was also grade conflation. As high grades get easier and easier to achieve, the highest grades can only go up so far. The difference between excellent and decent is compressed. The signal that 96 is different from 94 becomes hard to see. That distinction could still reveal meaningful differences, at least hypothetically, if it were calculated consistently and if people paid careful attention to it. A ranking of students would help, for example, but Ella’s high school didn’t do that, because the practice was seen as too competitive. Being on the honor roll didn’t help, because the “honor roll” included more than half the students in each grade. Taking harder classes wasn’t factored into grade-point-average calculations, though at least her school hadn’t eliminated honors classes in the name of equity as other schools in her city had. And the degree of grade inflation within the school was wildly inconsistent, Ella said. Teachers in some classes—especially the easier ones—gave high grades lavishly. “It was pass/fail, basically. If you did the homework, you got a 95. I think the teachers thought that would make them popular.”

It wasn’t just Ella’s high school either. Her district’s elementary schools had replaced “traditional grading”—As and Bs—with a system of “standards-based grading.” Students received grades on each of about 30 skills, reported on with statements such as, “Student can write sentences to create meaning.” The scores arrived on an obscure and jargony scale: mastery, partial mastery, and emerging mastery. This list of descriptors signaled very little to parents, who could be forgiven for wondering what the forest looked like with so many “emerging mastery” trees: “OK, so she has mastered writing sentences to create meaning: Did she write those sentences when asked to? Was the ‘meaning’ she created average? Exceptional? Does she excel at writing? Should we take her out to dinner and say, ‘You are doing school just right; this is the path?’ Was she in fact struggling?”

One way to disguise a signal is to clog the channel with so much information that people don’t know what matters, what the signal means, or to what to compare it. The idea that grades should not be used to reveal which students have achieved more or worked harder—that grades should describe what a student can do, not what they did do—is heartily endorsed by many teachers, but you could be forgiven for suspecting that they have mixed incentives. Is the love for inscrutable grading an accountability dodge justified on sketchy educational grounds? Does it provide merely the illusion of data? A parent who can’t really detect the signal is less likely to make waves or ask questions. And, of course, only some parents want there to be a signal. Making everyone look equally successful makes a lot of people happy.

A sort of tacit collusion emerges: when almost everyone gets what they want, the school becomes easier to run. Teachers are happy because no one calls them to argue about grades, and kids aren’t competitive and pushy. As Mike Schmoker points out in his book Results Now 2.0, the illusion that everyone is doing great “discourages demand for substantive changes.” This makes the administrators happy too, and at Ella’s school, as at most others, they took no steps to address grade inflation. It is no surprise that national data from the ACT show high school students’ grades rising—a majority of college test-takers now report receiving an A in each subject—even as their achievement scores have stagnated or declined (see Figure 1).

In Ella’s district, the net effect of all this had been to make comparison, recognition, and distinction increasingly difficult to achieve. The argument was that this was a good and healthy thing. Stress, we are told, is toxic, and a school is doing its part to ensure the wellbeing of the next generation if it removes the deleterious effects of competition, comparison, and anxiety.

Figure 1: Course Grades Rise as ACT Test Scores Fall

In Defense of Stress

In fact, the common belief that stress is necessarily harmful is wrong, notes Stanford health psychologist Kelly McGonigal. Her book The Upside of Stress describes how she initially believed stress was toxic until, in reading studies she thought would provide evidence of its dangers, she found to her surprise that the people who are healthiest, happiest, and live longest are not those who have the least stress but rather those who are able to view stress as part and parcel of doing consequential things in life. What matters is your mindset toward stress, and ironically, the development of healthy thinking about stress requires exposure to it.

Sports offer a good example. Successful athletes know they can’t avoid the stress of competition. They tell themselves, “I am feeling stress because I am about to test myself and see how well I can do. The stress I feel is a good thing because it tells me that I care.” Athletes who adopt that attitude about stress can do so because they’ve often experienced pregame anxiety. They use self-talk to manage stress.

McGonigal isn’t saying we should maximize stress but rather that its relationship to wellbeing isn’t linear. Excessive stress is bad, but moderate stress is beneficial, normal, and often better than no stress. “Stress is what arises when something you care about is at stake,” she writes. “You can’t create a meaningful life without experiencing some stress.” Stress motivates action, can accelerate learning, and often leads to a “tend and befriend” response that draws people together and builds community—which, in turn, helps to create wellbeing.

Figure 2 illustrates a Yerkes-Dodson Curve. It describes the typical relationship between stress and performance. There’s a healthy debate about how placing different forms of “performance” on the vertical axis influences the shape of the curve—the optimal level of stress is different for an athlete and a laboratory scientist—but learning is one form of performance, and the principles of the curve apply. If I were a student, I would produce little or no work without some pressure. But if my teacher applies a bit of pressure—“There’s a test on Monday” or “There’s a paper due”—suddenly I am more apt to study over the weekend, to work hard on the paper. I’m likely to be focused. My performance improves. I don’t want to be overanxious about the test. I want to know the test is important and be motivated to deliver my best. In fact, even if the test isn’t graded, the stress involved in the process of recall helps encode learning.

Still, much of the time if you see a graph like this one, the word stress gets replaced along the horizontal axis with a more palatable term such as pressure or challenge. That tells you something about our collective mindset toward stress. Its connotation is so negative that people respond better to claims of its usefulness if the word stress is replaced with a euphemism.

This is characteristic of the way we treat many psychological phenomena in schools. We presume a linear relationship between the phenomenon and its results. It must be either good or bad. We can find clear examples of competition being counterproductive, indicating that we should seek to eliminate it. But competition is like stress. Too much of it is bad, but so is too little.

In Ella’s district, as in so many others, students were told stress could harm them. “They were always telling us we could visit the counseling center during midterms. And I’d always think: ‘It’s a test. Why do you think I need a counselor for that?’” Ella recalled.

Figure 2: Stress and Performance across Fields

Everybody Wins

Elite colleges too, Ella found, were oddly dismissive of academic distinction. When she scored 1500 on the SAT, she was happy. She thought it would set her apart, but that year almost all of the colleges she was applying to made the SAT optional. “The kids who got 1200 or 1300 didn’t submit. My guidance counselor told me it wouldn’t matter one way or the other if I submitted my scores.” The key was her extracurriculars. Getting into the schools she wanted was in part a lottery—everyone was qualified, with high grades and no obligation to submit test scores—and in part a competition to curate a compelling array of enrichments and interests.

I noticed this when I visited campuses with my own kids. The first thing admissions staff said was often: “[Fill in the name of elite college here] is not a school for people who want to spend their time in the library. We’re looking for people who are involved and engaged and active.” You know, well-rounded. Everyone in the room would nod. Cool. Students who might want to spend part of a Friday night in the library seemed to be the one group you could safely criticize on a college campus.

At one school I visited, a parent asked about distribution requirements. “You have to take at least one ‘quantitative’ class,” the admissions representative told the group, “but really it’s easy to get around. Almost anything can count as a quantitative class.” She listed examples of classes that could be used to avoid the necessity of technical or mathematical work.

We walked out of the meeting and my son said: “Can you imagine a university in Russia or India saying that? Don’t worry about taking anything that’s technical while you’re here?”

Ella was no slouch outside the classroom, mind you. She played varsity lacrosse and was an accomplished violinist. She was a good athlete but not a star, and she decided rather than starting with places where she could be recruited, she would choose a school for academic reasons and try to make the lacrosse team as a walk-on. Music, for its part, meant submitting a portfolio of her work that would be labor intensive for a school to evaluate. She didn’t want to study music seriously in college, and it seemed like a long shot that anyone in the music department would listen to her portfolio.

In short, Ella tried to sell herself as a student first, and she now sees what a mistake that was. In the end, the students who had earned median 94s to her top-of-the-class 96s and who took easier classes all seemed to get into the same schools she did—as well as some schools that she didn’t. Some were better athletes, and some had curated experiences she could never afford, such as working in an animal sanctuary in Central America during vacations. Those students had understood that such experiences were more important than the SAT.

This scenario did not apply in every case. Some students who got into their first-choice schools had top grades. But even they had more or less won the lottery that results from everyone looking about equally qualified. This randomness disconcerted Ella. “There were kids who got into top schools that I thought, ‘Yup. Makes sense. She earned it.’ But there were so many kids who you were just like, ‘Are you kidding? I did all the work in the group project because she had literally no idea what was going on and now she’s going to Duke.’” Ella wanted the process to reflect academic merit and felt strongly that it didn’t.

Maybe your reaction to this is: “So what? There are lots of smart kids. Not everyone gets in. Get over it. Ella’s at a perfectly good school.” Or maybe you’re thinking: “There’s probably more to the story; how does she know what the girl going to Duke did? Or dealt with?” Maybe you’re even a little bit scornful of Ella’s ambition and competitiveness. Shouldn’t her motivation to go the extra mile be intrinsic? Maybe you assume that her parents were pushing her. The lesson should be for her to chill out.

But an interesting question to ask at the societal level is: What would we want a disappointed striver like Ella to say? I should have worked harder would be a good response. I will work harder, learn more, grab the next opportunity. But Ella’s response—I should have partied more; I’ve learned my lesson about going the extra mile—is the opposite. She sees a larger ecosystem in which the desire for distinction, knowledge, and a drive to excel are mostly irrelevant.

Everybody wins, under the system that Ella grew up in—a system that guides and shapes the mindset of most American students—except a small number of kids who lose out in their quest to distinguish themselves. It’s easy to dismiss those kids, and their often-foreign-born parents, as hypercompetitive and out of step with the times. Why do they need to compare themselves to anyone else? They got good grades. So what if everyone else did, too?

But think about Ella as a societal asset—someone who could, if she works hard and pushes herself, contribute one day to groundbreaking research. There’s a second group that loses in a system that dilutes signals of excellence. That group is the society that, whether it realizes it or not, is counting on its Ellas to preserve its prosperity and national security. Because while our system was doing everything it could to weaken and dilute competition and meritocracy, the wider world was changing. Quickly.

Meanwhile, in Bakhmut and Beijing . . .

Schools are, among other things, the supply chain for the principal resource on which a modern democracy depends: knowledge, understanding, and, just maybe, belief in shared principles like meritocracy that unite a society.

You may wonder what an economic term like “supply chain” has to do with education, but supplying talent for the economy is part of what schools are supposed to do. We are edging closer to the brink of a new cold war with either Russia or China or both—a competition in which knowledge and advanced technical expertise will play an increasing role in protecting our society from tyranny and maintaining our global position.

In Ukraine, for example, a western-trained military has bravely held off a vastly larger and belligerent invading army. Part of the story of that success lies in the power of meritocracy: decisionmaking devolved to proven mid-level officers close to the conflict, effective ideas from all levels of the organization quickly identified, approved, and scaled. In the Ukrainian army, talented people and worthy ideas are valued and leveraged far better than in Russia’s sclerotic hierarchy. That has had a direct result in sustaining Ukraine’s national sovereignty.

But consider how different that view from the front lines of democracy would be without the technological superiority of HIMARS rockets guided by Starlink satellite Internet, an advanced missile-defense system that Russia cannot crack. No technological superiority, no democracy.

It’s worth pausing here to note the perspective of Ilya Buynevich, a professor of geology at Temple University who grew up under Soviet rule in Ukraine. He wrote recently in a periodical called Campus Reform about a paradox he was noticing on campus. While almost every aspect of society in Soviet Ukraine was less meritocratic than in the U.S.—it was a blend of enforced egalitarianism bereft of opportunities for the masses and massive privilege for the connected few—the education system was in fact far more meritocratic than the U.S. education system. “Soviet universities produced excellent scientists despite (not thanks to) the political system,” he wrote. “Merit was the decisive factor past all the nepotism and corruption.” Even a corrupt autocracy knew that scientific expertise was the key to its global ambitions. “When administrators in the Soviet Union wanted to tip the scales on class enrollment, they would make the examinations much harder.”

As armed conflict and cold wars alike have increasingly come to favor technologically superior societies, we might be tempted to feel optimistic. That’s us! But that optimism may not be justified. Are we ready to stay a step ahead of the Russians and the Chinese? Who is more likely to develop the next Starlink?

Start looking for answers at the top. Though the United States has perhaps the best universities in the world, the science and engineering programs that churn out the ideas and expertise that culminate in microprocessors and HIMARS are stocked heavily with students from abroad, and especially with students from the nations whose allegiance is now most tenuous. To put it in economic terms, we rely on imports. The domestic supply of college graduates with advanced scientific expertise is insufficient to fill the seats in our own elite programs.

“Foreign students accounted for 54 percent of master’s degrees and 44 percent of doctoral degrees issued in STEM fields in the United States in 2016–2017,” a Congressional Research Service report noted in 2019. The number of foreign-born STEM students had doubled since 1988–89. The two most common nations of origin were China—now an explicit geopolitical rival—and India—currently wavering between allegiance to the West and alignment with China and Russia.

Pick up a copy of the Financial Times, The Economist, or the Wall Street Journal and you will read about the national security priority of “de-risking” supply chains. Is it a problem that 80 percent of the copper and lithium and rare earth metals necessary to manufacture cutting-edge technology tools come from China or places firmly in the Chinese sphere of influence? You bet it is. But the supply chain of the most important building block of all, technical expertise and knowledge, is far from de-risked.

Consider the new factories being developed under the Biden administration’s CHIPS and Science Act, designed to boost the semiconductor industry for both economic and national security reasons. The date for opening the first domestic chip fabrication factories has been pushed back because the technical expertise required to install and manage the high-tech fabrication and design equipment is all but nonexistent in the U.S. The Taiwanese firm opening a plant in Arizona made plans to bring in staff from Taiwan to train American staff when it couldn’t hire the people it needed. Immigrants—that is, people educated by school systems other than our own—“account for about 40% of highly skilled workers in America’s semiconductor industry,” The Economist reported. By 2030, the broader high-tech economy, including fields critical to national security, will face a shortage of 1.4 million qualified workers. “Set this against the total of roughly 70,000 students who complete undergraduate degrees in engineering in America each year, and the scale of the deficit becomes apparent,” the article went on to note.

One could argue that the mass importation of technical expertise isn’t all bad. Many of those foreign nationals who come to our universities choose to stay in the U.S., and this represents a strategic benefit. But it’s a supply chain that is far from secure. And the underlying reality—that the supply chain exists because it provides what our own school systems cannot—should scare us. We want to make sure we can supply our own rare earth minerals if China cuts off the supply, but we are blithely unconcerned about the insufficient supply of domestically educated students in advanced technological programs. And those students who do attend such programs in U.S. universities are weighted heavily toward first-generation immigrants and their children: they are students who strive because of the cultures they brought with them when they moved here. They are the families Ella’s school overlooked in favor of the illusion that everyone is a winner.

They are people like Mr. Lee, a parent at a school where I taught many years ago. He was a scientist who had emigrated from Taiwan. He was paying a lot of money to send his son, Charles, to the independent school where I worked so he would be well prepared for higher education. But he wanted to meet with me because he was so disappointed. “There are pep rallies for sports,” Mr. Lee observed. “Where are the pep rallies for school? Where is attention given to the best students?”

Not knowing anything better to say, I told him the truth. “We don’t really do that here.” By “here,” I meant the school, but the point could certainly apply more broadly.

Most of the builders of tomorrow’s cutting-edge technology will probably not come from our own school systems; and those American students who do reach this pinnacle will do so because they hear some other music than what our schools’ sound systems are playing. They will toil away in schools where young people are convinced they have math anxiety, where advanced classes are eliminated in the name of equity, and where the slightest whiff of competition is seen as unhealthy. And then they will apply to colleges where admissions staff proudly announce that the merely scholarly should just as well look elsewhere.

China, fighting hard to erode our global influence, must laugh at stories about American schools eliminating advanced classes, about how teaching algebra is a form of oppression, about how elite colleges market themselves as places where it’s easy to avoid math, and about how the best universities in the world are downplaying objective academic criteria in favor of a vague and subjective calculus of extracurricular experiences—many of which only the wealthy can access.

The Chinese must clearly see the global advantage our school system provides them. You could almost imagine that they invented TikTok to nudge us along our path to mediocrity while they use technical expertise as a tool to shape a new world order. In fact, differences in how the app’s algorithm functions in the U.S. and in China, where the platform promotes a steady stream of educational and patriotic videos and children are limited to 40 minutes of content each day, suggest as much. “It’s almost like they recognize that technology is influencing kids’ development, and they make their domestic version a spinach version of TikTok, while they ship the opium version to the rest of the world,” a social media expert told 60 Minutes.

Consider for a moment the difficulty of enforcing sanctions against Russia. Ever wonder why so many Latin American and African nations have failed to join the sanctions and generally seem lukewarm to the pro-democracy world order the U.S. and its allies lead?

In large part it’s because China has quietly built a sphere of influence through a model that involves providing developing nations with sophisticated engineering projects beyond the scope of what they could otherwise accomplish and then supplying untenable financing for those projects. Ghana owes China $2 billion for infrastructure projects while Zambia owes $6 billion, and in all likelihood those countries cannot pay back their loans. Those nations and dozens like them are firmly in the Chinese sphere of influence now. In much of the developing world, the urgency of debt refinancing wins out over any lure of democracy. The Chinese have eroded a coalition aligned to Western interests through engineering expertise and corrosive capital, while schools like Ella’s steer students away from technically demanding and “stressful” fields like engineering.

A Solution

So, what to do about it? How do we reinvigorate the culture of meritocracy and achievement in our schools? How do we prepare ourselves for a future that both honors the capacity of our young people—that challenges them so they achieve their best—and prepares our nation to retain its global position and secure its safety?

Restore the SAT and ACT. Measures of achievement matter—first, because they communicate that achievement itself matters. That’s true even if you believe that such tests are gameable. If gaming the SAT means paying a tutor to help you catch up on math or learn several hundred vocabulary words, or even more cynically to help you learn strategies to manage your mindset during testing situations, we should fix that. But even the workarounds that prosperous families come up with benefit society more than if those same parents try to outfox the system by paying for private fencing lessons or hiring a consultant to help little Johnny craft his image more artfully through his essay. People prepare for tests by studying. This reinforces the purpose of the endeavor and produces benefits even before the test is taken.

More important, the SAT and ACT remain the most objective measures of academic achievement we have. Are they perfect? No. But they are far more objective than classroom grades—and far less open to gaming, privilege, and perverse incentives. And they are a lot less manipulable than, say, an inscrutable system that prizes high-priced activities such as a lifetime of tennis lessons. Help me to see the equity benefits there.

Some kind of objective measure (or as objective a measure as we can devise) is always the first step. That’s the case even if we then consider other factors that add context to the scores of students from schools that prepare them less well—a 1400 from a student who attends a school with precious few advanced courses and who is first in their family to go to college is in many ways more impressive than a 1500 from a student at an elite boarding school. Having an objective measure does not mean we cannot adjust it to address inequities in the system. But an explicitly academic measure is far more just and meritocratic than a system of nebulous, inchoate incentives that reward students who have the resources to curate their lives around that system. Did people really think the wealthy would not be best positioned to game a system based on extracurriculars? Kudos to MIT, the first university to push back on the movement to eliminate the SAT. What they found when they examined the data, of course, was that making an entrance exam optional decreased equity.

But also expand and broaden the assessments. One critique of college admissions tests is that their scores don’t correlate well with college success because what they measure is too narrow—mostly math and English in the case of the SAT, on the assumption that scores in those subjects are proxies for achievement in other academic areas. Compare that to England’s system of GCSEs, or General Certificates of Secondary Education. Students take assessments at the culmination of their pre-university years in a variety of subjects they choose. These subject-specific assessments measure knowledge rather than proxy skills. They are better correlated to what happens in college, more rigorous, and, if technical expertise is our goal, would allow us to test specific areas like chemistry, biology, and physics. A system like England’s would help immensely by better measuring achievement and more of it.

Data can also help. Imagine a school that reported to parents and others the average grade in each class and the 25th- and 75th-percentile grades. Imagine if, when you got your child’s grade on a test or a report card, you had that information. Was her 94 above or below the mean? Does “emerging mastery” mean a warning light is flashing for my 3rd grader? With data, the discussion begins. There is sunlight. Parents are empowered. Data provide not only knowledge for parents but also a degree of accountability for schools that allow rampant and asymmetrical grade inflation. Perhaps private institutions couldn’t be made to do this, but public schools certainly could.
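To make the reporting idea concrete, here is a minimal sketch of how a school might compute the class average and the 25th- and 75th-percentile grades and place one student's grade against them. The function names, the sample grades, and the choice of the "inclusive" quartile method are illustrative assumptions, not a description of any existing school system.

```python
from statistics import mean, quantiles


def grade_summary(grades):
    """Summarize a class's grades: mean plus 25th and 75th percentiles.

    Hypothetical helper for illustration; uses the 'inclusive' quartile
    method, which treats the list as the whole class rather than a sample.
    """
    p25, _median, p75 = quantiles(grades, n=4, method="inclusive")
    return {"mean": mean(grades), "p25": p25, "p75": p75}


def describe(grade, summary):
    """Say where one student's grade falls relative to the class summary."""
    if grade >= summary["p75"]:
        return "top quarter"
    if grade <= summary["p25"]:
        return "bottom quarter"
    return "above the mean" if grade > summary["mean"] else "at or below the mean"


if __name__ == "__main__":
    class_grades = [90, 92, 94, 96, 98]  # made-up grades for one class
    summary = grade_summary(class_grades)
    print(summary)
    print("A 94 is", describe(94, summary))
```

A report card that printed these three numbers next to each grade would let a parent see at a glance whether a 94 is typical or exceptional in that class.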

We shouldn’t limit this push for change to K–12 schools, by the way. Rampant grade inflation at the university level doesn’t help either. The average grade at elite colleges in America is an A. Everybody wins once again! But it raises the question: How does muting the incentive to work a little harder and do a little more affect students’ knowledge and achievement?

Combat the idea that lower standards are an equity win. Equity means ensuring that each child has the fullest opportunity to reach the highest possible standards in a fair way. It means great schools in every community. Eliminating advanced courses and putting caps on achievement is folly from both an economic and national-security perspective. And it is a catastrophe for and insult to any group on whose behalf we suggest eliminating challenging work and rigorous standards. I don’t believe that there is any group of Americans who can’t or won’t try to rise to such challenges. It’s time we fought back. Why not provide advanced courses earlier for every child who wants them in every school?

Overcome our fear that competition and stress will hurt young people. The narrative that competition hurts rather than strengthens us, that stress will break us and our children, is the root of the problem. Where did that narrative come from? We don’t eschew competition in sports, at least not at the secondary school level and higher. Shielding kids from competition in the academic sphere communicates that we think children are fragile. While we don’t want to create a pressure cooker for our youth, being able to handle stress, challenge, and competition is a valuable skill for creating a life of meaning.

One could almost imagine it as a conspiracy. A few people get to the head of the line and are prosperous. They want their children to maintain a place in the world that affords them opportunity and success. They argue that there should be no more competition, that competition hurts people. For those already at the top of the heap, it’s a great strategy for perpetuating status. It’s just not very fair—or very useful for a country that tells itself it’s a meritocracy. To remain competitive and secure as a nation, we must expect our young people to strive to reach their full potential and give them every chance to do so.

Doug Lemov is the author of several books on teaching, including Teach Like a Champion 3.0. His next book, co-authored by Colleen Driggs and Erica Woolway, will focus on science- and research-based literacy instruction.

This article appeared in the Winter 2024 issue of Education Next. Suggested citation format:

Lemov, D. (2024). Your Neighborhood School is a National Security Risk: Student achievement and merit are losing prospects in the era of “everybody wins.” Education Next, 24(1), 34-41.

For more, please see “The Top 20 Education Next Articles of 2023.”

Settle for Better https://www.educationnext.org/settle-for-better-how-overpromising-undercut-education-reform-movement/ Tue, 20 Jun 2023 09:00:43 +0000 https://www.educationnext.org/?p=49716719 How overpromising undercut the education reform movement, and what to do about it

The post Settle for Better appeared first on Education Next.


When I first got involved in education reform back in 1993, a quote attributed to the famed anthropologist Margaret Mead had become a mantra at gatherings of those of us in “the movement”: “Never doubt that a small group of thoughtful, committed citizens can change the world; indeed, it’s the only thing that ever has.”

Everyone in the room would nod their heads in agreement and breathe in the heady inspiration that comes from being with like-minded people who share a belief in the righteousness of their cause and the inevitability of their success. For us “happy few” crusaders, history and justice were on our side.

Thirty years later, and after spending the last eight years in state bureaucracy as the Massachusetts secretary of education, I still believe in the ideas and aspirations behind the reform efforts of the 1990s and 2000s, but it’s now clear that our ambitions were exaggerated, and our timeline was way off—most memorably the promise that No Child Left Behind would get 100 percent of students to proficiency in English and math by 2014.

This is not a rationale for abandoning the cause; quite the opposite. It’s the foundation for rededicating ourselves to the hard work that needs to be done one day at a time, by shifting our mindset from the visionary call to “change the world,” to a more pragmatic directive to “do your job” (as New England’s own Coach Bill Belichick might say).

Education reform that had its beginnings in the 1980s and came into full bloom in the 1990s and the first decade of the 21st century had four basic components:

• Standards, assessment, and accountability, to set and raise expectations, along with measurement of school and student performance, to create a culture of data-driven decisionmaking and timely action to address systemic weaknesses

• Innovation in school models and instructional tools and systems, often tech-enabled, to shift the learning process from mass production to mass customization

• Robust teacher recruitment and practice-based training, to attract the best and the brightest and give them the skills they need to be highly effective, as measured by effects on student achievement

• Autonomous schools and parental choice, to provide front-line educators with real decisionmaking authority and to empower parents to vote with their feet when their children were stuck in low-performing neighborhood schools

What knit these elements together was a belief that applying the lessons of modern management and competitive markets from both the for-profit and nonprofit sectors would yield significant improvement to K–12 education, specifically as measured by student achievement and other academic or career outcomes. More compelling was the commitment to employ these strategies to eliminate the persistent performance gaps between schools serving high-poverty communities of color and schools serving well-to-do, mostly white suburbs.

In the words of both George W. Bush and Barack Obama, this remarkably bipartisan effort to raise student achievement and close gaps represented “the civil rights issue of our time.”

For a variety of reasons, the education-reform zeitgeist has shifted. Indeed, “education reform” is now considered to be a loaded term that is no longer spoken in polite company without risking a heated argument or losing the friendship of former allies. Although the Trump presidency accelerated the break-up, the coalition had begun to fray years before.

Loss of Consensus

The biggest sea change occurred with the loss of consensus that raising the level of academic achievement in historically underserved communities is essential to the pursuit of greater social equity. This is not just a matter of toning down the rhetoric around college-for-all to make room for career readiness; it’s also a reflection of a breakdown in the shared understanding of what educational excellence means and the purpose of schools in the first place.

Photo of Albert Shanker
Albert Shanker

The late Albert Shanker, legendary president of the American Federation of Teachers, once said, “The key is that unless there is accountability, we will never get the right system. As long as there are no consequences if kids or adults don’t perform, as long as the discussion is not about education and student outcomes, then we’re playing a game as to who has the power.”

At the August 2022 meeting of the Massachusetts Board of Elementary and Secondary Education, here’s what Max Page, the current head of the Massachusetts Teachers Association, said in opposition to the state’s student-assessment system:

It [strikes] me that we have a fundamental difference of views of what schools are for. The focus on income, on college and career readiness, speaks to a system that . . . is tied to the capitalist class and its needs for profit. We on the other hand have as a core belief that the purpose of schools must be to nurture thinking, caring, active and committed adults, parents, community members, activists, citizens.

How did we get here?

The general social and political environment certainly had a lot to do with it, but I think those of us in the education reform community, including state policymakers, need to reassess our own contributions.

To motivate people and mobilize resources to take on a big challenge, you need to tell a compelling story—about both the problem you’re trying to solve and your vision for the future. In the terminology of the day, you need a “burning platform” and a “theory of change.” For at least two decades, the messaging used by reformers worked to power a genuine national movement for education reform.

The rub is that creating excitement about dramatic change can eventually lead to overpromising and under-delivering—and when the results don’t keep pace with expectations, disappointment and disillusionment ensue. What’s more, the narrative of “transformation,” uplifting to many, can have a demoralizing effect on the people and organizations that are doing their best to get results within the existing “dysfunctional” system.

The Role of State Policy

Even under the best of circumstances, moving the needle on overall student achievement and closing gaps across communities and student subgroups at scale is a multi-generation task. It is certainly not something that can be achieved through policy reforms in one or two terms of a president or a governor.

Affecting student outcomes is only partially and indirectly a function of public policy. State policymakers, in particular, can help create the conditions within which improvement can occur by fairly and equitably allocating financial resources, establishing rigorous standards and aligned assessments, and providing meaningful and timely information to educators and local officials. Policy can also disrupt the status quo by authorizing the creation of new schools, allowing parental choice, and enabling state education agencies to intervene in the lowest-performing schools or districts.

The 1993 Massachusetts Education Reform Act established the commonwealth’s version of the national standards-based reform movement, which culminated in the federal No Child Left Behind Act of 2002. As documented by Harvard economist Thomas Kane, the impact of these reforms in Massachusetts and across the United States is arguably among the most successful social-policy stories of the past 50 years, notwithstanding more recent stagnation or decline. Massachusetts significantly expanded its investment in K–12 education through a progressive funding formula and at the same time developed rigorous curriculum frameworks along with high-quality and well-aligned student assessments. It also established a school accountability system tied to performance-based outcomes and authorized some of the country’s earliest and best charter schools. Through these measures, the commonwealth was able to raise its overall level of school quality and student achievement, especially during the first two decades of reform.

Student performance on the mathematics portion of the National Assessment of Educational Progress provides a telling example. Between 1992, just before the Education Reform Act was passed, and 2011, Massachusetts saw an increase of more than 25 scaled-score points at both 4th and 8th grade, moving in the state rankings from ninth and twelfth place, respectively, to number one. Although progress on gap-closing has been mixed and inadequate, the scaled-score difference in mathematics on the NAEP between white and Black 4th graders in Massachusetts was reduced by one-third over the same period.

Getting the policies right is a challenge, and once they’re implemented, their effects take time to emerge. Lasting change requires sustaining those policies in the face of ongoing pressure to turn back the clock or to try something else.

Over the course of the last eight years, the state’s Board of Elementary and Secondary Education, largely appointed by Republican Governor Charlie Baker, took steps to update and reinforce many of these core elements of the 1993 reform by

• revising curriculum frameworks

• developing “next generation” student assessments for the Massachusetts Comprehensive Assessment System (MCAS)

• strengthening the accountability framework by broadening its performance metrics and sharpening its focus on improvement among the lowest-achieving students

• re-benchmarking and raising the “competency determination” for high school graduation based on MCAS

All of this took place in a political and legislative environment that has become at best ambivalent toward standards-based education reform, as the weaknesses that plagued the system prior to the Education Reform Act fade from memory and as student performance gains flatten or recede. Holding the line going forward will likely become an increasing challenge as Massachusetts state government transitions to full one-party (Democratic) rule.

Notwithstanding the fact that the Massachusetts Education Reform Act and similar laws in other states have played a crucial role in improving student outcomes, when all is said and done, the best policy environment only makes improvement possible; it doesn’t make it happen. That change can only occur at the ground level, in more than 100,000 schools and more than two million classrooms across the country.

So, if policy effects tend to diminish over time, what can state education officials do that might make a lasting difference?

Doing nothing is not an option, for at least two reasons. First, most state governments, including Massachusetts, have a constitutional obligation to ensure all students receive an adequate education. Municipalities operate schools as a delegated responsibility, so when things go wrong, the state is ultimately on the hook. Second, even though decentralization sounds like it would be fertile ground for innovation and continuous improvement, each school district in effect operates as a monopoly, typically at the toleration of its local teachers union. Throw in the outsized influence of graduate schools of education in teacher training and you have the “iron triangle” that holds public education in its grip. In this environment, only state government has the leverage to create space for real change.

In getting more directly involved in educational programs and practice, however, state policymakers need a heavy dose of humility. From a teacher’s point of view, the only thing worse than having someone from the central office telling you what to do is having someone from the state department of education telling you what to do.

Governor Baker’s dictum throughout his administration was “Do more of what works.” That approach, ideally backed up by solid evidence, not only provides the greatest promise for positive near-term student impact but also offers the path of least resistance when it comes to adoption and effective implementation by educators.

There are a variety of proven programmatic initiatives that state policymakers might pursue (although unfortunately it’s not a terribly long list). During the Baker administration, our priorities were:

Early literacy. In fall 2022, the state Board of Elementary and Secondary Education adopted regulations requiring all children in grades K–3 to receive semi-annual literacy screening to determine whether they are on track toward reading proficiency. For students who are below benchmark, schools must inform parents and develop individual reading-improvement plans grounded in evidence-based instructional practices.

High school pathways. Starting in 2017, the Baker administration launched two parallel initiatives to establish early-college and early-career pathways, providing integrated courses of study for student cohorts in more than 100 high schools to deepen learning and engagement while strengthening college and career readiness. Both options are focused on improving outcomes for students who are underrepresented in higher education or high-demand industries.

Vocational and technical education. An interagency Workforce Skills Cabinet committed more than $200 million to upgrade equipment and technical lab spaces in vocational schools, comprehensive high schools, community colleges, and nonprofit training centers. In addition to creating new “reskilling and upskilling” capacity for workers and adult learners, these investments also enabled vocational enrollment to grow by close to 8,000 students (about 15 percent) since 2015, even though overall high school enrollment was flat.

Educator diversity. A central focus of the state Department of Elementary and Secondary Education is the recruitment, support, and retention of teachers of color. With the support of targeted grant programs and state-local partnerships, the number of Black and Latino teachers has increased by more than 30 percent since 2015, even as the total number of teachers has remained constant.

Unlike the earlier generation of policy reforms, these programmatic initiatives are not perceived as threatening to local autonomy and are generally met with enthusiasm by educators, students, and parents—as well as legislators on both sides of the aisle. Strategies like high-dosage tutoring, vacation and summer learning opportunities, and incentives for adoption of evidence-based curriculum and professional development could probably be added to this list. Equally important is the identification of other initiatives that could make an impact. Federal and state education agencies should partner with researchers to independently and rigorously evaluate promising programs and interventions.

Hope and Pragmatism

Execution, of course, is always the challenge, especially on a large scale, but these strategies offer hope for meaningful change at the classroom level, promising to move us closer to universal reading proficiency by 4th grade, create more equitable and inclusive classrooms, and provide a more engaging and purposeful high school experience.

If efforts like these prove successful and continue to gather momentum—especially across two gubernatorial administrations representing both major political parties—there is hope that they can be sustained over time to achieve statewide scale.

This is not an argument for abandoning other approaches to reform that operate closer to the margins of the dominant system, including charter schools, parental choice, and tech-enabled innovation. Any long-term school improvement plan, if it is to succeed, must include a robust outside strategy that can work collaboratively and competitively with school districts—challenging and enabling them to accelerate change and providing alternatives when they don’t. State policymakers must ensure that education entrepreneurs are supported and encouraged to play an ever-larger role in the public education ecosystem, especially for communities and student populations that have long been underserved or ignored.

By regaining traction on overall student performance and making progress on stubborn inequities, the programmatic initiatives described above, and others like them, might also help reinforce the value of the underlying standards-based reform architecture, helping to demonstrate its relevance, three decades after being enshrined in statute.

Perhaps just as important, renewed educational progress might help refocus politicians, media, and the broader public on the day-to-day work of schools, which has been overshadowed lately by the din of the culture wars. There is no way for schools to be fully insulated from these increasingly vitriolic and often hyperbolic ideological clashes; after all, schools play a central role in raising our children. But what gives these issues oxygen at school board meetings, state houses, and on social media is the growing sense on both the right and the left that schools are part of the problem and therefore not to be trusted.

From the left, schools are charged with being the perpetrator of the school-to-prison pipeline. From the right, schools are seen as a training ground for social justice warriors. Unfortunately, the “silent majority” in the middle mostly sits on the sidelines, in part out of fear of being ostracized by their angry neighbors and in part because many of them have lost confidence in the ability of our school system to deliver on its core educational mission—a perspective that was exacerbated by remote learning during the pandemic.

Over the past 30 years or more, education reformers have tried to “fix” a “broken” system of public schools. Although real progress has been made, the work is not even close to being done. By making the bold promise to “leave no child behind,” we helped to turn what should have been a positive story into a narrative of failure. Without a new, more pragmatic plan to achieve meaningful and sustainable improvement that both students and parents can recognize in their own schools, we risk losing the gains that we’ve made.

James A. Peyser served as secretary of education for Massachusetts from 2015 to 2022 and as chairman of the state board of education from 1999 to 2006.

This article appeared in the Fall 2023 issue of Education Next. Suggested citation format:

Peyser, J.A. (2023). Settle for Better: How overpromising undercut the education reform movement, and what to do about it. Education Next, 23(4), 44-49.

Wisconsin’s Act 10, Flexible Pay, and the Impact on Teacher Labor Markets https://www.educationnext.org/wisconsin-act-10-flexible-pay-impact-teacher-labor-markets/ Tue, 25 Apr 2023 09:00:45 +0000 https://www.educationnext.org/?p=49716551 Student test scores rise in flexible-pay districts. So does a gender gap for teacher compensation.

The post Wisconsin’s Act 10, Flexible Pay, and the Impact on Teacher Labor Markets appeared first on Education Next.


Effective teachers are a vital input for schools and students. Teachers can have important and long-lasting impacts on students’ learning, college attendance, and eventual earnings. They can also reduce teen pregnancy or incarceration. Attracting effective teachers into public schools and retaining them is thus a first-order policy goal. Changes in teacher compensation, for example across-the-board raises in salaries or pay plans that directly tie salaries to performance, are often proposed as ways to achieve this goal. The debate on these reforms, though, is very much open; some opponents argue that these changes would be ineffective because teachers are not motivated by money.

Empirical evidence on the effects of compensation reform is somewhat scarce. Most U.S. public school teachers are paid according to rigid schedules that determine pay based solely on seniority and academic credentials. In unionized school districts, these schedules are set by collective bargaining agreements. The near absence of variation in pay practices has prevented rigorous evaluation of the impacts of changes in the structure of teacher pay on the supply of effective teachers and on students’ success.

The dearth of variation in pay schemes was broken in 2011 when the Wisconsin state legislature passed Act 10. Intended to help address a projected $3.6 billion budget deficit through cuts in public-sector spending, Act 10 introduced several changes concerning teachers’ unions, school districts, and their employees. First and foremost, Act 10 limited the scope of salary negotiations to base pay, preventing unions from negotiating salary schedules and including them in collective bargaining agreements. This allowed school districts to set pay more flexibly and without unions’ consent, in principle detaching compensation from seniority and credentials. Act 10 also capped annual growth in base pay to the rate of inflation and required employees to contribute more towards their pensions and health care plans. Lastly, the new legislation made it harder for unions to operate. It requires local union chapters to recertify every year with support from the absolute majority of all employees they represent, and it prohibits automatic collection of union dues from employees’ paychecks.

The public debate over Act 10 has focused on whether the reform package was good or bad for students, schools, and teachers. The unions vigorously opposed the legislation, organizing protests and occupying the state capitol building. Republican Governor Scott Walker just as vigorously defended the legislation, which helped propel him to national prominence. For education policy scholars, however, what is undeniable is that the legislation was useful, because its implementation offered an opportunity to study its effects. In a series of studies, I have taken advantage of the changes to teachers’ labor markets introduced by the reform to shed light on the impact of flexible pay on teachers’ mobility and effectiveness, the gender wage gap among teachers, and whether most teachers would prefer higher salaries today versus more generous pensions when they retire.

Learning from Act 10

The provisions of Act 10 went into effect immediately. In practice, though, school districts gained the power to use their new flexibility not simultaneously, but at different points in time. The two-year collective bargaining agreements reached between each district and its teachers union prior to 2011 remained valid until their expiration, and districts had been on different negotiation calendars starting from several years prior to Act 10. As a result, the timing of expiration was staggered across districts for reasons that were effectively random. This variation creates an opportunity to examine the impact of the end of collective bargaining over teacher pay.

Districts were free under Act 10 to decide whether and to what extent to use their newly gained flexibility to depart from salaries based only on seniority and academic credentials. To characterize these choices, I analyzed districts’ post-Act 10 employee handbooks, documents which list the duties and rights of all teachers and describe how they are paid. As of 2015, approximately half of all districts still included a salary schedule in their handbook and did not mention any other bonuses or increments; I call these seniority-pay districts. The remaining districts, on the other hand, did not list any schedule and often clearly stated that individual pay would be set as the district saw fit; I call these flexible-pay districts.

Using employment records on all public-school teachers in Wisconsin linked to individual student information on achievement and demographics from the Wisconsin Department of Public Instruction, I first document how teacher salaries changed in flexible-pay and seniority-pay districts in the aftermath of the reform. After the expiration of districts’ collective bargaining agreements, salary differences among teachers with similar seniority and credentials emerged in flexible-pay districts, but not in seniority-pay districts. Before the passage of Act 10, such teachers would have been paid the same. These newly emerging differences are related to teachers’ effectiveness: Teachers with higher value-added (individual contributions to the growth in student achievement, as measured by standardized test scores) started earning more in flexible-pay districts. This finding is striking considering that school districts in Wisconsin neither calculate value-added nor use it to make any human-resources decisions. School and district administrators appear to be able to identify an effective teacher when they see one.
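To make the idea of value-added concrete, here is a deliberately simplified sketch. It is not the model used in the study: real value-added models adjust for student characteristics and measurement error, while this version only compares each teacher's average student score gain with the overall average gain. The records and the function name `simple_value_added` are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def simple_value_added(records):
    """records: iterable of (teacher, prior_score, current_score).

    Returns each teacher's mean student gain minus the overall mean gain.
    """
    gains_by_teacher = defaultdict(list)
    for teacher, prior, current in records:
        gains_by_teacher[teacher].append(current - prior)
    # Pool every student's gain to get the benchmark all teachers share.
    all_gains = [g for gains in gains_by_teacher.values() for g in gains]
    overall = mean(all_gains)
    return {t: mean(g) - overall for t, g in gains_by_teacher.items()}


# Hypothetical test scores (assumption, not data from Wisconsin).
records = [
    ("Teacher A", 50, 58), ("Teacher A", 60, 66),
    ("Teacher B", 55, 57), ("Teacher B", 65, 69),
]
print(simple_value_added(records))
```

A positive number means a teacher's students gained more than the average student in the data; a negative number means they gained less.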

Does Flexible Pay Attract Better Teachers?

Changes in teachers’ pay arrangements after the expiration of the collective bargaining agreements changed teachers’ incentives to stay in their district or to move, depending on the teachers’ effectiveness and the pay plan in place in their district of origin. Because flexible-pay districts compensate teachers for their effectiveness and seniority-pay districts only reward them for seniority and academic credentials, teachers with higher effectiveness should want to move to flexible-pay districts, whereas teachers with lower effectiveness and higher seniority should want to move to seniority-pay districts.

The data confirm these hypotheses. The rate of cross-district movement more than doubled after Act 10, with most moves occurring across districts of different type (flexible-pay vs. seniority-pay). Teachers who moved to a flexible-pay district after a collective bargaining agreement expired were more than a standard deviation more effective, on average, than teachers who moved to the same districts before the expiration; these teachers also had lower seniority and academic credentials and enjoyed a significant pay increase upon moving. The effectiveness of teachers moving to seniority-pay districts, on the other hand, did not change, and these teachers did not experience any change in pay.

In addition to inducing sorting of teachers across districts, Act 10 led some teachers to leave the public school system altogether: The exit rate nearly doubled in the immediate aftermath of the reform, to 9 percent from 5 percent. Again, the characteristics of those who chose to leave differed depending on the pay plan each district chose after its collective bargaining agreement expired. Teachers who left flexible-pay districts were far less effective than those who left seniority-pay districts.

Changes in the composition of movers and leavers after collective bargaining agreements expired increased ex ante (i.e., measured pre-reform) teacher effectiveness in flexible-pay districts by 4 percent of a standard deviation relative to seniority-pay districts. In flexible-pay districts, the effectiveness of teachers who did not move or leave also increased immediately after the reform, compared with teachers in seniority-pay districts, suggesting that teachers in flexible-pay districts increased their effort (Figure 1). Overall, changes in the composition and effort of the teaching workforce raised student test scores in flexible-pay districts by 5 percent of a standard deviation relative to seniority-pay districts in the five years following the reform.

Figure 1: Post-Act 10, Teachers Increase Effort in Flexible-Pay Districts
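The "relative to seniority-pay districts" comparisons above follow the logic of a difference-in-differences calculation, which can be sketched as follows. This is a two-group, two-period simplification; the actual analysis exploits staggered expiration dates across districts and is more involved. The input numbers are invented for illustration, expressed in standard-deviation units, and `diff_in_diff` is a hypothetical helper name.

```python
def diff_in_diff(flex_pre, flex_post, seniority_pre, seniority_post):
    """Change in flexible-pay districts minus change in seniority-pay districts."""
    flex_change = flex_post - flex_pre
    seniority_change = seniority_post - seniority_pre
    return flex_change - seniority_change


# Hypothetical average test scores in standard-deviation units (assumption).
effect = diff_in_diff(
    flex_pre=0.00, flex_post=0.06,
    seniority_pre=0.00, seniority_post=0.01,
)
print(round(effect, 3))
```

Subtracting the change in seniority-pay districts removes whatever affected all Wisconsin districts over the same period, so the remainder can more plausibly be attributed to the pay regime.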

Taken together, these results suggest that higher pay can be an effective tool to attract and retain talented teachers.

It is worth stressing, though, that part of the gains enjoyed by flexible-pay districts came at the expense of seniority-pay districts, with implications for inequality in the allocation of teachers across students. Whether flexible pay undermines equity depends on which districts adopt flexible pay, which is in turn related to the characteristics of the districts’ students, the pool of teachers they employed pre-reform, and their budgets. For example, to attract its most preferred teachers under flexible pay, a district with a smaller budget and a larger share of economically disadvantaged students may have to pay too high a premium, which it cannot afford. The district may thus decide to stay with seniority pay to at least be able to fill its teaching slots.

In a separate study, Chao Fu, John Stromme, and I use post-Act 10 data from Wisconsin to explore this possibility. We conclude that a switch from rigid to flexible pay (like the one that occurred in Wisconsin after the reform) could reduce disadvantaged students’ access to more effective and therefore in-demand teachers. We also show, however, that properly designed bonus programs that redistribute state funds to districts serving large numbers of disadvantaged students could offset this effect.

More Pay for Male Teachers

An additional caveat for a pay approach that gives districts flexibility over teacher pay is that it may produce wage inequality across teachers with similar effectiveness but different demographic characteristics—for example, men and women. A pay plan that allows employers to adjust workers’ pay at the individual level introduces the opportunity for individual negotiations. However, research suggests that women are often reluctant to negotiate for higher pay, giving an advantage to men and creating or exacerbating gender pay gaps.

To test whether this dynamic emerged among Wisconsin teachers after Act 10, Heather Sarsons and I compare the salaries of male and female teachers with the same demographic profile, with the same seniority and academic credentials, and who teach in the same district, grade, and subject. We make these comparisons before and after the expiration of each district’s post-Act 10 collective bargaining agreement to see how the law affected gender equity. Prior to the passage of Act 10, strict adherence to seniority-based salary schedules meant that there was no gender wage gap among Wisconsin teachers. With the advent of flexible pay, though, a gender gap emerged that penalizes women (Figure 2). While small on average, the gap is larger for younger and less experienced teachers. If this gap were to persist over time, women would lose an entire year’s pay relative to men over the course of a 35-year career.

Figure 2: Gender Wage Gap Emerges after Pay Reform

The gender wage gap associated with flexible pay also differs depending on the gender of school and district leaders. In schools with a female principal or districts with a female superintendent, the gap is virtually zero. In schools and districts run by men, the gap is substantial.

The emergence of a gender wage gap following the introduction of flexible pay suggests that gender differences in teachers’ willingness to bargain or their bargaining ability could be driving part or all of it. To shed light on bargaining’s role, we surveyed all current Wisconsin public school teachers. We asked respondents whether they have ever negotiated their pay or plan to do so in the future. We then asked teachers who declined to negotiate why they chose to do so. We asked those who did bargain whether they believed the negotiation was successful.

Survey responses indicate that women are systematically less likely than men to have negotiated their pay at various points in their careers or to anticipate negotiating in the future. The magnitude of the differences is substantial, suggesting that differences in bargaining could lead to a gender wage gap as large as 12 percent. In line with our wage results, gender differences in negotiating behavior are entirely driven by men being more likely to bargain under a male superintendent, whereas men and women who work under a female superintendent are equally likely to negotiate their salaries. When asked why they did not negotiate, women are 31 percent more likely than men to report that they do not feel comfortable negotiating pay. Differences in the perceived returns to bargaining and beliefs about one’s teaching ability do not explain why women are less likely to negotiate.

In short, our survey data point to gender differences in bargaining as a likely determinant of the gender wage gap. We also test for, and rule out, three additional explanations. The first is the possibility of gender differences in teaching quality: As districts use wage flexibility to pay higher salaries to more effective teachers, a gender gap could emerge if men are better teachers than women. Our data do not support this hypothesis: women’s value-added is slightly higher than men’s and controlling for it does not affect the gap. Furthermore, the returns to having high value-added after the introduction of flexible pay are positive for men, but not for women. A second possible explanation is job mobility. If women are less likely than men to move, they might be unable to take advantage of outside offers with higher pay. In our data, however, women are as likely as men to move. The third possible explanation is higher demand for male teachers from certain schools, for example those employing fewer men, those that lost male teachers immediately before Act 10, and those enrolling a higher share of male students. While the gender wage gap is larger in such schools, these differences only explain a very small portion of the total gap. Taken together, our results highlight how flexible pay, while possibly beneficial to attract effective teachers and incentivize all teachers to exert more effort, can be detrimental for some subgroups.

How Much Do Teachers Value their Pensions?

To date, most of the debate on how to design teacher pay to improve selection and retention has focused on salaries—that is, the compensation that teachers receive while active in the labor force. Yet, almost all U.S. public school teachers receive a large portion of their lifetime compensation in the form of defined-benefit retirement pensions.

Pension benefits are typically calculated using a formula that multiplies years of service, average salary over the final several years of the teacher’s career, and a “replacement factor” (e.g., 2.5 percent). On one hand, this makes pensions very generous for career teachers and thus extremely onerous for state budgets, to the point that the pension liabilities of current public-sector employees (approximately half of whom are teachers) were fully funded in only two states in 2018. Reforms to increase the solvency of these plans have thus been debated for years across many states. On the other hand, the use of defined-benefit plans implies that any changes to the structure and growth of teachers’ pay—especially towards the end of the career—would translate into changes in pension benefits.
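The formula described above can be sketched in a few lines. This is only an illustration: the function name, the 30-year career, and the $60,000 final average salary are hypothetical values chosen for the example, not figures from the study. Only the 2.5 percent replacement factor comes from the example in the text.

```python
def annual_pension(years_of_service, final_avg_salary, replacement_factor):
    """Defined-benefit formula: service x final average salary x factor."""
    return years_of_service * final_avg_salary * replacement_factor


# Hypothetical teacher: 30 years of service, $60,000 final average salary.
baseline = annual_pension(30, 60_000, 0.025)

# Hypothetical scenario: the final average salary is 10 percent lower.
reduced = annual_pension(30, 54_000, 0.025)

print(f"Baseline annual benefit: ${baseline:,.0f}")
print(f"Benefit after a salary cut: ${reduced:,.0f}")
print(f"Relative change: {reduced / baseline - 1:.1%}")
```

Because the benefit is proportional to the final average salary, any change in late-career pay that moves that average carries over to the pension in the same proportion.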

To fully appreciate how salary and pension reforms would affect the composition of the teaching workforce, it is crucial to understand how teachers value higher salaries vis-à-vis generous pensions. The multiple provisions of Act 10, which changed teachers’ salaries and future pension benefits with staggered timing across districts, also allow me to study this question. First, as mentioned above, the legislation introduced flexible pay across districts after the end of each collective bargaining agreement. For the subsample of teachers already eligible to retire (those who are at least 55 years old and have at least five years of service), who enjoyed the most generous salaries before Act 10 because of salary schedules that rewarded seniority, this led to a 7.5 percent decline in gross salaries. Importantly, since pension benefits are calculated using a defined-benefit formula, this decline also translated into a 5.8 percent decline in future pension benefits for the average retirement-eligible teacher.

Second, Act 10 raised employees’ contributions to their pension plan from zero to approximately 6 percent of annual salaries, lowering employer contributions by the same amount (so that the total per-worker contribution remained the same). Akin to a payroll tax, this provision lowered net salaries for all teachers and took effect in all districts starting in 2012.

To estimate the impact of these changes in compensation on teachers’ decisions about whether to remain in the classroom, I track teacher retirement rates across districts as these two provisions of the reform went into effect. I define the retirement rate as the share of teachers eligible to claim a pension (in Wisconsin, those aged 55 and above with five or more years of service) who leave at the end of the year. Overall, this rate rose from 15 percent to 34 percent after Act 10. The staggered timing of the changes’ implementation allows me to separate responses to changes in net salaries (due to the increase in contribution rates) from responses to changes in gross salaries and pension benefits (due to the introduction of flexible pay). I find that approximately 45 percent of the increase in retirement can be attributed to the decline in net salaries, whereas 55 percent can be ascribed to the fall in gross salaries and pension benefits.
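As a back-of-the-envelope check on the decomposition above, the sketch below splits the overall rise in the retirement rate into percentage-point contributions using the approximate shares quoted in the text. The function and label names are hypothetical, and the resulting point values are derived from the article's rounded figures rather than reported in the study.

```python
def decompose_increase(before, after, shares):
    """Split a change in a rate (in percentage points) by attribution share."""
    total_change = after - before
    return {name: total_change * share for name, share in shares.items()}


# Retirement rate rose from 15 to 34 percent; shares are the approximate
# 45/55 split quoted in the text.
parts = decompose_increase(
    before=15.0,
    after=34.0,
    shares={
        "decline in net salaries": 0.45,
        "decline in gross salaries and pension benefits": 0.55,
    },
)

for name, points in parts.items():
    print(f"{name}: {points:.2f} percentage points")
```

Under these rounded inputs, the 19-point rise splits into roughly 8.55 points attributable to lower net salaries and roughly 10.45 points attributable to lower gross salaries and pension benefits.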

Next, I test whether teachers’ response to a decline in salaries is equivalent to their response to the same decline in pension benefits, or if teachers instead react more strongly to changes in either form of compensation (which would be consistent with them having stronger preferences for it). The data reveal that teachers respond more to changes in current salaries than they do to equivalent changes in the value of their future pension benefits. This finding has an important implication for the design of teachers’ compensation schemes: shifting part of their lifetime compensation away from retirement towards employment (i.e., raising salaries and making pensions less generous) could significantly improve teacher retention.

Act 10’s Lessons

In sum, Act 10 offered a unique opportunity to understand what would happen to the teacher labor market if it were to become more similar to “standard” labor markets in terms of pay. This reform is still relatively recent; its long-run effects on the public education system in Wisconsin remain to be seen. In particular, careful study of its effects on the selection of new teachers and entry in the profession represents an important avenue for future research.

Taken together, however, the results of the studies conducted to date highlight how reforms of the structure of teachers’ pay can be a powerful instrument to attract and retain effective educators, which could have profound and long-lasting effects on students. Giving school districts autonomy over the design of pay and limiting the rigidity embedded in the use of seniority-based salary schedules can help administrators attract more effective teachers from other school districts—and, presumably, from outside of education. Yet, some of the findings call for caution when re-designing teachers’ pay arrangements: Flexibility can generate inequities across students in the effectiveness of their teachers, and across male and female teachers in the pay they receive.

Barbara Biasi is an Assistant Professor at Yale SOM and a Visiting Assistant Professor at the Einaudi Institute for Economics and Finance. She is also a Faculty Research Fellow at NBER and a Research Affiliate at CEPR and CESifo.

This article appeared in the Summer 2023 issue of Education Next. Suggested citation format:

Biasi, B. (2023). Wisconsin’s Act 10, Flexible Pay, and the Impact on Teacher Labor Markets: Student test scores rise in flexible-pay districts. So does a gender gap for teacher compensation. Education Next, 23(3), 26-31.

PISA: Mission Failure
With so much evidence from student testing, why do education systems continue to struggle?
https://www.educationnext.org/pisa-mission-failure-with-so-much-evidence-student-testing-why-do-education-systems-struggle/ Tue, 07 Feb 2023 10:00:11 +0000

In the contentious world of education, nearly every proposed reform has its detractors and supporters. Yet common sense might indicate that a policy backed by solid evidence would foster agreement between policymakers, governments, political parties, and education stakeholders. Shouldn’t objective data override ideological divides and political bickering?

Many reformers have looked to assessment and accountability, both within countries and internationally, as a means of encouraging consensus. On the global scene, their hope was that the evidence generated by international assessments could contribute to our common understanding of what works in different countries, since comparative data can identify which policies have boosted student achievement in top-performing nations.

Unfortunately, these expectations have not been met.

Since 2000, the Programme for International Student Assessment, or PISA, has tested 15-year-olds throughout the world in reading, math, and science. Developed by the Organization for Economic Cooperation and Development, or OECD, and administered every three years, PISA is designed to yield evidence for governments on which education policies deliver better learning outcomes as students approach the end of secondary school. The OECD is a member-led organization of nations that provides policy advice to governments and encourages peer learning between countries. Initially, PISA testing involved only the rather homogeneous group of OECD member countries, but its ambition grew. From the first cycle (2000) to the last (2018), the number of participating countries increased from 32 to 79, owing largely to the addition of many low- and middle-income countries. At this point the OECD asserted that “PISA has become the world’s premier yardstick for evaluating the quality, equity and efficiency of school systems, and an influential force for education reform.”

And yet, according to PISA’s own data, after almost two decades of testing, student outcomes have not improved overall in OECD nations or most other participating countries. Of course, that same time period saw a global recession, the rise of social media, and other developments that may have served as headwinds for school-improvement efforts. Even so, PISA’s failure to achieve its mission has led to some blame games. In an effort to explain the flatness of student outcomes over PISA’s lifetime, the OECD asserted in a report on the 2018 test results that PISA “has helped policy makers lower the cost of political action by backing difficult decisions with evidence—but it has also raised the political cost of inaction by exposing areas where policy and practice are unsatisfactory.” The OECD was essentially pointing the finger at its own members and other countries participating in PISA, accusing them of not following PISA’s policy advice.

This finger pointing is based on two assumptions: first and foremost, that PISA policy recommendations are sound, and second, that the evidence provided by PISA data is itself enough to reduce the political costs associated with implementing education reforms.

Both assumptions are seriously flawed. My professional experience as an academic and national education minister allows me to look at this issue from a unique vantage point. When I served as Spain’s secretary of state for education, I became keenly aware of the political pushback that education reforms face, how and why that pushback remains hidden from public debate, and the helplessness policymakers feel when they try to ameliorate differences of opinion by bringing objective evidence to the table. As deputy director for education at the OECD and later head of its Centre for Skills, I enjoyed the privilege of providing advice to governments all over the world, which allowed me to observe how much the success of specific policies and the magnitude of the political costs associated with implementing those policies differ between countries.

PISA has proven to be a successful metric for comparing education systems, a challenge that many thought impossible. The fact that the PISA ranking of countries by student performance is similar to the rankings generated by other international assessments has been used both to argue that PISA is robust and to question the need for another test. But PISA is different, mainly because, within the OECD framework, its role was predefined as a tool for policy advice, and it enjoys the privilege of direct communication channels with governments. Unlike the sponsors of other assessments, PISA officials work tirelessly to enhance the program’s media impact, a strategy that has two closely linked objectives: to magnify PISA’s visibility and to put pressure on governments to follow its recommendations. Clearly, PISA has a better chance of achieving these goals when exposed weaknesses in an education system provoke a media furor. Program officials seem particularly proud of the “PISA shock” that occurs when unexpectedly poor results in a country lead to media outrage. This happened in Germany in the first PISA cycle, and, as the OECD wrote in a 2011 report, “the uproar in the press reflected a very strong reaction to the PISA results. . . . Politicians who ignored it risked their careers.”

Politicians around the world do view PISA as a high-stakes exam that leads to intense media scrutiny and political blame games. But surely the only measure that truly reflects PISA’s success is its ability to shape reforms that improve student outcomes. As we have seen, trends over time reveal a flat line, so what went wrong?

Quality and Equity

Policy recommendations from PISA are based on a combination of two different approaches: 1) quantitative analyses that search for links between student outcomes and a range of features of education systems and 2) qualitative analyses of low- and top-performing countries. Many critics have noted that PISA’s quantitative analyses cannot be used to draw causal inferences, mainly because of the cross-sectional nature of the samples and the almost-exclusive use of correlations. Meanwhile, its qualitative analyses also suffer from serious drawbacks such as cherry-picking. While these issues are well known, others have gone largely unnoticed.

PISA seeks to measure two complementary dimensions of education systems: quality and equity. While quality is typically measured in a straightforward way—that is, in terms of average student test scores—equity is a multidimensional concept that PISA measures using metrics such as the relationship between socioeconomic status and student performance, the degree of differences in student performance within and between schools, and many others. The problem is that none of these variables tell the full story, each of them leads to different conclusions, and PISA’s prism on equity is ultimately too narrow.

To illustrate this point, I turn to my own country, Spain. From the very first cycle, PISA has hailed the Spanish school system as a paragon of equity. In fact, the praise has gone as far as to suggest that Spain has prioritized equity over excellence, a choice that PISA officials have applauded and domestic policymakers have used as an alibi to downplay the poor overall performance of Spanish students. PISA deems Spanish education to be equitable based on the finding that most of the variance in student performance in the country occurs within rather than between schools, a result it interprets as revealing no major differences between neighborhoods based on wealth or between schools based on their selectivity. But there is an alternative interpretation: The equity metric that PISA has chosen to highlight is not appropriate in a country with high rates of grade repetition. Variation within schools is large because PISA tests 15-year-olds irrespective of their grade level. That means that Spain tests a large proportion of students who are one or several grades behind because they have repeated grades at least once. The additional problem is that focusing on a single variable while ignoring the bigger picture leads to mistaken conclusions. Grade repetition in Spain is a reliable proxy for early school leaving, which, in turn, leads to a high rate of youth unemployment and a large number of individuals who are not in school, the workforce, or training.
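The within-versus-between comparison at the heart of this argument can be made concrete with a simple sum-of-squares split. The sketch below is a simplified illustration with made-up scores, not a reproduction of PISA's own estimation procedure. The function name and both data sets are hypothetical. In the second scenario each school tests 15-year-olds spread across several grade levels, which widens the spread of scores inside every school while leaving school averages unchanged.

```python
from statistics import mean


def between_school_share(schools):
    """Fraction of total score variation that lies between schools.

    `schools` maps a school label to a list of student scores.
    """
    all_scores = [score for scores in schools.values() for score in scores]
    grand_mean = mean(all_scores)
    # Between-school sum of squares: school means around the grand mean.
    between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in schools.values())
    # Within-school sum of squares: students around their own school mean.
    within = sum((x - mean(s)) ** 2 for s in schools.values() for x in s)
    return between / (between + within)


# Hypothetical scores. Both scenarios have the same school averages (500 and 450).
same_grade = {"A": [520, 500, 480], "B": [470, 450, 430]}
mixed_grades = {"A": [560, 500, 440], "B": [510, 450, 390]}

print(f"{between_school_share(same_grade):.2f}")
print(f"{between_school_share(mixed_grades):.2f}")
```

With these made-up numbers the between-school share falls from about 0.70 to about 0.21, even though the gap between the two schools is identical in both scenarios. A low between-school share can therefore reflect a wide spread of grade levels among tested students rather than small differences between schools.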

Unfortunately, in Spain the dropout rate has hovered around 30 percent for decades, and when I became secretary of state for education in 2012, at the peak of the financial crisis, the rate of youth unemployment was above 50 percent. It is simply wrong to define as equitable an education system where nearly one in every three students (most of them disadvantaged students or migrants) drops out of school without a minimum level of knowledge and skills.

Labeling Spain’s school system as equitable is not an isolated case of misdiagnosis, since PISA also defines as equitable the education systems in countries such as Brazil, China, Mexico, and Vietnam, where a substantial proportion of 15-year-olds do not attend school, either because they never did or because they dropped out. It is mistaken to suggest that lessons about equity can be drawn from these countries.

The Organisation for Economic Cooperation and Development (OECD) headquarters in Paris is where PISA was developed to assess students in reading, math, and science.

Wrongheaded Recommendations

These mistakes mean PISA incorrectly identifies the countries that should serve as role models, but what really matters is the policy recommendations PISA develops after comparing many countries. In a nutshell, out of concern for equity, the program warns against the implementation of any measures that could lead to segregation, such as ability grouping, school choice, and early tracking. This advice seems to be influenced more by ideology than evidence, since none of PISA’s own statistical analyses justify such recommendations.

Consider the case of vocational education and training. PISA’s conclusion is that it lowers student performance in the subjects tested by the program—reading, math, and science; thus, PISA’s recommendation is to postpone vocational education until upper secondary school to minimize the harm. However, the vast majority of participating countries already follow this practice, stipulating that students cannot choose vocational education until the age of 16. Since PISA assesses 15-year-olds, the number of vocational students it tests in most countries is zero. In those few countries where students follow different tracks at younger ages, the results do not always support the conclusion that vocational students perform less well. Thus, PISA is poorly positioned to provide policy recommendations on this topic.

Another questionable policy recommendation from PISA concerns school choice, about which the OECD concludes that, after correcting for socioeconomic status, students do not perform better in private schools than in public schools. These analyses, however, lump private schools together with government-funded, privately managed charter schools, thus making it impossible to draw separate conclusions about charter schools, which in many countries are the real target of controversy. More elaborate analyses using data from many international assessments, as well as other studies, have concluded that school choice often does lead to better student outcomes without necessarily generating segregation and that some of the few countries with early tracking show little (if any) differences in student performance and employability rates for vocational-education students. PISA needs to pay more attention to academic research and look at the broader picture.

PISA’s qualitative analyses rely heavily on differences between Nordic countries and others. In particular, the sharp contrast in PISA’s first cycle between the unexpected success of Finland and the unexpectedly poor performance of Germany has crystallized into an influential legend: that inclusive policies in place in Finland at the time led to both quality and equity and should be emulated, while the heavily tracked system in Germany led to inequity and should be avoided. Nordic societies were egalitarian long before PISA started, however. The alternative explanation is that in egalitarian societies teachers deal with a rather uniform student population, and therefore these countries can, without much risk, implement inclusive policies that tend to treat all students similarly. In contrast, less-egalitarian societies may require differentiated approaches and policies to meet the challenges that come with a heterogeneous student population. A number of comparative analyses show a correlation between the degree of economic inequality and the extent of disparities in student outcomes. Thus, the right question to ask is: To what extent can education systems compensate for large social, economic, and skills inequalities, and how?

I will return briefly to Spain which, compared to Nordic countries, is a rather inequitable society, not just economically but also in terms of skills. According to the OECD’s Survey of Adult Skills, adults in Spain have low skill levels compared to their counterparts in most European countries. What’s more, because in Spain universal access to education came about relatively late and the dropout rate has been high for decades, older Spaniards have very low skill levels, as do the relatively large proportion of adults of all ages who dropped out of school early. Among populations with such a lopsided distribution of skills, children entering school have very different starting points, levels of support at home, and access to resources. For teachers to be effective, it may be necessary to adopt practices that reduce student heterogeneity through the use of ability grouping or, in more extreme cases, different tracks. If these measures are not implemented early enough, students who are behind when they start school may not be able to catch up with their peers and, as they lag farther and farther behind, may end up repeating grades. In the 1990s, Spain implemented a rather radical comprehensive reform that delayed the start of the vocational-education track by two years (moving the starting age from 14 to 16) and avoided any differential treatment of students until the age of 16. This system was designed, as the OECD recognizes, for the sake of equity. But it failed: early school leaving increased as 14-year-olds no longer had a vocational option.

Latin America is also a region where levels of inequality are very high. Most countries there follow the egalitarian rules (no early tracking, no ability grouping, almost-nonexistent vocational education), leading to poor educational outcomes: low student achievement and high rates of grade repetition and early school leaving. In these countries, more than 70 percent of teachers and principals report that broad heterogeneity in students’ ability levels within classrooms is the main barrier to learning.

Political Pushback

These examples point to a broader conclusion: policy recommendations cannot be universal, because what works in egalitarian societies may lead to bad outcomes in societies with high levels of inequity. Education systems should instead follow a sequence of steps as they mature. Singapore shows the way. A few decades ago, Singapore had an illiterate population and very few natural resources. The country made a decision to invest in human capital as the engine of economic growth and prosperity, and, in a few decades, it became the top performer in all international assessment programs, thanks to an excellent and evolving education system (see Figure 1). But PISA does not draw any lessons from the fact that Singapore started improving by implementing tracking in primary school in an effort to decrease its high dropout rate. Once this was achieved, the country delayed tracking until the end of primary school. Even today, however, Singapore remains one of the few countries with early tracking, along with Austria, Germany, the Netherlands, and Switzerland.

Singapore is one of the education superpowers of East Asia, a group that also includes South Korea, Japan, Taiwan, Hong Kong, and certain regions within China. While Finland was PISA’s top performer in reading in the first cycle (when a small number of countries was compared), student outcomes in that country have since declined. In contrast, these East Asian countries consistently outperform other nations, particularly in math and science, and their extraordinary outcomes continue to improve. The comparison between this group and the low-performing countries in Latin America (that is, countries at opposite poles of the PISA rankings) is useful in examining PISA’s second assumption: that the evidence provided by PISA data is itself enough to minimize the political costs associated with implementing education reforms.

Teacher quality is widely recognized as key to both the success of East Asian countries and the failure of Latin American countries. In East Asia, only top-performing students can enter education-degree programs, and, throughout their careers, teachers continue to develop their skills via demanding professional-development pathways. This emphasis on teachers’ lifelong learning means that they spend less time in the classroom, a trade-off that leads to large class sizes. In contrast, in Latin American countries, students in education-degree programs are academically weak, selection mechanisms to enter the profession are ineffective, and accountability mechanisms are almost nonexistent. As a result, teachers tend to have high levels of skills in East Asian countries and weaker skills in Latin American countries.

There is widespread recognition that the main constraint to raising teacher quality in Latin America is political. Unions in the region are very powerful by global standards, and they put huge pressures on governments to defend their interests, among which small class size is prominent. Smaller classes mean more teachers and more union members. A larger membership results in greater monetary resources and the increased power that comes with them. In contrast, union power in top-performing East Asian countries is very weak. This crucial difference is what makes the implementation of certain policies (such as large class sizes or rigorous teacher training and stricter selection mechanisms) very costly in political terms in Latin America, while such political costs barely exist among top-performing countries in East Asia.

The evidence from PISA on class size is one of the most robust results about what does not work in education. Decreasing class size uses up a vast amount of resources and seems to have no impact on student performance at the system level, so PISA’s policy recommendation has been to increase class size. However, many countries (including OECD members) have not acted on this evidence-based recommendation. They have continued to reduce class size over time because of the huge political costs of not doing so. Most increases in education spending have therefore gone toward a strategy that has no impact on student outcomes. This example suggests that evidence, no matter how robust, is unlikely to diminish the high political costs associated with reforms that result in the redistribution of the vast resources (and power) that education systems command.

PISA seems to misunderstand the nature of the political costs that reformers face. Those who oppose change are not resisting it because they haven’t been convinced of the merits of the reforms. Evidence won’t change their position. Decreases (or lack of increases) in investment generate a head-on conflict with the vested interests of unions and other stakeholders that will strongly oppose policies that reduce the resources these players receive. These vested interests tend to be hidden in the political debate, since pressures to decrease class size in order to increase the number of teachers are often presented as attempts to improve the quality of education.

Students in London sit for their PISA exams in 2017. Although the United Kingdom was among the top-scoring countries, Asian nations like China and Singapore performed better.

Mistaken Assumptions

In conclusion, PISA’s two assumptions—that PISA’s policy recommendations are right and that the evidence provided by PISA data is enough to minimize the political costs of attempting education reform—are flawed. First, some of PISA’s conclusions are based on weak evidence. The greater problem, though, is that most policy recommendations are strongly context-dependent, and PISA’s recommendations may be difficult for policymakers to interpret correctly if they lack precise knowledge of their education system’s state of maturity. Ignoring this fact and making universal policy recommendations has dire consequences for many countries, particularly those most in need. It would be much more helpful for PISA to look at countries that have achieved gains and try to extract lessons for other countries that had similar starting points when they joined PISA but have not improved.

Policymakers should remain aware, though, that reforms cause intense clashes of interest when resources are redistributed. That is especially the case when powerful unions are among the losers. Evidence has nothing to do with the nature of such conflicts. Those reformers who have tried and failed when confronted with such huge political costs need better advice from PISA, not a reprimand.

Montse Gomendio is a research professor at the Spanish Research Council. Formerly, she served as Spain’s secretary of state for education, as OECD’s deputy director for education, and as head of its Centre for Skills. She is co-author of Dire Straits: Education Reforms, Ideology, Vested Interests and Evidence (2023).

This article appeared in the Spring 2023 issue of Education Next. Suggested citation format:

Gomendio, M. (2023). PISA: Mission Failure – With so much evidence from student testing, why do education systems continue to struggle? Education Next, 23(2), 16-22.

For more, please see “The Top 20 Education Next Articles of 2023.”


]]>
49716248