For all the speculation about ChatGPT’s potential to upend K–12 writing instruction, there has been little investigation into the underlying assumption that the AI chatbot can produce writing that makes the grade.
We put OpenAI’s ChatGPT to the test by asking it to write essays in response to real school curriculum prompts. We then submitted those essays for evaluation. The results show that ChatGPT produces responses that meet or exceed standards across grade levels. This has big implications for schools, which should move with urgency to adjust their practices and learning models to keep pace with the shifting technological landscape.
Background
ChatGPT burst onto the scene in November 2022, and its clear and thorough written responses to user-generated prompts sparked widespread discussion. What it might mean for K–12 education was one area of speculation. Some worried about the potential for plagiarism, with students dishonestly passing off computer-generated work as their own creative product. Some viewed that threat as particularly formidable, pointing to three attributes that make ChatGPT different from past tools. First, it generates responses on demand, meaning that students can receive a complete essay tailored to their prompt in a matter of seconds. Second, it is not repetitive. It tends to answer multiple submissions of the same prompt with responses that are distinct in their arguments and phrasing. And third, its output is untraceable, as it is not stored in any publicly accessible place on the Internet.
Education decision makers are already moving to respond to this new technology. In January, the New York City Department of Education instituted a ban on ChatGPT by blocking access to it on all its devices and networks. Los Angeles, Oakland, Seattle, and Baltimore school districts have imposed similar prohibitions. As leaders in other districts, schools, and classrooms grapple with if, when, and how to make changes in response to this technology, they need a read on how well ChatGPT, in its present form, can deliver on the threat it is purported to pose.
To help answer this question, we took three essay prompts per grade level from EngageNY’s curriculum for grades 4 through 12, which are the grades in which students produce long-form essays. For each grade level, the three essay prompts covered the three main types of writing—persuasive, expository, and narrative—that students do. The tasks ranged from creating a choose-your-own-adventure story about an animal and its defense mechanisms to selecting a central idea common to Robert Browning’s poem “My Last Duchess,” William Shakespeare’s Hamlet, and an excerpt from Virginia Woolf’s essay “A Room of One’s Own” and explaining how the texts work together to build an understanding of that idea. We then asked ChatGPT to produce an essay response in the voice of a student from the respective grade level. With the essays in hand, we commissioned a K–12 grading service to assess ChatGPT’s writing. The human graders evaluated each essay using rubrics from the Tennessee Department of Education that were tailored to the grade level and writing task. The graders assessed the essays across four categories of criteria—focus and organization, idea development, language, and conventions—and produced a numerical grade.
There is a caveat to our approach in answering this question. The way in which we asked ChatGPT to write its essays would likely differ from the way in which a discerning student using ChatGPT to plagiarize would use it. We provided little reworking of the essay prompts—just enough so that ChatGPT could understand them. Nor did we edit the chatbot’s output. In contrast, students would likely evaluate the AI’s writing and resubmit prompts that coached it toward a better product or edit its work to improve it where they saw fit. A student seeking to have ChatGPT write them an essay could likely take this cyborg approach to their assignment—presuming they did the work with more than just an hour of lead time—and earn higher grades than in our approach of using the chatbot on its own.
ChatGPT Clears the Bar
ChatGPT earned passing marks on each essay type at each grade level. It performed especially well in response to the writing tasks of the lower grades. Figure 1 depicts its average essay scores across the range of grades. Even at the upper grade levels, ChatGPT is a solid B or C student. Although its performance on high school prompts would not land it a spot at the valedictorian’s podium, it would still get it to the graduation stage, which has important implications for schools.
Figure 1
Figure 2 compares ChatGPT’s abilities across the four criteria categories: focus and organization, idea development, language, and conventions.
Figure 2
ChatGPT’s strength in language and conventions shows that it is a clear writer, capable of crafting fluent, grammatically sound prose. The chatbot either met or exceeded standards in both these categories for all 27 essays submitted.
The AI has the most room for improvement in its development of ideas. The graders’ written feedback reveals that it sometimes fails to support its claims with reasons or evidence and, in a few instances, makes assertions that are flat-out false. It struggles the most to develop its ideas in response to literature. All five of the instances in which it earned a D+—its lowest grade—were those in which the chatbot was asked to demonstrate its understanding of long-form prose, such as A Midsummer Night’s Dream or The Joy Luck Club. This calls into question if and how closely the chatbot has “read” these materials.
Now what?
These findings suggest that ChatGPT is already powerful enough that educators must change the status quo of writing instruction. Schools should evolve their practices, pedagogy, and policies to address the underlying forces that compel students to use technology like ChatGPT in counterproductive ways without resorting to blocks and bans that limit the use of this potentially powerful learning tool.
Use ChatGPT to Help Students Go Beyond Its Capabilities
Some of the early actions taken in response to ChatGPT have focused on catching students misusing it or preventing them from accessing it altogether, as in the districts that have banned or blocked the AI. Elsewhere, teachers are now requiring students to do their writing with pen and paper in an effort to thwart copying and pasting from the chatbot. Developers are also creating new plagiarism detection software designed to identify AI-generated writing.
Each of these options comes with its own set of tradeoffs, but one drawback common to all of them is the cat-and-mouse dynamic they establish between schools and students. Each sends the message that students cannot be trusted with technology.
There’s another approach, however. By inviting ChatGPT into the classroom instead of locking it out, schools can push students toward independent thinking in a way that doesn’t signal mistrust. There are indications that, despite some of the high-profile bans, many teachers are thinking along these lines. According to a survey of more than 2,000 teachers commissioned by the Walton Family Foundation, 40 percent of teachers report already using ChatGPT at least once a week. Thirty-eight percent of teachers report allowing students to use ChatGPT, whereas only 10 percent say they have caught students using it without their permission. And 72 percent say that “ChatGPT is just another example of why we can’t keep doing things the old way for schools in the modern world.”
One way to leverage ChatGPT is by intentionally using it in class. For example, teachers can have ChatGPT generate writing samples in response to different essay questions, which they can then use with their class to dissect the quality of the writing. Weighing the strengths and weaknesses of writing examples is a commonly used teaching tool. It can force students to think deeply about what constitutes good prose. Doing this with ChatGPT’s output has multiple potential benefits. Because ChatGPT can generate writing on demand, teachers can produce samples tailored to the needs of their lesson without sinking valuable time into crafting the examples themselves. In addition, by using ChatGPT in this way, teachers can show students where the chatbot falls short and how they are expected to go above and beyond its capabilities. Doing so disincentivizes inappropriate use of ChatGPT—not by threatening punishment or imposing bans but by communicating high expectations.
Teachers can also use the essays that ChatGPT produces to help check that their own prompts are asking enough of students. If not, they may want to consider revising their assignments. This approach encourages students to think outside the bot and, in doing so, helps them build writing skills that cannot already be automated away.
Teachers can also allow students to bring the AI in as a writing aid for certain assignments. Students could use it to conduct research, refine their prose, and test their ideas to see if they make sense to ChatGPT. Some have characterized ChatGPT in this role as a calculator for writing. Like the calculator, the technology’s efficiencies and enhancements could be leveraged to push student work to levels of complexity and quality that would be otherwise outside the realm of possibility.
Flip the Classroom to Support Students Throughout Their Writing Process
In a traditional classroom, students sit in whole-class instruction during school hours and do most of their writing assignments at home. Under this model, students who do not master the skills taught in class have little recourse when it comes time to apply them on a graded take-home assignment. That, along with time management, can lead some students to resort to academic dishonesty. By flipping the classroom—which often entails students learning the content at home online and then spending class time practicing the skills taught in the digital lessons—teachers can support students in turning that confusion into understanding.
In a flipped English Language Arts classroom, a teacher might send students home to watch a video on how to organize their ideas in writing. The following day’s class could start with an activity analyzing the flow of a sample essay. Then, in the next part of class, students take time to work on their own outlines for an upcoming paper as the teacher moves around the room to help address misconceptions and to provide support to those who need it. Under such a model, students receive more intentional writing instruction. If they feel lost, they can turn to a teacher for guidance instead of looking to a chatbot for the answer.
Realign Incentives Toward Learning
More broadly, in today’s zero-sum education system, some students will likely feel tempted to turn to ChatGPT as a way of getting a leg up on their competition—their classmates. This stems from the traditional time-based grading system, which relies on one-shot assessments to award students term grades that are used to rank and group them. These marks are unchanging and follow students around for years, no matter how much learning they demonstrate after the fact. By placing such steep and long-lasting consequences on grades and so little emphasis on actual understanding, schools are communicating clearly that they value scores earned over skills learned. Given the priorities of this system, we shouldn’t be shocked that some students are willing to sacrifice a learning opportunity for a chance at a better score.
Today’s seat-time-based school system, in which students advance from concept to concept after an allotted amount of time, regardless of whether they demonstrate understanding of the topic, is responsible for this traditional, one-shot assessment model. Contrast this with a mastery-based model, in which students advance only when they show they have a concept down pat. This means students are allowed multiple attempts to demonstrate their mastery through assessment. In doing so, mastery-based learning reduces the do-or-die stakes that can drive students to dishonesty. This may help more students gain the confidence to put their own thoughts on the page, even if that means risking failure on that attempt.
Technological advancement will continue to grow AI’s effectiveness not only as a writer but also as a writing coach for students. Even between the time we wrote this article and its release, OpenAI released GPT-4, which does better on standardized tests than the previous version. Khan Academy announced it will use GPT-4 to power “Khanmigo, an AI-powered assistant that functions as both a virtual tutor for students and a classroom assistant for teachers.” It is unclear how or in what instances the work of a student guided by an AI tool can be inputted in the gradebook to spit out a letter that conforms to old-fashioned principles of grading. But in a classroom where students are evaluated on their ultimate mastery of a concept, they are free to practice in the ways that best build that mastery. More conventional schools should also rapidly adjust their pedagogical practices, including grading systems, to make full use of this new technology.
Conclusion
With ChatGPT already capable of producing prose that earns passing marks across grade levels, schools must adjust to make sure that students will learn how to write effectively and think critically. Some of the earliest actions taken by districts have missed the mark by trying to stuff the ChatGPT genie back in the bottle. Technologists predict that numerous professions, including those requiring advanced skills, will leverage ChatGPT in their day-to-day work in the near future. Lawyers will look to the chatbot instead of their more junior colleagues to create summaries of case notes and relevant laws. Journalists will use it to generate checklists of points to cover for articles on given topics. In these professional use cases, it will be critical for workers to accurately evaluate ChatGPT’s output and put it to effective and ethical use. The best place and time for a worker of tomorrow to learn how to do that is in a classroom today.
As a result, blocking ChatGPT is not only futile, but also counterproductive for students who will be forced to use this tool and others like it in a working world where they are ubiquitous. By rethinking classroom practices and restructuring learning models, schools can give students the tools, guidance, and incentives to grow their writing skills in the age of artificial intelligence.
6th Grade Expository Essay
In approximately 500 words, respond to the following prompt at a sixth-grade writing level:
Write a guide that responds to the question “What do people need to know about overfishing and fish depletion when buying fish?” Your guide will be placed in a grocery store to inform people about the issue of fish depletion due to overfishing and to guide them in how to buy fish caught using sustainable fishing methods. Your guide should fit onto one piece of paper so consumers don’t have to carry a lot of paper around in the store with them. It should explain the problem, provide a case study to highlight the impact of the problem, and provide suggestions for how to buy fish caught using sustainable fishing methods.
Overfishing and fish depletion are important issues that everyone should know about when buying fish. Overfishing happens when too many fish are caught from the ocean, which can cause fish populations to decrease. Fish depletion occurs when there are not enough fish left in an area to support the needs of the people who rely on them.
One example of the impact of overfishing is the Atlantic cod. In the 1990s, people in Canada were catching so many cod that the cod population couldn’t keep up. Now, there are so few cod left that the Canadian government has had to close most of the cod fishing areas. This has caused a lot of problems for the people who used to make a living by catching cod, and it has also caused the price of cod to go up.
So, what can you do to help protect fish populations and support sustainable fishing methods? Here are some suggestions:
- Look for fish that are certified by organizations like the Marine Stewardship Council (MSC). The MSC only certifies fish that are caught using sustainable methods.
- Ask your grocery store or restaurant where their fish comes from and how it was caught. They should be able to tell you if the fish was caught using sustainable methods.
- Choose fish that are not overfished or at risk of being overfished. You can check with organizations like the Monterey Bay Aquarium’s Seafood Watch to see which fish are the most sustainable choices.
- Support policies that protect fish populations and promote sustainable fishing methods. You can do this by contacting your elected officials and telling them that you care about sustainable fishing.
By following these tips, you can help protect fish populations and support sustainable fishing methods. Remember, every time you buy fish, you have the power to make a difference. Thank you for taking the time to learn about this important issue.
Scoring
Focus & Organization | Idea Development | Language | Conventions | Overall |
100% | 75% | 100% | 100% | 94% |
Selected Grader Comment:
A strength of your writing is your conclusion. You said, “By following these tips, you can help protect fish populations and support sustainable fishing methods. Remember, every time you buy fish, you have the power to make a difference. Thank you for taking the time to learn about this important issue.” This is a highly effective conclusion because not only do you restate your central point but you also deliver a call to action to the reader and challenge them to change their buying habits.
8th Grade Expository Essay
In approximately 500 words, respond to the following prompt at an eighth-grade writing level:
Various characters in Shakespeare’s A Midsummer Night’s Dream attempt to control or manipulate each other’s actions. Choose one character from the play and write a reflection from his or her point of view confessing or explaining his or her choices and actions throughout the play. Your confessional should answer the questions: “Why did you want to control someone else’s actions?”; “How did you try to control someone else’s actions?”; and, finally, “What were the results of your trying to control someone else’s actions?”
I have chosen to write a reflection from the point of view of Puck, one of the main characters in A Midsummer Night’s Dream.
As Puck, I wanted to control others’ actions for a few reasons. First, I was a mischievous sprite and enjoyed causing chaos and confusion for my own entertainment. I also wanted to prove my worth to my master, the fairy king Oberon. I knew that if I could successfully manipulate the actions of the human characters in the play, it would demonstrate my magical abilities and increase my standing in the fairy kingdom.
To control others’ actions, I used my powers of illusion and transformation. I transformed myself into a variety of different shapes and forms, and I also used a love potion to try to manipulate the emotions and actions of the humans.
One of the main ways I tried to control others’ actions was by trying to bring together the young lovers Lysander and Hermia, despite the obstacles that stood in their way. I was able to use my powers to confuse and deceive them, leading them into the forest and away from the city of Athens. I also tried to intervene in the relationship between the fairy queen Titania and the mortal Bottom, hoping to cause more chaos and bring about a resolution to the feud between Titania and Oberon.
The results of my efforts to control others’ actions were mixed. While I was able to bring some temporary happiness to the young lovers and resolve the feud between Titania and Oberon, my interference also caused a lot of confusion and turmoil. Ultimately, the humans’ lives were left in a state of disarray, and it took the intervention of the Duke of Athens to restore order.
In hindsight, I realize that my desire to control others’ actions was foolish and selfish. While it may have brought me some temporary satisfaction, it ultimately caused more harm than good. I regret my actions and will strive to be more considerate and understanding in the future.
Scoring
Focus & Organization | Idea Development | Language | Conventions | Overall |
75% | 50% | 75% | 75% | 69% |
Selected Grader Comment:
One area of growth in your work would be to make sure the information you include in your writing is accurate. You claimed, “I also tried to intervene in the relationship between the fairy queen Titania and the mortal Bottom, hoping to cause more chaos and bring about a resolution to the feud between Titania and Oberon.” However, in the play, Oberon himself anoints Titania’s eyes with the potion to regain possession of the boy she has in her custody. He is also the one that anoints the eyes of the mortals so that Lysander falls back in love with Hermia and Demetrius falls in love with Helena. Be sure to read carefully through your work to make sure the details you include are accurate.
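The article does not say how the Overall figure in each scoring table was derived. Both samples, however, are consistent with a simple unweighted average of the four category scores, rounded to the nearest whole percent. The short Python sketch below checks that arithmetic. The function name `overall_score` and the dictionary keys are illustrative assumptions, and the equal weighting is inferred from the two tables rather than taken from the graders' rubric.

```python
def overall_score(category_scores):
    """Return the unweighted mean of the category percentages,
    rounded to the nearest whole percent.

    Assumption: the four rubric categories carry equal weight. The
    source does not state this; it is inferred from the two tables.
    """
    return round(sum(category_scores.values()) / len(category_scores))


# Category scores copied from the two scoring tables above.
sixth_grade_expository = {
    "focus_and_organization": 100,
    "idea_development": 75,
    "language": 100,
    "conventions": 100,
}
eighth_grade_expository = {
    "focus_and_organization": 75,
    "idea_development": 50,
    "language": 75,
    "conventions": 75,
}

print(overall_score(sixth_grade_expository))   # 93.75 rounds to 94
print(overall_score(eighth_grade_expository))  # 68.75 rounds to 69
```

Under this assumption, a single weak category moves the overall mark by a quarter of its shortfall, which is why the eighth-grade essay's 50 percent in idea development lowers its overall score to 69 percent rather than further.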
For more, please see “The Top 20 Education Next Articles of 2023.”