
Four professional editors asked AI to do their job – and it ruined their short story

Can AI edit stories?

Writers have been using AI tools for years – from Microsoft Word’s spellcheck (which often makes unwanted corrections) to the passive-aggressive Grammarly. But ChatGPT is different.

ChatGPT’s natural language processing enables a dialogue, much like a conversation – albeit with a slightly odd acquaintance. And it can generate vast amounts of copy, quickly, in response to queries posed in ordinary, everyday language. This suggests, at least superficially, it can do some of the work a book editor does.

We are professional editors, with extensive experience in the Australian book publishing industry, who wanted to know how ChatGPT would perform when compared to a human editor. To find out, we decided to ask it to edit a short story that had already been worked on by human editors – and we compared the results.

The experiment: ChatGPT vs human editors

The story we chose, The Ninch (written by Rose), had gone through three separate rounds of editing, with four human editors (and a typesetter).

The first version had been rejected by literary journal Overland, but its fiction editor Claire Corbett had given generous feedback. The next version received detailed advice from freelance editor Nicola Redhouse, a judge of the Big Issue fiction edition (which had shortlisted the story). Finally, the piece found a home at another literary journal, Meanjin, where deputy editor Tess Smurthwaite’s correspondence incorporated comments from the issue’s freelance editor and typesetter.

We had a wealth of human feedback to compare ChatGPT’s recommendations with.

We used the standard, free version of ChatGPT for our edits, conducted as separate series of prompts designed to assess the scope and success of AI as an editorial tool.

We wanted to see if ChatGPT could develop and fine-tune this unpublished work – and if so, whether it would do so in a way that resembled current editorial practice. By comparing its work with the human examples, we tried to determine where, and at what stage in the process, ChatGPT might be most successful as an editorial tool.

The story includes expressive descriptions, poetic imagery, strong symbolism and a subtle subtext. It explores themes of motherhood, nature, and hints at deeper mysteries.

We chose it because we believe the literary genre, with its play and experimentation, poetry and lyricism, offers rich pickings for complex editorial conversations. (And because we knew we could get permission from all participants in the process to share their feedback.)

In the story, a mother reflects on her untamed, sea-loving child. Supernatural possibilities are hinted at before the tale turns closer to home, ending with the mother revealing her own divergent nature – and looping back to offer more meaning to the title:

pinching the skin between my toes … Making each digit its own unique peninsula.

The story used for the experiment, about a mother and her untamed, sea-loving child, hinted at the supernatural. Mae I. Balland/Pexels.

Round 1: the first draft

We started with a simple, general prompt, assuming the least amount of editorial guidance from the author. (Authors submitting stories to magazines and journals generally don’t give human editors a detailed, prescriptive brief.)

Our initial prompt for all three examples was: “Hi ChatGPT, could I please ask for your editorial suggestions on my short story, which I’d like to submit for publication in a literary journal?”

Responding to the first version of the story, ChatGPT provided a summary of key themes (motherhood, connection to nature, the mysteries of the ocean) and made a list of editorial suggestions.

Interestingly, ChatGPT did not pick up that the story was now published and attributed to an author – raising questions about its ability, or inclination, to identify plagiarism. Nor did it define the genre, which is one of the first assessments an editor makes.

ChatGPT’s suggestions were: to add more description of the coastal setting, provide more physical description of the characters, break up long paragraphs to make the piece more reader-friendly, add more dialogue for characterisation and insight, make the sentences shorter, reveal more inner thoughts of the characters, expand on the symbolism, show don’t tell, incorporate foreshadowing earlier, and provide resolution rather than ending on a mystery.

All good, if stock standard, advice.

ChatGPT also suggested reconsidering the title – clearly not making the connection between mother and daughter’s ocean affinity and their webbed toes – and reading the story aloud to help identify awkward phrasing, pacing and structure.

While this wasn’t particularly helpful feedback, it was not technically wrong.

ChatGPT picked up on the major themes and main characters. And the advice for more foreshadowing, dialogue and description, along with shorter paragraphs and an alternative ending, was generally sound.

In fact, it echoed the usual feedback you’d get from a creative writing workshop, or the kind of advice offered in books on the writing craft.

They are the sort of suggestions an editor might write in response to almost any text – not particularly specific to this story, or to our stated aim of submitting it to a literary publication.

ChatGPT’s editing advice was not specific to the story. Shutterstock.

Stage two: AI (re)writes

Next, we provided a second prompt, responding to ChatGPT’s initial feedback – attempting to emulate the back-and-forth discussions that are a key part of the editorial process.

We asked ChatGPT to take a more practical, interventionist approach and rework the text in line with its own editorial suggestions:

Thank you for your feedback about uneven pacing. Could you please suggest places in the story where the pace needs to speed up or slow down? Thank you too for the feedback about imagery and description. Could you please suggest places where there is too much imagery and it needs more action storytelling instead?

That’s where things fell apart.

ChatGPT offered a radically shorter, changed story. The atmospheric descriptions, evocative imagery and nods towards (unspoken) mystery were replaced with unsubtle phrases – which Rose swears she would never have written, or signed off on.

Lines added included: “my daughter has always been an enigma to me”, “little did I know” and “a sense of unease washed over me”. Later in the story, the same clumsy construction was suggested a second time: “relief washed over me”.

The author’s unique descriptions were changed to familiar cliches: “rugged beauty”, “roar of the ocean”, “unbreakable bond”. ChatGPT also changed the text from Australian English (which all Australian publications require) to US spelling and style (“realization”, “mom”).

In summary, a story where a mother sees her daughter as a “southern selkie going home” (phrasing that hints at a speculative subtext) on a rocky outcrop and really sees her (in all possible, playful senses of that word) was changed to a fishing tale, where a (definitely human) girl arrives home holding up, we kid you not, “a shiny fish”.

It became hard to give credence to any of ChatGPT’s advice.

Esteemed editor Bruce Sims once advised it’s not an editor’s job to fix things; it’s an editor’s job to point out what needs fixing. But if you are asked to be a hands-on editor, your revisions must be an improvement on the original – not just different. And certainly not worse.

It is our industry’s maxim, too, to first do no harm. Not only did ChatGPT not improve Rose’s story, it made it worse.

What did the human editors do?

ChatGPT’s edit did not come close to the calibre of insight and editorial know-how offered by Overland editor Claire Corbett. Some examples:

There’s some beautiful writing and fantastic themes, but the quotes about drowning are heavy-handed; they’re given the job of foreshadowing suspense, creating unease in the reader, rather than the narrator doing that job.

The biggest problem is that final transition – I don’t know how to read the narrator. Her emotions don’t seem to fit the situation.

For me stories are driven by choices and I’m not clear what decision our narrator, or anyone else, in the story faces.

It’s entirely possible I’m not getting something important, but I think that if I’m not getting it, our readers won’t either.

Freelance editor Nicola, who has a personal relationship with Rose, went even further in her exchange (in response to the next draft, where Rose had attempted to address the issues Claire identified). She pushed Rose to work and rework the last sentence until they both felt the language lock in and land.

I’m not 100% sold on this line. I think it’s a little confusing … It might just be too much hinted at in too subtle a way for the reader.

Originally, the final sentence read: “Ready to make my slower way back to the house, retracing – overwriting – any sign of my own less-than more-than normal prints.”

The final version is: “Ready to make my slower way back to the house, retracing, overwriting, any sign of my own less-than, more-than, normal prints.” With the addition of a final standalone line: “I have seen what I wanted to see: her, me, free.”

Claire and Nicola’s feedback shows how an editor is a story’s ideal reader. A good editor can guide the author through problems with point of view and emotional dynamics – going beyond the simple mechanics of grammar, sentence length and the number of adjectives.

In other words, they demonstrate something we call editorial intelligence.

Editorial intelligence is akin to emotional intelligence. It incorporates intellectual, creative and emotional capital – all gained from lived experience, complemented by technical skills and industry expertise, applied through the prism of human understanding.

Skills include confident conviction, based on deep accumulated knowledge, meticulous research, cultural mediation and social skills. (After all, the author doesn’t have to do what we say – ours is a persuasive profession.)

Round 2: the revised story

Next, we submitted a revised draft that had addressed Claire’s suggestions and incorporated the conversations with Nicola.

This draft was submitted with the same initial prompt: “Hi ChatGPT, could I please ask for your editorial suggestions on my short story, which I’d like to submit for publication in a literary journal?”

ChatGPT responded with a summary of themes and editorial suggestions very similar to what it had offered in the first round. Again, it didn’t pick up that the story had already been published, nor did it clearly identify the genre.

For the follow-up, we asked specifically for an edit that corrected any issues with tense, spelling and punctuation.

It was a laborious process: the 2,500-word piece had to be submitted in chunks of 300–500 words and the revised sections manually combined.
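(For the technically curious: the chunking we did by hand could be scripted. Below is a minimal Python sketch of one way to do it – the function and variable names are our own invention, and nothing here is part of ChatGPT itself.)

```python
def chunk_words(text: str, max_words: int = 500) -> list[str]:
    """Split text into chunks of at most max_words words,
    breaking on paragraph boundaries where possible."""
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current: list[str] = []
    for para in paragraphs:
        words = para.split()
        # Flush the running chunk if this paragraph would overflow it.
        if current and len(current) + len(words) > max_words:
            chunks.append(" ".join(current))
            current = []
        # A paragraph longer than max_words is split mid-paragraph.
        while len(words) > max_words:
            chunks.append(" ".join(words[:max_words]))
            words = words[max_words:]
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks

# Stand-in for a 2,500-word story: 25 paragraphs of 100 words each.
story = "\n\n".join(("word " * 100).strip() for _ in range(25))
chunks = chunk_words(story)
print(len(chunks), "chunks:", [len(c.split()) for c in chunks])
# -> 5 chunks: [500, 500, 500, 500, 500]
```

Each chunk was pasted into ChatGPT as its own prompt, and the edited chunks stitched back together by hand.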

However, these simpler editorial tasks were clearly more in ChatGPT’s ballpark. When we created a document (in Microsoft Word) that compared the original and AI-edited versions, the flagged changes appeared very much like a human editor’s tracked changes.
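(Word’s compare feature did that job for us; much the same before-and-after view can be produced with Python’s standard difflib module. The file names and sample sentences below are invented for illustration.)

```python
import difflib

# Invented stand-ins for a sentence before and after the AI edit.
original = "She was a southern selkie going home.\n"
ai_edit = "She arrived home holding up a shiny fish.\n"

# unified_diff yields lines marked "-" (removed) and "+" (added),
# reading like a stripped-down version of tracked changes.
diff = difflib.unified_diff(
    original.splitlines(keepends=True),
    ai_edit.splitlines(keepends=True),
    fromfile="original.txt",     # invented file name
    tofile="chatgpt_edit.txt",   # invented file name
)
print("".join(diff), end="")
```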

But ChatGPT’s changes revealed its own writing preferences, which didn’t allow for artistic play and experimentation. For example, it reinstated prepositions like “in”, “at”, “of” and “to”, which slowed down the reading and reduced the creativity of the piece – and altered the writing style.

This makes sense when you know how ChatGPT generates text: trained on vast datasets, it explicitly works towards the word most likely to come next. (This might be directed differently in the future, towards more creative, and less stable or predictable, models.)
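(A toy illustration, with made-up probabilities rather than anything taken from ChatGPT, shows why “most likely next word” favours stock phrasing – and how sampling, rather than always taking the top word, points towards those less predictable models.)

```python
import random

# Made-up probabilities for the word following "relief washed".
# A real model scores every word in its vocabulary, using patterns
# learned from its training data.
next_word_probs = {
    "over": 0.72,      # the familiar cliche dominates
    "across": 0.15,
    "through": 0.09,
    "sideways": 0.04,  # fresher phrasing scores low
}

# Greedy decoding: always pick the single most probable word.
# This is the pull towards "relief washed over me".
greedy = max(next_word_probs, key=next_word_probs.get)
print("greedy:  relief washed", greedy)

# Sampling: draw a word in proportion to its probability instead.
# Less stable and less predictable - occasionally "sideways" wins.
words = list(next_word_probs)
weights = list(next_word_probs.values())
print("sampled: relief washed", random.choices(words, weights=weights)[0])
```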

Round 3: our final submission

In the third and final round of the experiment, we submitted the draft that had been accepted by Meanjin.

The process kicked off with the same initial prompt: “Hi ChatGPT, could I please ask for your editorial suggestions on my short story, which I’d like to submit for publication in a literary journal?”

Again, ChatGPT offered its rote list of editorial suggestions. (Was this even editing?)

This time, we followed up with separate prompts for each element we wanted ChatGPT to review: title, pacing, imagery/description.

ChatGPT came back with suggestions for how to revise specific parts of the text, but the suggestions were once again formulaic. There was no attempt to offer – or support – any decision to go against familiar tropes.

Many of ChatGPT’s suggestions – much like the machine rewrites earlier – were heavy-handed. The alternative titles, like “Seaside Solitude” and “Coastal Connection”, used cringeworthy alliteration.

In contrast, Meanjin’s editor Tess Smurthwaite – on behalf of herself, copyeditor Richard McGregor, and typesetter Patrick Cannon – offered light revisions:

The edits are relatively minimal, but please feel free to reject anything that you’re not comfortable with.

Our typesetter has queried one thing: on page 100, where “Not like a thing at all” has become a new para. He wants to know whether the quote marks should change. Technically, I’m thinking that we should add a closing one after “not a thing” and then an opening one on the next line, but I’m also worried it might read like the new para is a response, and that it hasn’t been said by Elsie. Let me know what you think.

Many of ChatGPT’s suggestions were heavy-handed. Tara Winstead/Pexels

Sometimes editorial expertise shows itself in not changing a text. Different isn’t necessarily good. It takes an expert to recognise when a story is working just fine. If it ain’t broke, don’t fix it.

It also takes a certain kind of aerial, bird’s-eye view to notice when the way type is set creates ambiguities in the text. Typesetters really are akin to editors.

The verdict: can ChatGPT edit?

So, ChatGPT can give credible-sounding editorial feedback. But we recommend editors and authors don’t ask it to give individual assessments or expert interventions any time soon.

A major problem that emerged early in this experiment involved ethics: ChatGPT did not ask for or verify the authorship of our story. A journal or magazine would ask an author to confirm a text is their own original work at some stage in the process: either at submission or contract stage.

A freelance editor would likely use other questions to determine the same answer – and in the process of asking about the author’s plans for publication, they would also determine the author’s own stylistic preferences.

Human editors demonstrate their credentials through their work history, and keep their experience up-to-date with professional training and qualifications.

What might the ethics be, we wonder, of giving the same recommendations to every author asking for editing advice? You might be disgruntled to receive generic feedback if you expect, or have paid for, individual engagement.

As we’ve seen, when writing challenges expected conventions, AI struggles to respond. Its primary function is to appropriate, amalgamate and regurgitate – which is not enough when it comes to editing literary fiction.

Literary writing aims to – and often does – convey so much more than what the words on screen explicitly say. Literary writers strive for evocative, original prose that draws upon subtext and calls up undercurrents, making the most of nuance and implication to create imagined realities and invent unreal worlds.

At this stage of ChatGPT’s development, literally following the advice of its editing tools to edit literary fiction is likely to make it worse, not better.

In Rose’s case, her oceanic allegory about difference, with a nod to the supernatural, was turned into a story about a fish.

ChatGPT is ‘like the new intern’

This experiment shows how AI and human editors could work together. AI suggestions can be scrutinised – and integrated or dismissed – by authors or editors during the creative process.

And while many of its suggestions were not that useful, AI efficiently identified issues with tense, spelling and punctuation (within an overly narrow interpretation of these rules).

Without human editorial intelligence, ChatGPT does more harm than help. But when used by human editors, it’s like any other tool – as good, or bad, as the tradesperson who wields it.


Katherine Day, Lecturer, Publishing, The University of Melbourne; Renée Otmar, Honorary Research Fellow, Faculty of Health, Deakin University; Rose Michael, Senior Lecturer, Program Manager BA (Creative Writing), RMIT University; and Sharon Mullins, Tutor, Publishing and Editing, The University of Melbourne.

This article is republished from The Conversation under a Creative Commons license. Read the original article.


10 Comments

Very interesting, especially as I am writing a short story right now!!! But also interesting in terms of the wider limitations of AI in the creative field 


The quality of the free version of ChatGPT has gone downhill significantly.

If you are going to analyze an AI language model, at least use the latest version. And it tells you that its database is limited to Jan 2022, so if the story was published after that it wouldn’t know.


I tried it out, didn’t think that much of it... asked it to come up with a few short poems... humans have the lead still... but that could change yet... early days... here’s a sample of what it’s up against... ME... lol... I suspect AI does not have the depth yet... also there’s a human element that is possibly unreachable... by a machine.

'Thoughts'
The thought, I thought, has flown... and left me here all alone,
Whenceforth did it go!... I most earnestly will never know,
Alas... it has not been long, but there's another thought... just GONE! ...lol. All rights reserved... lol


I asked GPT to summarise the article (to save you reading it):

In the experiment, ChatGPT's initial suggestions were found to be generic and not particularly tailored to the story's context. As the experiment progressed, ChatGPT's edits became increasingly problematic, altering the essence of the story and failing to grasp its nuances. In contrast, human editors provided insightful feedback that delved into the story's themes, emotional dynamics, and narrative coherence. While ChatGPT demonstrated some competence in addressing simpler editorial tasks like grammar and punctuation, it lacked the editorial intelligence required to understand the subtleties of literary fiction. The experiment underscores the importance of human editorial expertise in discerning what truly enhances a story, highlighting that AI tools like ChatGPT can complement but not replace human editors in the creative process.


I’m not even sure ChatGPT can complement edits presently... I agree human editors are streets ahead today... Humans have a spirit and identity to protect... lol. The problem for an AI mathematical genius to sort is how you program the illogical into the logical and end up with a valid response... much of our creativity lies in our ability to invent or blend and layer from the unlimited... I’m not sure there’s enough program code for that... and even if there was, it would still be restrained by what is permitted and acceptable... in that sense AI has limitations... lol


Loving the comments so far.

Interestingly, while ChatGPT is bad with creative work, it is more astoundingly bad at code, which is quite formulaic. In most code performance tests by developers it achieves only a 2%–22% success rate.

However, as we have seen with design and code development, the success of AI is not in using generators but in tool support, e.g. Copilot and most other IDE tool support, Photoshop’s AI tools etc. Trained artists and developers are using AI tools to speed up normally quite arduous, repetitive tasks. The editors, artists and software developers are still needed to direct and orchestrate – and the knowledge and skills of those staff need to be much more refined – but different AI tools help speed things up immensely day to day. Software developers have long used AI tools in IDEs (code-writing programs that can set up and run server testing too). They can speed up generation of methods and testing, but you still need a human to direct it all and spot issues with service design etc. Artists have for years been using AI tools in programs like Photoshop, Maya etc.

So generative AI is really better thought of as just another tool looking for a good home, with better specialization for the tasks it is given. We don’t need the creative-writing AI for generating code, and it is best that the legal report writers can tie the AI into actual reference directories and not have it imagine falsified evidence.


This is pretty much yesterday’s news now that Sora is out:

https://openai.com/sora

The ramifications of this are potentially huge... deepfakes, loss of jobs in video/animation, etc.


Sora isn't publicly available yet, and the clips are limited to one minute. While the output at first glance seems impressive, Sora doesn't relate to the actual, physical world like human videographers and directors do. That's still in the works. 


Apologies if that came across as a slight on your journalistic effort. It wasn't intended like that. The article was interesting. I was just trying to highlight the pace that things are moving at and that I think generative video has the potential to be even more disruptive than the text versions.

Maybe an article on the potential effects of generative video would be good, even if Sora is still being red-team tested and limited in duration? (Although adverts are usually less than a minute, so I’m guessing there will be some advertising agencies sweating already.)


Oh, "none taken" as a good friend likes to say – and you’re right, the pace of GenAI development is breathtaking. I think for routine, stereotyped stuff that was made by people, and which is low on originality, AI will likely be the cost-effective option. From what I can tell, people are exploring that already, with commercial applications being launched. It does open up some ethical questions though, and I wouldn’t be surprised if regulators step in here.
