New tool could help authors bust writer's block in novel-length works

Jessica Hallman
August 24, 2021

UNIVERSITY PARK, Pa. — Authors experiencing writer’s block could soon have a new way to help develop the next section of their story.

Researchers at the Penn State College of Information Sciences and Technology recently introduced a new technology that forecasts the future development of an ongoing written story. In their approach, researchers first characterize the narrative world using over 1,000 different “semantic frames,” where each frame represents a cluster of concepts and related knowledge. A predictive algorithm then looks at the preceding story and predicts the semantic frames that might occur in the next 10, 100, or even 1,000 sentences in an ongoing story.

Unlike current automated text generation methods, which are limited to a few sentences, the researchers’ approach could help authors craft language for the follow-up story arc well beyond that scope.

“These creative writing tasks seem nearly impossible to fully automate,” said Kenneth Huang, assistant professor of information sciences and technology. “The reason that we are tackling these very creative tasks is to push the boundaries of AI and natural language processing. Developing solutions for challenging creative tasks will teach us about the capacity and limitations of the current computational techniques, and so that we can further improve computer science.”

While existing models can generate a full story, they have been tested and proven successful on short works of 15 sentences or fewer. Huang and his team wanted to develop a tool that could help authors who write novels, which are typically 50,000 words or more.

“When providing longer text prediction, we essentially provide follow-up ideas to help novelists to plan their story and set up goals instead of generating detailed stories for them,” said Chieh-Yang Huang, doctoral student of informatics. “We envision that in the future we can provide various ideas to stimulate novelists to brainstorm different story arcs.”

The researchers’ framework, called semantic frame forecast, breaks a long narrative down into a sequence of text blocks, each containing a fixed number of sentences. For each block, the frequency of each semantic frame is calculated, and the block is converted to a vector (numerical data that a machine can process) in which each dimension denotes the frequency of one frame. The vector’s values are then weighted so that they reflect not only how many times a semantic frame appears but also how important it is. Finally, the model takes a fixed number of text blocks as input and predicts the semantic frames for the forthcoming block.
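The block-and-vector steps described above can be sketched roughly as follows. This is a minimal illustration, not the researchers' implementation: it assumes each sentence has already been tagged with semantic frame labels by some external parser, the function names are hypothetical, and the importance weighting shown is a smoothed inverse document frequency computed over blocks, which is an assumption. The forecasting model itself is not shown.

```python
from collections import Counter
import math


def split_into_blocks(sentence_frames, block_size):
    """Group per-sentence frame lists into blocks of `block_size` sentences.

    `sentence_frames` is a list with one entry per sentence; each entry is a
    list of frame labels (assumed to come from an external frame parser).
    """
    return [
        [frame for sentence in sentence_frames[i:i + block_size] for frame in sentence]
        for i in range(0, len(sentence_frames), block_size)
    ]


def frame_vectors(blocks, vocabulary):
    """Turn each block into a vector with one dimension per frame.

    Each value is the frame's count in the block multiplied by a smoothed
    inverse document frequency (an assumed weighting, computed over blocks).
    """
    counts = [Counter(block) for block in blocks]
    n_blocks = len(blocks)
    idf = {}
    for frame in vocabulary:
        blocks_with_frame = sum(1 for c in counts if c[frame] > 0)
        idf[frame] = math.log((1 + n_blocks) / (1 + blocks_with_frame)) + 1
    return [[c[frame] * idf[frame] for frame in vocabulary] for c in counts]


if __name__ == "__main__":
    # Toy input: four sentences with made-up frame labels.
    tagged = [["Motion", "Arriving"], ["Motion"], ["Statement"], ["Statement", "Emotion"]]
    blocks = split_into_blocks(tagged, block_size=2)
    vocab = ["Arriving", "Emotion", "Motion", "Statement"]
    for vector in frame_vectors(blocks, vocab):
        print(vector)
```

In a forecasting setup, a fixed number of these vectors would serve as the model's input and the vector for the following block would be the prediction target.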

To make the output understandable to human users, the researchers converted the resulting vector back from a set of numbers to a word cloud. Online crowd workers tested and confirmed the representativeness and specificity of the produced word clouds.
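One plausible way to turn a frame vector into word-cloud input is sketched below. It is only an illustration under stated assumptions: the `frame_words` lexicon mapping each frame to example words is hypothetical, as are the function name and the `top_k` cutoff, and the article does not describe how the researchers performed this conversion.

```python
def vector_to_word_weights(vector, vocabulary, frame_words, top_k=3):
    """Map the highest-weighted frames in a vector to weighted words.

    `vocabulary` lists the frame for each vector dimension, and
    `frame_words` is an assumed lexicon of words associated with each frame.
    Returns a dict of word -> weight that a word-cloud renderer could size by.
    """
    ranked = sorted(zip(vocabulary, vector), key=lambda pair: pair[1], reverse=True)
    weights = {}
    for frame, score in ranked[:top_k]:
        if score <= 0:
            continue  # skip frames that were not predicted at all
        for word in frame_words.get(frame, []):
            weights[word] = weights.get(word, 0.0) + score
    return weights


if __name__ == "__main__":
    # Toy lexicon and predicted vector, both made up for illustration.
    lexicon = {"Motion": ["run", "travel"], "Arriving": ["arrive", "reach"], "Emotion": ["joy"]}
    predicted = [0.5, 0.0, 2.0]
    print(vector_to_word_weights(predicted, ["Arriving", "Emotion", "Motion"], lexicon, top_k=2))
```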

Authors could use the tool by feeding a part of their already-written text into the system to generate a set of word clouds with suggested nouns, verbs and adjectives to inspire them when crafting the next part of their story.

The researchers tested their model on a dataset of nearly 5,000 works of fiction and measured the effect of the frame representation across different context lengths, varying the story block length between five and 1,000 sentences. Additionally, they tested semantic frame forecast on nearly 8,000 scholarly articles using human-annotated abstracts from the CODA-19 dataset, highlighting the tool’s potential impact in nonfiction applications.

“It shows the generalizability of the technology. Our approach works not only in stories, but also in scientific articles,” said Kenneth. “If we can do it on both scientific papers and novels, we could probably do it on news and on other genres.”

Added Chieh-Yang, “Our experiment shows that forecasting forthcoming semantic frames is challenging but possible.”

The researchers plan to incorporate semantic frame forecast into a crowd-powered system that they previously developed, which enables writers to elicit story ideas from the online crowd, to further study how the tool can be used to support authors.

“If an automated system can augment human creativity, it will be impactful,” said Kenneth. “Even if the author doesn’t directly use what is generated, the machine’s outputs could inspire something that the writer didn’t think of before.”

The work was presented at the 2021 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), held virtually in early June. The CODA-19 dataset portion of the project was funded by the Penn State Huck Institutes of the Life Sciences’ Coronavirus Research Seed Fund and the College of IST COVID-19 Seed Fund.

Last Updated August 24, 2021