ITMO: Neural Networks Predict Future Bestsellers

By iednewsdesk On Feb 4, 2021

It’s well known that even some of the most famous books of all time had difficulty getting published. J.K. Rowling’s original Harry Potter pitch was rejected 12 times, William Golding’s debut novel Lord of the Flies received 21 rejections, and C. S. Lewis’s first book in the Chronicles of Narnia went through 37 different publishers before it finally hit the shelves. Could all this have been avoided? Are there any objective factors that determine a book’s potential popularity? Is it possible to create a success-predicting algorithm?

These are the questions posed by scientists from ITMO University and the University of Oulu working with neural networks and machine learning technologies. They have created an algorithm that can analyze the emotional fluctuations of a text and thus predict whether a book will be a bestseller or not.

The scale of emotions

Words evoke emotions in readers, and each reader brings their own experiences to a text. There are, however, universal words that make people respond in the same, or at least a similar, way. When analyzing a text, these words can be treated as markers that shape the emotional response to the entire fragment.

The researchers used such markers for eight basic emotions identified by other scientists in the second version of the NRC Emotion Intensity Lexicon: anger, anticipation, disgust, fear, joy, sadness, surprise, and trust. They then trained an algorithm to find words associated with different intensities (or degrees) of an emotion. Thus, without understanding the substance of a text, the algorithm can determine its emotional fluctuations.

“We take a piece of text and a database with emotion markers, and analyze them,” says Ivan Smetannikov, a co-author of the article and an associate professor at ITMO. “Thus, each piece of text receives eight values, each of which corresponds to the intensity of this or that emotion.”
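As a rough illustration of the scoring step described above, the sketch below turns a fragment into eight values, one per emotion. It is not the researchers’ implementation: the `TOY_LEXICON` entries and their intensities are invented for the example (they are not taken from the NRC lexicon), and averaging over all words in the fragment is an assumed way of normalizing.

```python
import re

EMOTIONS = ("anger", "anticipation", "disgust", "fear",
            "joy", "sadness", "surprise", "trust")

# Hypothetical toy lexicon: word -> {emotion: intensity between 0 and 1}.
# The values are made up for illustration, not real NRC lexicon entries.
TOY_LEXICON = {
    "monster": {"fear": 0.8, "disgust": 0.4},
    "friend": {"trust": 0.7, "joy": 0.5},
    "waiting": {"anticipation": 0.6},
    "tears": {"sadness": 0.7},
}


def score_fragment(text, lexicon=TOY_LEXICON):
    """Return eight values: the mean intensity of each emotion per word."""
    words = re.findall(r"[a-z']+", text.lower())
    totals = {emotion: 0.0 for emotion in EMOTIONS}
    for word in words:
        # Words missing from the lexicon contribute nothing.
        for emotion, intensity in lexicon.get(word, {}).items():
            totals[emotion] += intensity
    n_words = max(len(words), 1)  # avoid dividing by zero on empty text
    return [totals[emotion] / n_words for emotion in EMOTIONS]


if __name__ == "__main__":
    values = score_fragment("The monster was waiting")
    for emotion, value in zip(EMOTIONS, values):
        print(f"{emotion:12s} {value:.2f}")
```

Dividing by the fragment length keeps long and short fragments comparable; other normalizations would be equally plausible.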

After processing the entire book, the algorithm produces a graph, which the creators call the book’s emotional footprint. The program then compares this graph with the footprints of successful books in the same genre and concludes whether the book will be popular among fans of that genre.
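One way to picture an emotional footprint is as a small table: the book is cut into a fixed number of parts, and each part gets an averaged emotion vector. The sketch below assumes the per-fragment vectors are already available. The function names, the bin count, and the use of mean absolute difference as the comparison are all assumptions made for illustration; the article does not say how the researchers’ program compares footprints.

```python
def footprint(fragment_scores, n_bins=10):
    """Average consecutive per-fragment emotion vectors into n_bins rows."""
    n = len(fragment_scores)
    if n == 0:
        raise ValueError("need at least one fragment")
    dims = len(fragment_scores[0])
    rows = []
    for b in range(n_bins):
        start = b * n // n_bins
        # Guarantee at least one fragment per bin, even for short books.
        end = max((b + 1) * n // n_bins, start + 1)
        chunk = fragment_scores[start:end]
        rows.append([sum(vec[d] for vec in chunk) / len(chunk)
                     for d in range(dims)])
    return rows


def mean_abs_distance(fp_a, fp_b):
    """Mean absolute difference between two footprints of equal shape."""
    if len(fp_a) != len(fp_b) or len(fp_a[0]) != len(fp_b[0]):
        raise ValueError("footprints must have the same shape")
    total = sum(abs(x - y)
                for row_a, row_b in zip(fp_a, fp_b)
                for x, y in zip(row_a, row_b))
    return total / (len(fp_a) * len(fp_a[0]))


if __name__ == "__main__":
    # Two-emotion toy data to keep the example short; real vectors have eight.
    book = footprint([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]], n_bins=2)
    genre_reference = [[0.0, 0.0], [1.0, 1.0]]  # hypothetical genre average
    print(book, mean_abs_distance(book, genre_reference))
```

A smaller distance to a genre’s reference footprint would mean the book’s emotional curve resembles that genre’s successful titles; a trained model could replace this hand-written distance.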

The scientists analyzed nearly 171,000 books from various databases. When its forecasts were checked against the books’ ratings, the algorithm turned out to be 73% accurate. The researchers also tried a reverse analysis: having the algorithm identify a book’s genre from its emotional footprint.
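For readers unfamiliar with the metric, accuracy here means the share of books for which the forecast matched the real outcome. The sketch below shows the arithmetic on made-up labels; the `accuracy` helper and the sample data are hypothetical, and the article does not specify how ratings were converted into a “successful” label.

```python
def accuracy(predicted, actual):
    """Fraction of forecasts that match the observed outcomes."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("need at least one forecast")
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(predicted)


if __name__ == "__main__":
    # Hypothetical data: 100 books, with the forecast wrong for the last 27.
    actual = [True] * 100
    predicted = [True] * 73 + [False] * 27
    print(f"{accuracy(predicted, actual):.0%}")
```

On a corpus of roughly 171,000 books, a 73% accuracy would correspond to about 125,000 correct forecasts, assuming every book was scored.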

“Around 41% of books have genre characteristics, while the others are not so obvious. If you look at the footprints of different genres, you will see that, for example, trust as an emotion prevails in popular horror novels, while in children’s books the trust level drops toward the finale and anticipation takes the lead. The first ten percent of detective stories are full of anticipation, and then it disappears, only to return at the end. So, we can clearly see some genre clichés,” explains Ivan Smetannikov.

Even though the algorithm predicted correctly in roughly three out of four cases, the scientists emphasize that it can’t guarantee the success or failure of a given text. Moreover, a missed forecast is not necessarily just a matter of the book falling into the 27% of incorrect results.

“How we consume content is changing. We’re training the program using a large database of published books, but it doesn’t necessarily mean that new books with similar patterns will be popular. Time goes by, and the methods that worked ten years ago may not work now,” stresses Ivan Smetannikov.

This concept could potentially be adopted by publishers who want to test their impressions of books with the help of technology. Similar principles can also be used in the film industry. However, that would require a more complex set of data and other algorithms to evaluate not only a movie’s plot but also its audiovisual elements.

The research was presented at the 2020 International Conference on Control, Robotics and Intelligent System (CCRIS 2020).
