LMU: “KI Lectures”: Artificial intelligence offers great potential for the study of ancient literature
In the LMU’s “KI Lectures” series, Professor Enrique Jiménez explains how artificial intelligence can enable people to read some of the oldest known scripts.
One of the world’s oldest forms of writing, cuneiform, originated in ancient Mesopotamia in the late fourth millennium BC. “This writing was engraved on clay,” explains Assyriologist Professor Enrique Jiménez in the fourth talk of LMU’s virtual “KI Lectures” series. “Clay is the cheapest and most durable material for writing on. However, once it’s dried it is very fragile, so only fragments of it have survived.”
An algorithm fed with various data models can help scholars piece together ancient text fragments, such as those from the famous Epic of Gilgamesh, into a complete text. About 60 percent of the epic has been recovered so far, and every year new fragments are discovered that need to be assigned to their correct place. Researchers hope that the algorithm will speed up this process dramatically.
Same character, several meanings
Artificial intelligence can be of great help in the difficult task of reconstructing such texts. This is because one of the main problems in deciphering Babylonian literature is the polyvalence of the cuneiform characters: “The same character can have several meanings,” explains Professor Jiménez from LMU. “So, without context, a cuneiform character is not translatable.”
Help comes in the form of computer applications fed with various data models. These can predict the most likely reading of a character with an accuracy of up to 98 percent, in contrast to the traditional method of text reconstruction. “The traditional method relies on the researchers’ good memory and, of course, random chance,” explains Jiménez. “The modern AI-based reconstruction method, on the other hand, relies on research databases that aggregate thousands of characters. With the help of artificial intelligence, all known variants of a Babylonian text can be analyzed quickly and assigned correctly.”
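To illustrate the idea of choosing a reading from context, here is a minimal sketch in Python. It is not the method used by the LMU team, which the lecture does not describe in detail. It assumes a toy corpus of already transliterated lines and simply counts which reading most often follows the preceding one. The readings ("ra", "ba", "bi", "ka") and the function names are hypothetical placeholders.

```python
from collections import Counter, defaultdict

def train_bigram_counts(transliterated_lines):
    """Count how often each reading follows the previous reading."""
    counts = defaultdict(Counter)
    for line in transliterated_lines:
        # Pair every reading with its predecessor; "<s>" marks the line start.
        for prev, cur in zip(["<s>"] + line, line):
            counts[prev][cur] += 1
    return counts

def predict_reading(prev_reading, possible_readings, counts):
    """Pick the candidate reading seen most often after prev_reading."""
    return max(possible_readings, key=lambda r: counts[prev_reading][r])

# Toy corpus with placeholder readings (an assumption, not real data).
corpus = [["ra", "ba"], ["ra", "ba"], ["ra", "bi"], ["ka", "bi"]]
counts = train_bigram_counts(corpus)

# A character that could be read "ba" or "bi" is resolved by what precedes it.
after_ra = predict_reading("ra", ["bi", "ba"], counts)
after_ka = predict_reading("ka", ["ba", "bi"], counts)
```

A real system would draw on far larger databases and richer context than a single preceding reading, but the principle is the same: the context narrows down which of several possible readings is plausible.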
Sequence analysis is one of the techniques applied here. It is a method from molecular biology and bioinformatics that uses computers to identify characteristic sections of a DNA sequence. Professor Jiménez and his team adapted this approach to the cuneiform script so that they can fill existing gaps and assign each fragment to the correct text.
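How a technique from bioinformatics might carry over to text fragments can be sketched with a classic local alignment algorithm (Smith-Waterman), here in Python. This is an illustrative assumption, not the team's actual implementation: the lecture does not say which alignment algorithm was adapted. Signs are represented by placeholder letters, "X" stands for a damaged sign, and the scoring values are arbitrary.

```python
def local_alignment_score(fragment, text, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score between two sign sequences."""
    rows, cols = len(fragment) + 1, len(text) + 1
    score = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            same = fragment[i - 1] == text[j - 1]
            diagonal = score[i - 1][j - 1] + (match if same else mismatch)
            # A cell never drops below zero, so an alignment can start anywhere.
            score[i][j] = max(0, diagonal,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
            best = max(best, score[i][j])
    return best

def best_match(fragment, candidates):
    """Return the name of the candidate text that aligns best with the fragment."""
    return max(candidates,
               key=lambda name: local_alignment_score(fragment, candidates[name]))

# Placeholder data (hypothetical): a fragment with one damaged sign "X".
fragment = ["A", "B", "X", "D"]
candidates = {
    "text_1": ["Q", "A", "B", "C", "D", "E"],
    "text_2": ["A", "Q", "Q", "D", "B"],
}
score_1 = local_alignment_score(fragment, candidates["text_1"])
score_2 = local_alignment_score(fragment, candidates["text_2"])
assigned = best_match(fragment, candidates)
```

Because local alignment tolerates mismatches and gaps, a fragment with damaged or missing signs can still be matched to the passage it most resembles, much as a partial DNA read is matched to a genome.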
Algorithms need large quantities of data
But for automated character recognition to work well, the system needs to be fed with lots of data. “Deep learning requires very large quantities of data, which unfortunately we don’t have in this field at the moment,” says Jiménez. “As linguistic researchers we would need to photograph many more text fragments and store them in databases. That is why we have come to an arrangement with the world’s biggest cuneiform collections, namely the British Museum and the Iraq Museum, to include large parts of their collections.”
With the help of partner institutions, Enrique Jiménez and his team have nevertheless created a huge database of fragments that is available to international scholars. The LMU’s Professor of Ancient Near Eastern Literatures considers this a very clear sign that computer science will play a bigger role in the training of Assyriologists in the future, since many tasks can be solved more easily if the processes are fully or partially automated.