Karlsruhe Institute of Technology Upgrades Lecture Translator with Advanced Text Processing Features

Automatic speech recognition and translation systems such as the Lecture Translator of the Karlsruhe Institute of Technology (KIT) can convert spoken lectures into text in several languages in real time. Such systems improve access to information for students with disabilities and international students. They also speed up work and learning in general through intelligent post-processing and archiving of spoken texts. KIT researchers have now added new functions to the Lecture Translator: automatic recognition of speech in several languages at the same time, real-time text segmentation and title generation, summaries, links to technical terms, and queries on what has been heard. Together, these make lectures easier to understand and to process efficiently.

“With Lecture Translator’s automatic simultaneous translation, we have brought spoken lectures closer to an international audience. However, this usually represents only 15 percent of the audience. With the new AI tools, we want to break down not only language barriers, but also comprehension barriers,” says Alexander Waibel, Professor of Computer Science at KIT. “Automatically transcribed texts of spoken language are often difficult to read, because they appear too quickly as a long text without paragraphs and subheadings, just as the lecture was delivered orally. Processing the lecture is also time-consuming, because you have to search the lecture for gaps in understanding,” says Waibel.

Better overview of documents

The further development of the Lecture Translator provides a remedy. The researchers have developed several new automatic functions such as “Smart Chaptering”, “Summarization”, “Q&A” and “Auto-Links”. A new type of artificial intelligence (AI) that automatically recognizes the language being spoken converts the speech into a transcript in multiple languages and automatically identifies paragraphs, chapter headings and key points. It also produces spoken output in one of 18 languages selected by the user. In addition, the program automatically displays links as cross-references to relevant sources in lecture notes or Wikipedia, which students can use to better process the lecture. “With our new AI models, conversations and lectures can be better structured and even videos can be divided into easily navigable chapters,” says Waibel. This supports understanding not only during the lecture, but also afterwards.

Lecture Translator translates into 18 languages

The research team has integrated the work into the Lecture Translator, which is used at KIT to automatically transcribe lectures in real time. Chapter division, title generation, paragraph layout and summaries with links, all of which can be used both online and offline, now extend the Lecture Translator service and make it easier to work with the material. Translation is available in 18 languages. The technology has specific applications for content creators, students, teachers and podcasters, who can now structure their audio and video content for the first time. “Users can navigate through videos and lectures more efficiently, find relevant sections more quickly, and capture important core content more compactly and efficiently – they have a much better overall view and faster access to details,” says Waibel.

The research was carried out as part of the project “How is AI Changing Science?” and was funded for four years by the Volkswagen Foundation. In addition to KIT, the University of Bonn and the University of Vienna were also involved in the project.