AI dataset to enable the development of better music description tools


Song Describer is a collaborative platform where people write descriptions of music under Creative Commons licenses, creating an open database of music paired with natural language descriptions. As the database grows, it will support artificial intelligence research into systems that combine natural language and audio processing to generate music captions automatically, among other applications.

Through Song Describer, researchers from Queen Mary’s C4DM (Centre for Digital Music) and the Music Technology Group at UPF (Pompeu Fabra University, Barcelona) are enabling the collection of textual descriptions of songs, covering aspects that range from genre, tone and the emotions evoked by a melody to instrumentation.

This public database of more than 10,000 pieces of music with their corresponding descriptions can be used by the scientific community to develop, train and validate artificial intelligence models in the field of music description.

Song Describer is a crowdsourcing platform open to everyone, with no need for specialist musical knowledge. Researchers are calling on the public to support the project by writing descriptions of songs in English, with prizes available for the most active contributors. Around 100 people have written song descriptions so far.

There are three simple steps to get involved:

1. Create a profile including age, location and level of interest in music (excluding personal data). This information may help researchers to see if and how cultural factors affect the way that people describe songs.
2. Following the platform’s instructions, listen to songs and submit descriptions.
3. Evaluate descriptions written by other participants, indicating whether or not they seem valid and scoring them from 1 to 5. These ratings are used for quality control: if many people mark a description as invalid or score it very low, the system discards it.
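
The quality-control rule in the last step can be illustrated with a minimal Python sketch. It is not the platform's actual implementation: the `Evaluation` type, the `keep_description` function and all three thresholds are hypothetical, chosen only to show how validity votes and 1-to-5 scores might be combined to decide whether a description is kept.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Evaluation:
    """One participant's judgement of a description (hypothetical structure)."""
    is_valid: bool  # whether the description seems valid to the evaluator
    score: int      # quality score from 1 (worst) to 5 (best)


def keep_description(
    evaluations: list[Evaluation],
    min_evaluations: int = 3,           # assumed: do not judge on too few votes
    max_invalid_fraction: float = 0.5,  # assumed: discard if more than half say invalid
    min_mean_score: float = 2.0,        # assumed: discard if the average score is below this
) -> bool:
    """Return True if the description should be kept, False if discarded."""
    if len(evaluations) < min_evaluations:
        # Not enough evidence yet, so keep the description for now.
        return True
    invalid_fraction = sum(not e.is_valid for e in evaluations) / len(evaluations)
    mean_score = mean(e.score for e in evaluations)
    return invalid_fraction <= max_invalid_fraction and mean_score >= min_mean_score
```

With these assumed thresholds, a description that three evaluators all mark as invalid would be discarded, while one with a single negative evaluation would be kept until more people have rated it.
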
Ilaria Manco, PhD Researcher in Artificial Intelligence and Music at Queen Mary University of London, said: “The field of music-and-language research is rapidly growing but finding suitable open datasets to support work in this field remains a challenge. This is why we decided to create Song Describer, an open-source crowdsourcing platform through which anyone can contribute to building a corpus of paired music tracks and natural language descriptions. We hope that data collected from our platform will help to develop new audio-language models for music, as well as allow us to evaluate them in more detail.”

Dmitry Bogdanov, a researcher on the project from Pompeu Fabra University Barcelona, added: “We want to study the relationship between audio and these textual descriptions, and how people characterise music verbally, to develop machine learning models that generate music captions for any song from the audio.”

As for the uses of such systems, Bogdanov explained: “For many users, music captions can be useful for navigating music collections in an innovative and more intuitive way. On the one hand, people will be able to search for music through the automatically generated textual descriptions, and, on the other hand, they could make textual queries directly using natural language, for example, writing in a search engine ‘search for slow ballads with guitars and deep voices’.”