University of Helsinki: Awarded papers focus on solving basic algorithmic problems, with applications in bioinformatics

0

The prize for the best research article of the Workshop on Algorithms in Bioinformatics conference (WABI) was awarded to the research conducted in the Graph Algorithms team led by Associate Professor Alexandru Tomescu. The title of the article is “Eulertigs: minimum plain text representation of k-mer sets without repetitions in linear time”, and it was written by doctoral student Sebastian Schmidt and doctoral researcher Jarno Alanko.

High-throughput sequencing is a technology reading DNA or RNA sequences. For example, it was used to first assemble the SARS-CoV-2 genome in 2020, which was the basis of the mRNA vaccines in use today. This technology is mainstream in research, but in the near future it can also be crucial in, for example, personalized cancer therapies. The data produced by such technologies is huge, and there is a need both to quickly analyze this data, and to compress it for storage and later use.

The winning paper gives an efficient algorithm for finding a minimum representation of a set of given strings (k-mers), with various bioinformatics applications, such as compression, or speeding up different other bioinformatics data structures or data analyses. This problem was previously believed to be computationally hard. It was not previously known if this problem could be solved optimally and efficiently, which the paper of Schmidt and Alanko now shows.

Other works presented at ALGO
The group also presented more papers in other events included at ALGO. One was entitled “Width Helps and Hinders Splitting Flows”, co-authored by PhD Researchers Manuel Cáceres and Andreas Grigorjew, and Assoc. Professor Alexandru Tomescu, with researchers from University of Montana, University of Verona, and IIT Roorkee. It is about an approximation algorithm for the problem of decomposing a network flow into weighted paths, which is a basic computational problem in various fields of Computer Science, such as networking, transportation and bioinformatics.

The second paper, entitled “Optimizing the Safe Flow Decompositions in DAGs”, co-authored by Graph Algorithms team alumnus Shahbaz Khan (now an Assistant Professor at IIT Roorkee, India) and Assoc. Prof. Alexandru Tomescu is about the same flow decomposition problem, but focusing on its safety version. Namely, on efficient algorithms reporting all those paths that appear in any flow decomposition of a network flow, with applications in more accurately reconstructing biological sequences. This work is part of the ongoing ERC Starting Grant SAFEBIO, focusing on safe versions of basic algorithmic problems, with applications in bioinformatics.

The applications of this project are in the use of high-throughput sequencing data. Such technologies cannot read a full DNA sequence from beginning to end, but instead report millions of short DNA fragments. Assembling into the DNA sequences from which they originated is like solving a huge puzzle, with the help of algorithms. Even so, due to the vast amount of input data, there is no unique reconstruction of this puzzle, and millions or even billions of reconstructions are equally likely.

This ERC project proposed a novel methodology to deal with this uncertainty, by studying only those partial reconstructions that appear in all these billions of possible reconstructions. This project has lead to practical tools that can be used by biologists to make sense of this enormous uncertainty.

The Algorithmic Bioinformatics group at the Department of Computer Science comprises of five teams approaching Bioinformatics problems from different angles: data structures, compression, string or graph algorithms, efficient implementations, and problems motivated by practical and technological challenges. The main applications are in developing practical methods usable by biological or medical researchers to understand life, evolution, or study diseases. The Graph Algorithms team formed in 2019, co-funded by an ERC Starting Grant and an Academy of Finland Research fellowship awarded to PI Alexandru Tomescu.

ALGO is an annual meeting encompassing several algorithmic conferences, including the premier conference European Symposium on Algorithms (ESA), and one of the major conferences in algorithmic bioinformatics, Workshop on Algorithms in Bioinformatics (WABI).

This year the Algorithmic Bioinformatics Group was involved with two keynote speakers (University Lecturer Leena Salmela and Associate Professor Simon Puglisi), two ESA papers, two WABI papers (one of which got the WABI Best Paper award), and one ESA program committee member, Professor Veli Mäkinen.