DNA sequencing is undergoing constant development. Improvements in the underlying technologies steadily lower the price and increase the throughput, but at the same time the characteristics of the data also change. For example, second-generation sequencing machines (such as Illumina) brought massive amounts of short reads, while the newest generation (Oxford Nanopore and PacBio) produces long reads with high error rates.
With such fast development, it has become necessary to combine heterogeneous data from a variety of sequencing runs. In this talk, I will present a sequence assembly method based on probabilistic modeling. Within the probabilistic model, we can encode the expected properties of data sets from a variety of sequencing technologies and thus combine information from different sequencing runs in a transparent way. In the second part of the talk, I will discuss new challenges brought by Oxford Nanopore MinION sequencers. For most technologies, base calling is considered a solved problem. For MinION data, however, base calling is non-trivial, and we will see how it can be addressed using deep neural networks.
This is joint work with Brona Brejova and Vladimir Boza.
Tomas Vinar is an associate professor at the Faculty of Mathematics, Physics and Informatics at Comenius University in Bratislava, where he is one of the principal investigators of the Computational Biology Research Group. He finished his PhD in computer science at the University of Waterloo in 2006 (with Prof. Ming Li and Prof. Dan Brown), and he was a postdoctoral researcher in biological statistics at Cornell University (with Prof. Adam Siepel). His research interests include bioinformatics and algorithms. Besides working on computational problems, Dr. Vinar often collaborates with life science researchers, and he has contributed to several large international genome sequencing projects, including those for the macaque, marmoset, panda, and orangutan.