Modern DNA sequencing technologies can produce large quantities of short strings called reads, each covering a small fraction of the genome being sequenced. In this talk we will describe how to estimate the size of the genome from such data. Our method counts the occurrences of fixed-length substrings (k-mers) in the set of reads, summarizes these counts in a histogram, and fits the histogram with probabilistic models of the sequencing process. This research was done with Michal Hozza, Werner Krampl, and Tomas Vinar.
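The counting and histogram steps mentioned in the abstract can be illustrated with a short Python sketch. It is not the method presented in the talk: in place of a probabilistic model of the sequencing process, it uses a naive estimate that divides the total number of occurrences of fixed-length substrings (k-mers) by the most common abundance. The function names and the min_abundance cutoff are hypothetical choices made for this illustration.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count occurrences of every length-k substring across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def abundance_histogram(kmer_counts):
    """Map each abundance c to the number of distinct k-mers seen exactly c times."""
    return Counter(kmer_counts.values())


def estimate_genome_size(histogram, min_abundance=2):
    """Naive estimate: total k-mer occurrences divided by the peak abundance.

    Assumption (not from the talk): k-mers seen fewer than min_abundance
    times are treated as sequencing errors and ignored.
    """
    usable = {c: n for c, n in histogram.items() if c >= min_abundance}
    if not usable:
        raise ValueError("no k-mers at or above min_abundance")
    # The most common abundance serves as a rough proxy for coverage depth.
    peak = max(usable, key=usable.get)
    total_occurrences = sum(c * n for c, n in usable.items())
    return total_occurrences / peak
```

On real data the histogram is distorted by sequencing errors, repeats, and uneven coverage, which is why a simple peak-based estimate like this one tends to be unreliable and a model of the sequencing process is used instead.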
Brona Brejova is an associate professor at the Dept. of Computer Science at Comenius University in Bratislava, Slovakia. She obtained her PhD at the University of Waterloo in Canada, followed by a postdoctoral stay at Cornell University in the U.S.A. Her main research interests are comparative genomics, genome annotation, and algorithms for the analysis of sequencing data.