We describe recent breakthroughs in the field of compressed data structures, in which the data structure is stored in a compressed representation that still allows fast answers to queries. We focus in particular on compressed data structures to support the important application of pattern matching on massive document collections. Given an arbitrary query pattern in textual form, the job of the data structure is to report all the locations where the pattern appears. Another variant is to report all the documents that contain at least one instance of the pattern. We are particularly interested in reporting only the most relevant documents, using a variety of notions of relevance, as well as in performance in the external memory model, where number of I/Os is a primary measure. We discuss recently developed techniques that support fast search in these contexts as well as under additional positional and temporal constraints.
Dr. Jeffrey Vitter is the provost and executive vice chancellor and the Roy A. Roberts Distinguished Professor at the University of Kansas. As provost, Dr. Vitter is the chief academic and operations officer for the Lawrence and Edwards campuses, and he oversees strategic planning and implementation. Before coming to KU, Dr. Vitter held a similar post at Texas A&M University. He served as the Frederick L. Hovde Dean of the College of Science and as Professor of Computer Science at Purdue University. He held a distinguished professorship at Duke University, and served at Duke as chair of the Department of Computer Science. Before that he progressed through the faculty ranks and in leadership roles at Brown University.
Dr. Vitter’s educational degrees include a B.S. with highest honors in mathematics in 1977 from the University of Notre Dame; a Ph.D. in computer science in 1980 from Stanford University; and an M.B.A. in 2002 from Duke University.
Dr. Vitter’s research deals with the algorithmic aspects of processing, compressing, and communicating massive amounts of information. He has done much work in external memory algorithms, compressed data structures, data compression, machine learning, and databases. He has been elected a Fellow of the Guggenheim Foundation, the American Association for the Advancement of Science, the Association for Computing Machinery, and the Institute of Electrical and Electronics Engineers. He was named a National Science Foundation Presidential Young Investigator and is a Fulbright Scholar. He has over 280 book, journal, conference, and patent publications. He is an ISI highly cited researcher with a Google Scholar h-index of 60.