Hash functions are used everywhere in computing. Originally, hashing was invented to store and look up elements in computer memory, but more recently it has become a crucial component in the analysis of Big Data applications.
We think of a hash function as a random function, assigning random hash values to elements; thus if we use hash values to place elements in computer memory then we expect them to spread out nicely. At the same time, the hash function has to be fixed as soon as we start using it; otherwise can't find the elements again. In the context of Big Data, we use hash functions to create small sketches of large sets, allowing us to quickly measure how similar they are, and if they are similar, what exactly are the differences. The sets can be processed independently as long as we use the same hash function to create the sketch.
I will talk about some of these applications of random hash functions, about the probabilistic properties they are required to have, and about how they can be generated to work efficiently on huge data sets.
Mikkel Thorup (born 1965) has a D.Phil. from Oxford University from 1993. From 1993 to 1998 he was at the University of Copenhagen. From 1998 to 2013 he was at AT&T Labs-Research. Since 2013 he has been back as Professor at the University of Copenhagen. He is currently a VILLUM Investigator heading Center for Basic Algorithms Research Copenhagen (BARC) supported by a €5.3 million grant from the VILLUM Foundation.
Mikkel Thorup is a Fellow of the ACM and of AT&T, and a Member of the Royal Danish Academy of Sciences and Letters. He is co-winner of the 2011 MAA Robbins Award in mathematics and winner of the 2015 Villum Kann Rasmussen Award for Technical and Scientific Research, which is Denmark's biggest individual prize for research. His main work is in algorithms and data structures, where he has worked on both upper and lower bounds. Recently one of his main focusses has been on hash functions unifying theory and practice.