Login [Center] Logout Join Us Guidelines  I  中文  I  CQI

Hash functions bridging the gap from theory to practice

Speaker: Mikkel Thorup Basic Algorithms Research Copenhagen
Time: 2024-12-03 14:00-2024-12-03 15:00
Venue: Lecture Hall, FIT Building

Abstract:

Randomized algorithms are often enjoyed for their simplicity, but the hash functions employed to yield the desired probabilistic guarantees are often too complicated to be practical. Hash functions are used everywhere in computing, e.g., hash tables, sketching, dimensionality reduction, sampling, and estimation. Many of these applications are relevant to Machine Learning, where we are often interested in similarity between high dimensional objects. Reducing the dimensionality is key to efficient processing. Abstractly, we like to think of hashing as fully-random hashing, assigning independent hash values to every possible key, but essentially this requires us to store the hash values for all keys, which is unrealistic for most key universes, e.g., 64-bit keys. In practice we have to settle for implementable hash functions, and often practitioners settle for implementations that are too simple in that the algorithms ends up working only for sufficiently random input. However, the real world is full of structured/non-random input. The issue is severe, for simplistic hash functions will often work very well in tests with random input. Moreover, the issue is often that error events that should never happen in practice, happen with way too high probability. This does not show in a few test, but will show up over time when you put the system into production. Over the last decade there has been major developments in simple to implement tabulation based hash functions offering strong theoretical guarantees, so as to support fundamental properties such as Chernoff bounds, Sparse Johnson-Lindenstrauss transforms, and fully-random hashing on a given set w.h.p. etc. I will discuss some of the principles of these developments and offer insights on how far we can bridge from theory (assuming fully-random hash functions) to practice (needing something that can actually implemented efficiently).

Short Bio:

Mikkel Thorup (born 1965) has a D.Phil. from Oxford University from 1993. From 1993 to 1998 he was at the University of Copenhagen. From 1998 to 2013 he was at AT&T Labs-Research. Since 2013 he has been back as Professor at the University of Copenhagen. Since 2017, he has been a VILLUM Investigator heading Center for Basic Algorithms Research Copenhagen (BARC). Mikkel is a Fellow of the ACM and of AT&T, and a Member of the Royal Danish Academy of Sciences and Letters. He is co-winner of the 2011 MAA Robbins Award in mathematics and winner of the 2015 Villum Kann Rasmussen Award for Technical and Scientific Research, which is Denmark's biggest individual prize for research. More recently he was co-winner of the 2021 AMS-MOS Fulkerson Prize and an ACM STOC 20-year test of time award. Mikkel's main work is in algorithms and data structures, where he has worked on both upper and lower bounds. Recently one of his main focuses has been on hash functions unifying theory and practice. Mikkel prefers to seek his mathematical inspiration in nature, combining the quest with his hobbies of bird watching and mushroom picking.