Speaker: Ke Yi Duke University (now at AT & T Research Labs)
Time: 2006-03-08 14:00-2006-03-08 15:00
Venue: FIT Building, Tsinghua University
Abstract:
As the massive data sets we face today far exceed memory limit, traditional RAM algorithms do not scale due to excessive page faults. In recent years, I/O-efficient algorithms have been a successive and active direction to better scalability on massive data. In this talk I will present an I/O-efficient algorithm for the batched (off-line) union-find problem, a fundamental algorithmic problem that has been studied for over four decades. Given any sequence of N mixed union and find operations.
our algorithm uses O(sort(N)) = O(N/B log_{M/B}(N/B)) I/Os, where M is the memory size and B is the disk block size. This bound is asymptotically optimal in the worst case. I will also present a simpler and more practical algorithm that uses O(sort(N) log N) I/Os.
The main motivation for our study of the union-find problem arises from problems in terrain analysis. A terrain can be abstracted as a height function defined over R^2, and many problems that deal with such functions require a union-find data structure. I will talk about two terrain analysis problems that benefit from a union-find data structure: (i) computing topological persistence and (ii) constructing the contour tree. We give the first O(sort(N))-I/O algorithms for these two problems, assuming that the input terrain is represented as a triangular mesh with N vertices.