Institute for Interdisciplinary Information Sciences, Tsinghua University

Connecting the Dots: Making Sense of Big Data on the Web

Speaker: Gerard de Melo, UC Berkeley
Time: 2013-03-14, 10:00-11:00
Venue: FIT-1-222
Abstract:

The vast amounts of data available on the Web present unique opportunities, but are often extremely hard to work with due to their scale, noisiness, and heterogeneity.
In this talk, I discuss novel algorithms that address the challenge of making sense of both structured data and unstructured text on the Web. One major focus is reliably matching equivalent items across different Web sources, including Wikipedia and domain-specific databases, a problem we solved using scalable graph-based algorithms and linear optimization techniques. Building on this, I discuss methods to harvest taxonomic and semantic information about entities and concepts in over 100 languages, which led to UWN/MENTA, the largest database of its kind. Finally, I present Web-scale text analytics methods that allow us to collect additional common-sense knowledge that is useful in natural language understanding tasks. Before concluding, I outline several applications of this work, including query interfaces and reasoning engines.

Biography:

Gerard de Melo is a Visiting Scholar at UC Berkeley working in the International Computer Science Institute's Artificial Intelligence group. He received his doctoral degree in 2010 as a member of the Max Planck Institute for Informatics in Germany and has published several award-winning papers on Web Mining and Natural Language Processing.

For more information, please refer to http://www.icsi.berkeley.edu/~demelo/ .