Institute for Interdisciplinary Information Sciences, Tsinghua University

Robust Language Understanding and Unstructured Data Summarization for Speech-bas

Speaker: Jingjing Liu, Massachusetts Institute of Technology
Time: 2012-04-17, 10:00-11:00
Venue: FIT 1-222
Abstract:

Many assistant applications on mobile devices help people obtain rich Web content such as user-generated data (e.g., reviews, posts, blogs, and tweets). However, online communities and social networks are expanding rapidly, and it is impossible for people to browse and digest all the information manually. To help users obtain information more efficiently, an intuitive and personalized interface such as a spoken conversational system could be an ideal assistant: it engages a user in a continuous dialogue to garner the user’s interest and capture the user’s intent, assists the user by harvesting and summarizing Web data in a concise manner, and presents the aggregated information via natural human dialogue.

This talk will introduce our research on a universal framework for developing such speech-based interfaces. To interpret users’ intentions correctly from their spoken input, we explored a lexicon modeling approach for spoken language understanding. A conversational movie search system will be presented, which parses the recognition hypothesis of a spoken query into semantic class labels using conditional random fields (CRFs) and searches an indexed movie database with the identified semantic labels. Topic models were applied for query expansion and vocabulary learning. A crowdsourcing platform was also used to automatically collect large-scale annotated data for incremental model training.

To aggregate Web content and present the highlighted information via natural language, we explored approaches to interpreting the semantics and sentiment of user-generated content. This talk will introduce a parse-and-paraphrase paradigm and a sentiment scoring mechanism for information extraction from unstructured data. A multilingual restaurant recommendation system will be demonstrated, which presents summarized customer-provided online reviews through spoken conversations with users. A medical drug-side-effect inquiry system will also be presented, which summarizes patient-provided drug reviews and automatically learns the correlated side effects that can be presented to users via speech interaction.

Biography:

Dr. Jingjing Liu is a Research Scientist in the Spoken Language Systems (SLS) group at the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL). She received her Ph.D. degree in Electrical Engineering and Computer Science from MIT in 2012. Her primary research interests are in the areas of spoken dialogue systems, natural language processing, and information retrieval. She is a technical reviewer for the Journal of Computer Science and Technology, Multimedia Tools and Applications, EURASIP Journal on Audio, Speech, and Music Processing, ACM Transactions on Intelligent Systems and Technology, IEEE Transactions on Circuits and Systems for Video Technology, and IEEE Intelligent Systems. She has also served on the program committees of INTERSPEECH 2012, ACL-HLT 2011, and EMNLP 2010.