Group: Research Talk
Title: Research Talk
Speaker: Jia Xu, DFKI, Saarbrücken, Germany
Time: 2011-09-23, 10:00-11:00
Venue: FIT 1-222


Although statistical machine translation has advanced significantly over the last decade, there is still much room for improvement in related natural language processing tasks such as word segmentation, word alignment, and parsing.

Human language is composed of sequences of meaningful units. These units can be words, phrases, sentences, or even articles, which serve as basic elements of communication and as components for computational modeling. However, in monolingual text some sequences are not naturally separated by delimiters, and in bilingual text both sequence boundaries and their corresponding translations may be unlabeled. This work addresses sequence segmentation and alignment for statistical machine translation, covering the following topics: Chinese word segmentation, phrase training, parallel sentence exploitation, and domain adaptation. Experimental results on large-scale, state-of-the-art Chinese-English tasks show that training can be sped up by a factor of four, and that each of the methods above improves translation quality by up to 6% relative.

Short Bio:

  Jia Xu has been a researcher at the German Research Center for Artificial Intelligence (DFKI) since February 2010. She received her Ph.D. in Computer Science from RWTH Aachen University in October 2010, where she worked as a research assistant in Hermann Ney's group from 2003 to 2009; her Ph.D. thesis is titled "Sequence Segmentation for Statistical Machine Translation". In 2007 and 2008, she interned first at IBM's Speech Group and then at Microsoft's Natural Language Processing Group. Her research focuses on machine translation, machine learning, natural language processing, and computational linguistics.