Cognition is essential for many complex real-world tasks, encompassing critical capabilities in knowledge, memory, and reasoning. We present several approaches that address multiple challenges of machine cognition: modeling long-range memory, unifying autoregressive and auto-encoding methods, multi-hop reasoning, and semi-supervised learning on complex structured data. Empirically, Transformer-XL became the first attention-based model to fully surpass LSTMs on language modeling; XLNet outperformed BERT on 20 tasks under fair comparison and achieved state-of-the-art results on 18 of them.
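The long-range memory idea behind Transformer-XL can be illustrated with a toy sketch: each segment attends over its own states plus the cached states of the previous segment, so context propagates across segment boundaries. This is a minimal, dependency-free illustration of segment-level recurrence only; the function names (`attend`, `xl_layer`) are invented here, and the real model adds multi-head attention, relative positional encodings, and stop-gradient on the memory.

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of scores
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attend(query, context):
    # scaled dot-product attention of one query vector over context vectors
    d = len(query)
    scores = [sum(q * c for q, c in zip(query, ctx)) / math.sqrt(d)
              for ctx in context]
    w = softmax(scores)
    return [sum(wi * ctx[j] for wi, ctx in zip(w, context)) for j in range(d)]

def xl_layer(segment, memory):
    # key idea: the attention context is the cached previous segment's
    # states concatenated with the current segment's states
    context = memory + segment
    outputs = [attend(tok, context) for tok in segment]
    # the current segment's states become the memory for the next segment
    return outputs, segment

# process a stream of segments, carrying memory across boundaries
memory = []  # empty memory before the first segment
for segment in ([[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [1.0, 1.0]]):
    outputs, memory = xl_layer(segment, memory)
```

Because each layer's effective context grows with the number of cached segments, the model can use dependencies far longer than a single fixed-length window, which is what a vanilla Transformer trained on independent segments cannot do.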
Zhilin Yang obtained his bachelor's degree from Tsinghua University and his PhD from Carnegie Mellon University. His research has received over 5,000 Google Scholar citations and achieved state-of-the-art results on more than 30 benchmarks in areas including natural language understanding, text classification, question answering, and semi-supervised learning. He is the first author of XLNet, which was accepted as a NeurIPS 2019 Oral (top 0.5%) and has been the most cited peer-reviewed NLP paper of 2019. He co-invented Transformer-XL, which has been the most cited paper of ACL 2019. He is an Nvidia Fellow, a Siebel Scholar, and a Young Scientist of the Beijing Academy of AI; he has also received the Forbes China 30 Under 30 award and the Nvidia Pioneering Research Award.