Institute for Interdisciplinary Information Sciences, Tsinghua University

Mitigating the Risks of Large Language Model Deployments for a Trustworthy Cyberspace

Speaker: Tianxing He, University of Washington
Time: 2024-04-24, 10:00-11:00
Venue: FIT 1-222
Abstract:

Large language models (LLMs) have ushered in transformative possibilities and critical challenges within our cyberspace. While offering innovative applications, they also introduce substantial AI safety concerns. In my recent research, I employ a comprehensive approach encompassing both red teaming, involving meticulous examination of LLM-based systems to uncover potential vulnerabilities, and blue teaming, entailing the development of algorithms and protocols to enhance system robustness.

In this talk, I will delve into three recent projects focused on evaluation, detection, and privacy. (1) Can we trust LLMs as reliable natural language generation (NLG) evaluation metrics? We subject popular LLM-based metrics to extensive stress tests, uncovering significant blind spots. Our findings point to clear avenues for making these metrics more robust. (2) How can we ensure robust detection of machine-generated text? We introduce SemStamp, a semantic watermark algorithm that performs rejection sampling in the semantic space during LLM generation. The inherent properties of semantic mapping render the watermark resilient to paraphrasing attacks. (3) How do we protect decoding-time privacy in prompted generation with online services like ChatGPT? The current paradigm gives no option to users who want to keep the generated text to themselves. We propose LatticeGen, a cooperative framework in which the server still handles most of the computation while the client controls the sampling operation. The key idea is that the true generated sequence is mixed with noise tokens by the client and hidden in a noised lattice. To wrap up, I will outline future directions in AI safety, addressing the evolving challenges and opportunities that lie ahead.

Biography:

Tianxing He is a postdoctoral researcher working with Prof. Yulia Tsvetkov at the University of Washington. His research focuses on natural language generation (NLG) with large language models (LLMs). He did his Ph.D. at MIT CSAIL under the guidance of Prof. James Glass, where he worked towards a better understanding of LM generation. He received his Master's degree from the SpeechLab at Shanghai Jiao Tong University (SJTU) with Prof. Kai Yu, and a Bachelor's degree from the ACM Class at SJTU. Tianxing currently works on developing algorithms and protocols for a trustworthy cyberspace in the era of large language models. He is also interested in monitoring, detecting, and mitigating various behaviors exhibited by language models under diverse scenarios.

Tianxing’s research has been recognized with several awards, including the UW Postdoc Research Award, the CCF-Tencent Rhino-Bird Young Faculty Open Research Fund, and The ORACLE Project Award. He is a recipient of the Best Paper Award at NeurIPS-ENLSP 2022.