Console logs are textual data that systems print, mostly for debugging purposes. Although these logs are ubiquitous, they are rarely used because of their size and their hard-to-process free-text style.
First, we show that we can mine legacy text logs as is, in a fully automated way. Using a combination of program analysis, information retrieval, data mining and machine learning techniques, we automatically detect the few messages that are likely to indicate problems among millions to billions of log lines.
Then we take our approach further and reinvent the decades-old logging library. With our new logging library, without changing any legacy programming interface, we can export structured data and perform in-process filtering of log messages. These changes not only significantly improve logging performance (nearly 10x faster), but also pave the way for logging to become the single debugging instrumentation for large-scale distributed systems.
This talk is based on my Ph.D. dissertation, the following papers and my recent work at Google.
"Large-scale system problem detection by mining console logs" (SOSP'09)
"Online system problem detection by mining patterns of console logs" (ICDM'09)
"Experience on mining Google's production console logs" (SLAML'10)
Wei Xu received his Ph.D. in computer science from UC Berkeley in 2010, where he was co-advised by Profs. David Patterson and Armando Fox. He did his undergraduate studies at Tsinghua and UPenn and his master’s at UC Berkeley. He is currently an engineer at Google, where he works on Google's debugging infrastructure.