We present and evaluate a big data system for large-scale malware detection that integrates machine learning with expert reviewers, treating reviewers as a limited labeling resource. The system consists of three major components: a big data behavioral analytics platform for malware feature engineering, an ensemble of supervised learning models, and a mechanism to obtain feedback from expert reviewers. We demonstrate that even in small numbers, reviewers can vastly improve the system’s ability to keep pace with evolving threats.
Ling Huang is the Director of Data Science at DataVisor, Inc. His research and engineering background is in big data, machine learning, computer vision, and security analytics, especially large-scale machine learning pipelines for user categorization, risk modeling, image processing, natural language processing, fake account, spam, and fraud detection, and malware classification. He was a senior research scientist affiliated with the Intel ISTC on Secure Computing from May 2011 to May 2014, and a research scientist at Intel Labs Berkeley from October 2007 to May 2011. He pursued his Ph.D. in Computer Science at the University of California at Berkeley from 2002 to 2007, during which time he was affiliated with RadLab.