Bridging the gap between cloud infrastructure and Production Data Science


Over the last decade, data scientists have mastered the process of building machine learning models. The next frontier is putting these models into production: using them to consistently generate high-quality predictions for use by people, services, and analyses.

Cloud computing infrastructure is crucial to reaching this new frontier. Unfortunately, existing offerings fall short of the challenges of Production Data Science. In this talk, I will cover some of the key promises and weaknesses of current cloud offerings. I will also describe research from Berkeley's RISELab and the open-source Aqueduct system that grew out of it, which together aim to put Production Data Science at the fingertips of anyone working with data and models.