Modern cloud applications are subject to sporadic changes caused by operational activities such as upgrades, redeployment, and on-demand scaling. They are also subject to interference from other simultaneous operations. Increasing the dependability of systems during sporadic changes is non-trivial, particularly because traditional anomaly-detection-based diagnosis techniques are less effective during these “sporadic” operation periods: a wide range of legitimate changes confounds anomaly diagnosis and makes it difficult to establish a baseline for “normal” operation. The increasing frequency of sporadic operations (e.g., due to continuous deployment) exacerbates the problem. Diagnosing failures during sporadic operations relies heavily on logs, yet the log analysis challenges that stem from noisy, inconsistent, and voluminous logs from multiple sources remain largely unsolved. We propose Process Oriented Dependability (POD)-Diagnosis, an approach that explicitly models these sporadic operations as processes. These models allow us to (i) determine whether the process executes in the expected order, and (ii) use the process context to filter logs, trigger assertion evaluations, visit fault trees, and perform on-demand assertion evaluation for online error diagnosis and root cause analysis. We evaluated the approach on rolling upgrade operations in Amazon Web Services (AWS) while performing other simultaneous operations.
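As a rough illustration of the idea of using a process model to filter logs, check execution order, and trigger assertions, the following minimal Python sketch may help. It is not the POD-Diagnosis implementation: the `Step` class, the `check_process` function, the step names, the log line formats, and the instance identifiers are all hypothetical, the process is assumed to be strictly linear, and fault-tree traversal is omitted.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    """One step of a (hypothetical) linear operation process."""
    name: str
    pattern: str  # regex that recognises this step in a log line
    assertion: Optional[Callable[[], bool]] = None  # check to run after the step


def check_process(steps: List[Step], log_lines: List[str]) -> List[str]:
    """Replay log lines against the process model and collect detected errors."""
    errors = []
    expected = 0  # index of the step we expect to see next
    for line in log_lines:
        # Process-context filter: lines matching no step are treated as noise.
        matched = next(
            (i for i, s in enumerate(steps) if re.search(s.pattern, line)), None
        )
        if matched is None:
            continue
        want = steps[expected].name if expected < len(steps) else "<end>"
        if matched != expected:
            # Order check: the observed step is not the one the model expects.
            errors.append(f"order: saw '{steps[matched].name}', expected '{want}'")
            continue
        step = steps[matched]
        # The recognised step triggers its assertion, if it has one.
        if step.assertion is not None and not step.assertion():
            errors.append(f"assertion failed after '{step.name}'")
        expected += 1
    return errors


# Hypothetical rolling-upgrade fragment with a simulated failing assertion.
steps = [
    Step("deregister", r"deregister instance"),
    Step("terminate", r"terminate instance"),
    Step("launch", r"launch instance", assertion=lambda: False),
]
logs = [
    "12:00 deregister instance i-1",
    "12:00 unrelated heartbeat from another operation",
    "12:01 terminate instance i-1",
    "12:02 launch instance i-2",
]
print(check_process(steps, logs))
```

In this sketch an assertion is a plain callable returning a boolean; in a real setting it would presumably query the cloud environment (for example, whether the new instance runs the expected version), and a failed assertion would be the entry point for root cause analysis.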
Xiwei Xu (徐熙炜) is now a post-doctoral-equivalent researcher at NICTA (National ICT Australia). She received her bachelor's degree in 2007 from Nankai University, China, and her PhD in 2011 from the University of New South Wales, Australia. Her PhD research proposed a resource-oriented architecture for business processes. Her current research interests include software architecture, cloud computing, dependability, and business processes.