Most cloud facilities operate at very low utilization. Naively colocating more workloads on the same server node severely degrades the quality of service (QoS) of cloud services. The challenge is to manage hardware resources precisely and intelligently, so that resource efficiency improves while QoS guarantees still hold.
In the first part of the talk, I'll briefly introduce two of my academic solutions to this problem. PARTIES (ASPLOS'19) and ReTail (HPCA'22) are QoS-aware hardware resource managers that allow more aggressive colocation and enable finer-grained power management in the cloud. Then I'll switch to an industrial perspective and reflect on these two solutions after working in industry for almost two years. Finally, I'll discuss open questions about improving resource efficiency in practice, including challenges unique to the public cloud and opportunities in hardware design.