| ABSTRACT Infrastructure as a Service (IaaS) has become an increasingly popular type of service for both private and public clouds. The virtual infrastructures that enable IaaS support multitenancy by multiplexing the computational resources of data centers, which substantially reduces operational costs. Since hardware and software failures occur routinely in large-scale systems, it is imperative for cloud providers to offer failure recovery options for distributed services hosted on such infrastructures. In this article we present GENI-VIOLIN, a new cloud capability that can checkpoint a stateful distributed service while incurring very low overhead. Unlike previous work, GENI-VIOLIN exploits programmable OpenFlow switches to provide checkpointing services in the network, thereby requiring minimal changes to the end host virtualization framework. We have developed a prototype of GENI-VIOLIN using the GENI infrastructure, and have demonstrated GENI-VIOLIN’s checkpoint and restore capability across multiple GENI sites.
|