Publications

There is a high risk that we will find ourselves reliant on systems that are fragile and cannot deliver the services and performance for which they were engineered. Previous research has focused on developing models to study complex information technology infrastructures and gain a better understanding of how their structure impacts their emergent behaviors. Shadow Computing takes the perspective that unpredictable changes to system performance will occur, and actively seeks to develop uniform adaptive frameworks, methodologies, and tools to achieve system resilience in future large-scale computing systems. In this context, system resilience characterizes the ability of the system to mitigate the impact of, and dynamically adapt to, changing conditions in order to provide appropriate QoS support for diverse types of applications and services in heterogeneous computing environments.

Publications

  • Zaeem Hussain, Taieb Znati, and Rami Melhem. "Partial Redundancy in HPC Systems with Non-Uniform Node Reliabilities." To appear in Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC18), November 2018.
  • Xiaolong Cui, Zaeem Hussain, Taieb Znati, and Rami Melhem. "A Systematic Fault-tolerant Computational Model for Both Crash Failures and Silent Data Corruption." 21st Conference on Innovation in Clouds, Internet and Networks (ICIN 2018).
  • Xiaolong Cui, Taieb Znati, and Rami Melhem. "Rejuvenating Shadows: Fault Tolerance with Forward Recovery." The 19th IEEE International Conference on High Performance Computing and Communications (HPCC 2017), Bangkok, Thailand, 2017.
  • Xiaolong Cui, Taieb Znati, and Rami Melhem. "Adaptive and Power-Aware Resilience for Extreme-scale Computing." In Frontiers in Signal Processing, Volume 1, Number 1, July 2017.
  • Xiaolong Cui, Taieb Znati, and Rami Melhem. "Adaptive and Power-Aware Resilience for Extreme-scale Computing." The 16th IEEE International Conference on Scalable Computing and Communications (ScalCom 2016), Toulouse, France, July 18-21, 2016. [PDF]
  • Xiaolong Cui, Bryan Mills, Taieb Znati, and Rami Melhem. "Shadow Replication: An Energy-Aware, Fault-Tolerant Computational Model for Green Cloud Computing." Energies 7, no. 8 (2014): 5151-5176. [PDF]
  • Xiaolong Cui, Bryan Mills, Taieb Znati, and Rami Melhem. "Shadows on the Cloud: An Energy-Aware, Profit Maximizing Resilience Framework for Cloud Computing." International Conference on Cloud Computing and Services Science (CLOSER 2014), April 3-5, 2014. [PDF]
  • Bryan Mills, Taieb Znati, Rami Melhem, Ryan E. Grant, and Kurt B. Ferreira. "Energy Consumption of Resilience Mechanisms in Large Scale Systems." Parallel, Distributed and Network-Based Processing (PDP), 22nd Euromicro International Conference, February 12-14, 2014. [PDF]
  • Bryan Mills, Taieb Znati, and Rami Melhem. "Shadow Computing: An Energy-Aware Fault Tolerant Computing Model." In Proceedings of the International Conference on Computing, Networking and Communications (ICNC), February 3-6, 2014. [PDF]
  • Bryan Mills, Ryan E. Grant, Kurt B. Ferreira, and Rolf Riesen. "Evaluating Energy Savings for Checkpoint/Restart." First International Workshop on Energy Efficient Supercomputing (E2SC), in conjunction with SC13: The International Conference for High Performance Computing, Networking, Storage and Analysis, November 17, 2013. [PDF, Presentation]