SC 2018 Paper Accepted

Our recent submission to Supercomputing 2018 (SC18) has been accepted for publication. The paper drops the traditional assumption that node failures in HPC systems are uniform and analytically studies how useful partial replication remains once that assumption is removed. Contributions include a novel result on the optimal selection and pairing of replicas, as well as an in-depth analysis of the scenarios in which partial replication provides the best performance under failures.
