IEEE International Conference on Distributed Computing Systems, ICDCS 2017


Article Details
Title: Learning from Failure Across Multiple Clusters: A Trace-Driven Approach to Understanding, Predicting, and Mitigating Job Terminations
Article URLs:
Alternative Article URLs:
Authors:
Nosayba El-Sayed
  • MIT, Computer Science and Artificial Intelligence Lab
Hongyu Zhu
  • University of Toronto, Department of Computer Science
Bianca Schroeder
  • University of Toronto, Department of Computer Science
Sharing: Unknown
Verification: Authors have not verified information
Artifact Evaluation Badge: none
Artifact URLs:
Artifact Correspondence Email Addresses:
NSF Award Numbers:
DBLP Key: conf/icdcs/El-SayedZS17
Author Comments:
