| Title: |
Learning from Failure Across Multiple Clusters: A Trace-Driven Approach to Understanding, Predicting, and Mitigating Job Terminations |
| Article URLs: |
|
| Alternative Article URLs: |
|
| Authors: |
Nosayba El-Sayed |
-
MIT, Computer Science and Artificial Intelligence Lab
|
| Hongyu Zhu |
-
University of Toronto, Department of Computer Science
|
| Bianca Schroeder |
-
University of Toronto, Department of Computer Science
|
| Sharing: |
Unknown
|
| Verification: |
Authors have not verified information
|
| Artifact Evaluation Badge: |
none
|
| Artifact URLs: |
|
| Artifact Correspondence Email Addresses: |
|
| NSF Award Numbers: |
|
| DBLP Key: |
conf/icdcs/El-SayedZS17
|
| Author Comments: |
|