TY - GEN
T1 - Environmental conditions and disk reliability in free-cooled datacenters
AU - Manousakis, Ioannis
AU - Sankar, Sriram
AU - McKnight, Gregg
AU - Nguyen, Thu D.
AU - Bianchini, Ricardo
PY - 2016/1/1
Y1 - 2016/1/1
N2 - Free cooling lowers datacenter costs significantly, but may also expose servers to higher and more variable temperatures and relative humidities. It is currently unclear whether these environmental conditions have a significant impact on hardware component reliability. Thus, in this paper, we use data from nine hyperscale datacenters to study the impact of environmental conditions on the reliability of server hardware, with a particular focus on disk drives and free cooling. Based on this study, we derive and validate a new model of disk lifetime as a function of environmental conditions. Furthermore, we quantify the tradeoffs between energy consumption, environmental conditions, component reliability, and datacenter costs. Finally, based on our analyses and model, we derive server and datacenter design lessons. We draw many interesting observations, including (1) relative humidity seems to have a dominant impact on component failures; (2) disk failures increase significantly when operating at high relative humidity, due to controller/adaptor malfunction; and (3) though higher relative humidity increases component failures, software availability techniques can mask them and enable free-cooled operation, resulting in significantly lower infrastructure and energy costs that far outweigh the cost of the extra component failures.
UR - http://www.scopus.com/inward/record.url?scp=85077185817&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85077185817&partnerID=8YFLogxK
M3 - Conference contribution
T3 - Proceedings of the 14th USENIX Conference on File and Storage Technologies, FAST 2016
SP - 53
EP - 65
BT - Proceedings of the 14th USENIX Conference on File and Storage Technologies, FAST 2016
PB - USENIX Association
T2 - 14th USENIX Conference on File and Storage Technologies, FAST 2016
Y2 - 22 February 2016 through 25 February 2016
ER -