TY - JOUR
T1 - Estimating the occurrence of broken rails in commuter railroads with machine learning algorithms
AU - Kang, Di
AU - Dai, Junyan
AU - Liu, Xiang
AU - Bian, Zheyong
AU - Zaman, Asim
AU - Wang, Xin
N1 - Publisher Copyright: © IMechE 2024.
PY - 2024/11
Y1 - 2024/11
AB - Broken rail prevention is critical for ensuring track infrastructure safety. With the increasing availability of rail data, the opportunity for data-driven analyses emerges as a promising avenue for enhancing railroad safety. While previous research has predominantly concentrated on predicting broken rails within the context of freight railroads, the attention afforded to commuter railroads has been limited. To address this research gap, this paper presents an analytical modeling framework based on machine learning (ML) algorithms (including LightGBM, XGBoost, Random Forests, and Logistic Regression) to investigate the occurrence of broken rails on commuter rail segments. It leverages various features such as gradient, curvature, annual traffic, operational speed, and the history of prior rail defects. We use oversampling techniques, including ADASYN, random oversampling, and SMOTE, to address the issue of imbalanced data. This challenge arises due to the majority of commuter rail segments not experiencing any broken rails during the study period, resulting in a small sample size of broken rail instances. The findings indicate that, for the dataset employed in this study, LightGBM, in conjunction with random oversampling, exhibits superior performance. Based on the feature importance results, the critical factors influencing the prediction of broken rail occurrences on this commuter railroad are gradient, operational speed, and prior rail defects.
KW - Broken rails
KW - commuter railroad
KW - machine learning
KW - rail defects
UR - http://www.scopus.com/inward/record.url?scp=85203372718&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85203372718&partnerID=8YFLogxK
U2 - 10.1177/09544097241280848
DO - 10.1177/09544097241280848
M3 - Article
AN - SCOPUS:85203372718
SN - 0954-4097
VL - 238
SP - 1338
EP - 1350
JO - Proceedings of the Institution of Mechanical Engineers, Part F: Journal of Rail and Rapid Transit
JF - Proceedings of the Institution of Mechanical Engineers, Part F: Journal of Rail and Rapid Transit
IS - 10
ER -