Improving the spatial prediction of soil organic carbon using environmental covariates selection: A comparison of a group of environmental covariates

Mojtaba Zeraatpisheh, Younes Garosi, Hamid Reza Owliaie, Shamsollah Ayoubi, Ruhollah Taghizadeh-Mehrjardi, Thomas Scholten, Ming Xu

Research output: Contribution to journal › Article › peer-review


In the digital soil mapping (DSM) framework, machine learning models quantify the relationship between soil observations and environmental covariates. DSM studies have generally relied on the most commonly used covariates (MCC; e.g., topographic attributes, single-date remote sensing data, and legacy maps). Remote sensing time-series (RST) data can provide additional useful information for soil mapping. The main aim of this study was therefore to compare the MCC dataset, a monthly Sentinel-2 time series of vegetation indices (the RST dataset), and the combination of the two (MCC + RST) for soil organic carbon (SOC) prediction in an arid agroecosystem in Iran. We used several machine learning algorithms, including random forest (RF), Cubist, support vector machine (SVM), and partial least squares regression (PLSR). A total of 237 soil samples were collected at 0–20 cm depth. Five-fold cross-validation was used to evaluate model performance, and 50 bootstrap models were applied to quantify the prediction uncertainty. The Cubist model performed best with the MCC dataset (R² = 0.35, RMSE = 0.26%) and with the combined MCC + RST dataset (R² = 0.33, RMSE = 0.27%), while the RF model performed best with the RST dataset (R² = 0.10, RMSE = 0.31%). Soil properties explained the SOC variation in the MCC and combined datasets (66.35% and 50.82%, respectively), while NDVI was the most important controlling factor in the RST dataset (50.22%). Accordingly, the time-series vegetation indices did not show enough potential to increase SOC prediction accuracy. However, the combination of the MCC and RST datasets produced SOC spatial maps with lower uncertainty. Future studies are therefore needed to explain explicitly the efficiency of time-series remotely sensed data, and their interrelationship with environmental covariates, for predicting SOC in arid regions with low SOC content.
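
The evaluation scheme named in the abstract (5-fold cross-validation scored by R² and RMSE, plus 50 bootstrap models for prediction uncertainty) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the data are synthetic, the single covariate and the simple least-squares line are stand-ins for the study's covariate sets and its RF, Cubist, SVM, and PLSR models, and the 5th/95th percentile band is only one possible way to summarize bootstrap spread. None of the function names come from the paper.

```python
import math
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (a stand-in model, not the study's)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    return my - b * mx, b


def r2_rmse(obs, pred):
    """Coefficient of determination and root-mean-square error."""
    mean_obs = statistics.fmean(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / len(obs))


def kfold_predictions(xs, ys, k=5, seed=0):
    """Out-of-fold predictions: each sample is predicted by a model that never saw it."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    pred = [0.0] * len(xs)
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            pred[i] = a + b * xs[i]
    return pred


def bootstrap_interval(xs, ys, x_new, n_models=50, seed=0):
    """Fit n_models on resampled data; return an approximate 5th-95th percentile band at x_new."""
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    for _ in range(n_models):
        sample = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        a, b = fit_line([xs[i] for i in sample], [ys[i] for i in sample])
        preds.append(a + b * x_new)
    preds.sort()
    return preds[int(0.05 * (n_models - 1))], preds[int(0.95 * (n_models - 1))]


# Synthetic data only: 237 points mirrors the sample count, nothing else is from the study.
rng = random.Random(42)
covariate = [rng.uniform(0.0, 1.0) for _ in range(237)]
soc = [0.5 + 0.6 * x + rng.gauss(0.0, 0.25) for x in covariate]

pred = kfold_predictions(covariate, soc, k=5)
r2, rmse = r2_rmse(soc, pred)
lower, upper = bootstrap_interval(covariate, soc, x_new=0.5, n_models=50)
print(f"R2={r2:.2f} RMSE={rmse:.2f} band=({lower:.2f}, {upper:.2f})")
```

A narrower bootstrap band at a location corresponds to the "lower uncertainty" reported for the combined dataset, while R² and RMSE from the out-of-fold predictions correspond to the accuracy figures.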

Original language: English (US)
Article number: 105723
State: Published - Jan 2022

All Science Journal Classification (ASJC) codes

  • Earth-Surface Processes


Keywords

  • Environmental covariates
  • Machine learning
  • Soil organic carbon
  • Spatial prediction
  • Time-series vegetation indices
  • Uncertainty

