Variable: fNR
Ranger result

Call:
 ranger(formulaString.lst[[j]], data = dfs, importance = "impurity", write.forest = TRUE, mtry = t.mrfX$bestTune$mtry, num.trees = 500)

Type:                             Regression
Number of trees:                  500
Sample size:                      1609
Number of independent variables:  377
Mtry:                             8
Target node size:                 5
Variable importance mode:         impurity
OOB prediction error:             52.92935
R squared:                        0.4779264
OOB RMSE: 7.275
Variable importance:
                                [,1]
Maize_actualbaseline        2496.230
Temperature                 2489.175
yGapMaize                   2058.031
NCluster_19_AF_1km          2056.718
N_M_agg30cm_AF_1km          1995.018
TREL10                      1786.434
M13NDVIALT                  1779.627
fKR_MaizeTrials             1763.867
Maize_rainfed_low_baseline  1668.642
B13CHE3                     1487.998
NMOD3avg                    1411.571
CRFVOL_M_agg30cm_AF_1km     1342.659
Wdvi                        1287.562
BARL10                      1150.153
EXMOD5avg                   1147.238

eXtreme Gradient Boosting

1609 samples
 377 predictors

No pre-processing
Resampling: Cross-Validated (3 fold, repeated 1 times)
Summary of sample sizes: 1072, 1073, 1073
Resampling results across tuning parameters:

  eta  max_depth  nrounds  RMSE      Rsquared
  0.3  2           50      7.466455  0.4599793
  0.3  2          100      7.468762  0.4599001
  0.3  2          150      7.468891  0.4598904
  0.3  3           50      7.466424  0.4601722
  0.3  3          100      7.466591  0.4601599
  0.3  3          150      7.466591  0.4601599
  0.3  4           50      7.464963  0.4603550
  0.3  4          100      7.464966  0.4603550
  0.3  4          150      7.464966  0.4603550
  0.3  5           50      7.463995  0.4604755
  0.3  5          100      7.463995  0.4604755
  0.3  5          150      7.463995  0.4604755
  0.3  6           50      7.463817  0.4604981
  0.3  6          100      7.463817  0.4604981
  0.3  6          150      7.463817  0.4604981
  0.3  7           50      7.464140  0.4604572
  0.3  7          100      7.464140  0.4604572
  0.3  7          150      7.464140  0.4604572
  0.3  8           50      7.464747  0.4603819
  0.3  8          100      7.464747  0.4603819
  0.3  8          150      7.464747  0.4603819
  0.4  2           50      7.467282  0.4600123
  0.4  2          100      7.468354  0.4599518
  0.4  2          150      7.468371  0.4599508
  0.4  3           50      7.470465  0.4597072
  0.4  3          100      7.470493  0.4597052
  0.4  3          150      7.470493  0.4597052
  0.4  4           50      7.466589  0.4601600
  0.4  4          100      7.466589  0.4601600
  0.4  4          150      7.466589  0.4601600
  0.4  5           50      7.466374  0.4601857
  0.4  5          100      7.466374  0.4601857
  0.4  5          150      7.466374  0.4601857
  0.4  6           50      7.466712  0.4601456
  0.4  6          100      7.466712  0.4601456
  0.4  6          150      7.466712  0.4601456
  0.4  7           50      7.466464  0.4601750
  0.4  7          100      7.466464  0.4601750
  0.4  7          150      7.466464  0.4601750
  0.4  8           50      7.465368  0.4603061
  0.4  8          100      7.465368  0.4603061
  0.4  8          150      7.465368  0.4603061
  0.5  2           50      7.470248  0.4597137
  0.5  2          100      7.470508  0.4597036
  0.5  2          150      7.470508  0.4597036
  0.5  3           50      7.467243  0.4600829
  0.5  3          100      7.467243  0.4600829
  0.5  3          150      7.467243  0.4600829
  0.5  4           50      7.469227  0.4598514
  0.5  4          100      7.469227  0.4598514
  0.5  4          150      7.469227  0.4598514
  0.5  5           50      7.468704  0.4599121
  0.5  5          100      7.468704  0.4599121
  0.5  5          150      7.468704  0.4599121
  0.5  6           50      7.468732  0.4599090
  0.5  6          100      7.468732  0.4599090
  0.5  6          150      7.468732  0.4599090
  0.5  7           50      7.467291  0.4600773
  0.5  7          100      7.467291  0.4600773
  0.5  7          150      7.467291  0.4600773
  0.5  8           50      7.468975  0.4598805
  0.5  8          100      7.468975  0.4598805
  0.5  8          150      7.468975  0.4598805

Tuning parameter 'gamma' was held constant at a value of 0
Tuning parameter 'colsample_bytree' was held constant at a value of 0.8
Tuning parameter 'min_child_weight' was held constant at a value of 1
RMSE was used to select the optimal model using the smallest value.
The final values used for the model were nrounds = 50, max_depth = 6, eta = 0.3, gamma = 0, colsample_bytree = 0.8 and min_child_weight = 1.
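The selection rule stated above ("RMSE was used to select the optimal model using the smallest value") can be checked by hand against the resampling table. The following is a minimal R sketch, not caret's internal code: it copies five rows of the fNR grid into a hypothetical data frame named `grid` and picks the row with the smallest RMSE. Several rows tie at the printed precision; `which.min` returns the first of them, which is assumed here to mirror how the tie was resolved, and it agrees with the reported final values (nrounds = 50, max_depth = 6, eta = 0.3).

```r
# Five rows copied from the fNR resampling table (illustrative subset only).
grid <- data.frame(
  eta       = c(0.3, 0.3, 0.3, 0.3, 0.4),
  max_depth = c(5, 6, 6, 7, 8),
  nrounds   = c(50, 50, 100, 50, 50),
  RMSE      = c(7.463995, 7.463817, 7.463817, 7.464140, 7.465368)
)

# Smallest RMSE wins; which.min() returns the first index among ties
# (assumption: this matches the tie-breaking seen in the log).
best <- grid[which.min(grid$RMSE), ]
print(best)
```

The same check applies to the other response variables by substituting rows from their own tables.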
RMSE: 7.464 R2: 0.46
XGBoost variable importance:
    Feature                 Gain          Cover        Frequency
 1: N_M_agg30cm_AF_1km      0.7175649293  0.082309922  0.055555556
 2: Maize_actualbaseline    0.1612172842  0.090686596  0.067901235
 3: af_agg_30cm_PWP__M_1km  0.0441532718  0.003478869  0.027777778
 4: Ca_M_agg30cm_AF_1km     0.0338556833  0.037547893  0.030864198
 5: af_BDRICM_T__M_1km      0.0238063913  0.002138601  0.027777778
 6: NCluster_6_AF_1km       0.0055544720  0.013551273  0.009259259
 7: BIO12ALT                0.0052299159  0.013793103  0.015432099
 8: BIO1ALT                 0.0043336942  0.155995513  0.104938272
 9: GAEZ_NPP                0.0022923609  0.068330347  0.046296296
10: rElevIndex              0.0005873102  0.018923998  0.018518519
11: M13RB3ALT               0.0004000659  0.006159404  0.006172840
12: NEGMRG5                 0.0002357387  0.023000160  0.018518519
13: Hypsclassc2             0.0001792231  0.003787713  0.003086420
14: GAEZ_LGP                0.0001505000  0.026295471  0.018518519
15: NCluster_19_AF_1km      0.0001008603  0.004688024  0.003086420
Ensemble validation RMSE: 7.359 R2: 0.467
--------------------------------------
Variable: fPR
Ranger result

Call:
 ranger(formulaString.lst[[j]], data = dfs, importance = "impurity", write.forest = TRUE, mtry = t.mrfX$bestTune$mtry, num.trees = 500)

Type:                             Regression
Number of trees:                  500
Sample size:                      1609
Number of independent variables:  377
Mtry:                             12
Target node size:                 5
Variable importance mode:         impurity
OOB prediction error:             6.298889
R squared:                        0.8960989
OOB RMSE: 2.51
Variable importance:
                                     [,1]
Maize_rainfed_low_baseline       2880.071
Maize_rainfed_intermed_baseline  2863.303
Maize_intermed                   2786.694
Maize_rainfed_high_baseline      2198.718
Temperature                      2047.136
NMOD3avg                         2012.677
NCluster_M_AF_1km                1772.598
yFertilised_MaizeTrials          1754.464
B13CHE3                          1425.807
GAEZ_LGP                         1422.048
BIO12ALT                         1403.980
GAEZ_ratioP_PETan                1377.687
ENAX_M_agg30cm_AF_1km            1328.489
Riclassc2                        1280.940
M13RB3A04                        1278.782

eXtreme Gradient Boosting

1609 samples
 377 predictors

No pre-processing
Resampling: Cross-Validated (3 fold, repeated 1 times)
Summary of sample sizes: 1073, 1072, 1073
Resampling results across tuning parameters:
  eta  max_depth  nrounds  RMSE      Rsquared
  0.3  2           50      2.468647  0.8993344
  0.3  2          100      2.469280  0.8992931
  0.3  2          150      2.469294  0.8992920
  0.3  3           50      2.469120  0.8993015
  0.3  3          100      2.469136  0.8993002
  0.3  3          150      2.469136  0.8993002
  0.3  4           50      2.471195  0.8991524
  0.3  4          100      2.471195  0.8991524
  0.3  4          150      2.471195  0.8991524
  0.3  5           50      2.469137  0.8993011
  0.3  5          100      2.469138  0.8993010
  0.3  5          150      2.469138  0.8993010
  0.3  6           50      2.469156  0.8992978
  0.3  6          100      2.469156  0.8992978
  0.3  6          150      2.469156  0.8992978
  0.3  7           50      2.469528  0.8992759
  0.3  7          100      2.469528  0.8992759
  0.3  7          150      2.469528  0.8992759
  0.3  8           50      2.469175  0.8992996
  0.3  8          100      2.469175  0.8992996
  0.3  8          150      2.469175  0.8992996
  0.4  2           50      2.468663  0.8993356
  0.4  2          100      2.469193  0.8992937
  0.4  2          150      2.469205  0.8992927
  0.4  3           50      2.469153  0.8992981
  0.4  3          100      2.469153  0.8992980
  0.4  3          150      2.469153  0.8992980
  0.4  4           50      2.469187  0.8992988
  0.4  4          100      2.469187  0.8992988
  0.4  4          150      2.469187  0.8992988
  0.4  5           50      2.469136  0.8993011
  0.4  5          100      2.469136  0.8993011
  0.4  5          150      2.469136  0.8993011
  0.4  6           50      2.470432  0.8992097
  0.4  6          100      2.470432  0.8992097
  0.4  6          150      2.470432  0.8992097
  0.4  7           50      2.469135  0.8993008
  0.4  7          100      2.469135  0.8993008
  0.4  7          150      2.469135  0.8993008
  0.4  8           50      2.469135  0.8993009
  0.4  8          100      2.469135  0.8993009
  0.4  8          150      2.469135  0.8993009
  0.5  2           50      2.469163  0.8992983
  0.5  2          100      2.469142  0.8993008
  0.5  2          150      2.469142  0.8993008
  0.5  3           50      2.469144  0.8993008
  0.5  3          100      2.469144  0.8993008
  0.5  3          150      2.469144  0.8993008
  0.5  4           50      2.470707  0.8991891
  0.5  4          100      2.470707  0.8991891
  0.5  4          150      2.470707  0.8991891
  0.5  5           50      2.470618  0.8991958
  0.5  5          100      2.470618  0.8991958
  0.5  5          150      2.470618  0.8991958
  0.5  6           50      2.470749  0.8991860
  0.5  6          100      2.470749  0.8991860
  0.5  6          150      2.470749  0.8991860
  0.5  7           50      2.470748  0.8991860
  0.5  7          100      2.470748  0.8991860
  0.5  7          150      2.470748  0.8991860
  0.5  8           50      2.470790  0.8991829
  0.5  8          100      2.470790  0.8991829
  0.5  8          150      2.470790  0.8991829

Tuning parameter 'gamma' was held constant at a value of 0
Tuning parameter 'colsample_bytree' was held constant at a value of 0.8
Tuning parameter 'min_child_weight' was held constant at a value of 1
RMSE was used to select the optimal model using the smallest value.
The final values used for the model were nrounds = 50, max_depth = 2, eta = 0.3, gamma = 0, colsample_bytree = 0.8 and min_child_weight = 1.
RMSE: 2.469 R2: 0.899
XGBoost variable importance:
    Feature                  Gain         Cover        Frequency
 1: Maize_intermed           0.513239861  0.030032104  0.021582734
 2: yFertilised_MaizeTrials  0.106180786  0.020021403  0.014388489
 3: TMNMOD3                  0.094036893  0.008106864  0.007194245
 4: af_agg_30cm_PWP__M_1km   0.082627959  0.009475636  0.043165468
 5: Fcover                   0.037516859  0.008106864  0.007194245
 6: NMOD3avg                 0.032873693  0.016997661  0.014388489
 7: Al_M_agg30cm_AF_1km      0.031638484  0.043514509  0.035971223
 8: VBFMRG5                  0.020182384  0.044341994  0.043165468
 9: M13RB3A04                0.014447438  0.037236822  0.028776978
10: fNR_MaizeTrials          0.014145714  0.024420138  0.021582734
11: AAIavg_GYGA              0.014033564  0.015827983  0.093525180
12: Fe_M_agg30cm_AF_1km      0.007762819  0.006196805  0.007194245
13: ECN_M_agg30cm_AF_1km     0.006663289  0.047670599  0.043165468
14: ENTENV3                  0.004255023  0.006196805  0.007194245
15: Riclassc2                0.003162802  0.010010701  0.007194245
Ensemble validation RMSE: 2.486 R2: 0.898
--------------------------------------
Variable: fKR
Ranger result

Call:
 ranger(formulaString.lst[[j]], data = dfs, importance = "impurity", write.forest = TRUE, mtry = t.mrfX$bestTune$mtry, num.trees = 500)

Type:                             Regression
Number of trees:                  500
Sample size:                      1609
Number of independent variables:  377
Mtry:                             12
Target node size:                 5
Variable importance mode:         impurity
OOB prediction error:             23.37602
R squared:                        0.7844417
OOB RMSE: 4.835
Variable importance:
                          [,1]
N_M_agg30cm_AF_1km    4176.130
M43BSALT              3870.489
BIO12ALT              3779.785
M43WSALT              3300.145
Water_balance         3078.378
GAEZ_ET               3062.449
NCluster_15_AF_1km    2639.289
GAEZ_ratioP_PETan     2508.024
Fapar                 2481.894
Maize_actualbaseline  2254.225
NCluster_14_AF_1km    2239.120
Na_M_agg30cm_AF_1km   2036.194
M13RB3A04             1996.795
M13RB1A04             1959.919
Temperature           1921.675

eXtreme Gradient Boosting

1609 samples
 377 predictors

No pre-processing
Resampling: Cross-Validated (3 fold, repeated 1 times)
Summary of sample sizes: 1072, 1073, 1073
Resampling results across tuning parameters:

  eta  max_depth  nrounds  RMSE      Rsquared
  0.3  2           50      4.728503  0.7931916
  0.3  2          100      4.728472  0.7932188
  0.3  2          150      4.728464  0.7932201
  0.3  3           50      4.708428  0.7950390
  0.3  3          100      4.708424  0.7950397
  0.3  3          150      4.708424  0.7950397
  0.3  4           50      4.673431  0.7981652
  0.3  4          100      4.673431  0.7981653
  0.3  4          150      4.673431  0.7981652
  0.3  5           50      4.675167  0.7980086
  0.3  5          100      4.675168  0.7980085
  0.3  5          150      4.675168  0.7980085
  0.3  6           50      4.702670  0.7955508
  0.3  6          100      4.702670  0.7955508
  0.3  6          150      4.702670  0.7955508
  0.3  7           50      4.702174  0.7955971
  0.3  7          100      4.702174  0.7955971
  0.3  7          150      4.702174  0.7955971
  0.3  8           50      4.735892  0.7925356
  0.3  8          100      4.735892  0.7925356
  0.3  8          150      4.735892  0.7925356
  0.4  2           50      4.724907  0.7935337
  0.4  2          100      4.724966  0.7935311
  0.4  2          150      4.724969  0.7935309
  0.4  3           50      4.665666  0.7988617
  0.4  3          100      4.665665  0.7988618
  0.4  3          150      4.665665  0.7988618
  0.4  4           50      4.745070  0.7916968
  0.4  4          100      4.745070  0.7916968
  0.4  4          150      4.745070  0.7916968
  0.4  5           50      4.744341  0.7917636
  0.4  5          100      4.744340  0.7917636
  0.4  5          150      4.744340  0.7917636
  0.4  6           50      4.658014  0.7995046
  0.4  6          100      4.658015  0.7995045
  0.4  6          150      4.658015  0.7995045
  0.4  7           50      4.659652  0.7993644
  0.4  7          100      4.659652  0.7993644
  0.4  7          150      4.659652  0.7993644
  0.4  8           50      4.670196  0.7984480
  0.4  8          100      4.670196  0.7984479
  0.4  8          150      4.670196  0.7984480
  0.5  2           50      4.704471  0.7953875
  0.5  2          100      4.704473  0.7953876
  0.5  2          150      4.704473  0.7953876
  0.5  3           50      4.716833  0.7942820
  0.5  3          100      4.716833  0.7942820
  0.5  3          150      4.716833  0.7942820
  0.5  4           50      4.710352  0.7948575
  0.5  4          100      4.710352  0.7948575
  0.5  4          150      4.710352  0.7948575
  0.5  5           50      4.697645  0.7960029
  0.5  5          100      4.697645  0.7960029
  0.5  5          150      4.697645  0.7960029
  0.5  6           50      4.695188  0.7962235
  0.5  6          100      4.695188  0.7962235
  0.5  6          150      4.695188  0.7962235
  0.5  7           50      4.692250  0.7964894
  0.5  7          100      4.692250  0.7964895
  0.5  7          150      4.692250  0.7964894
  0.5  8           50      4.695523  0.7961933
  0.5  8          100      4.695523  0.7961933
  0.5  8          150      4.695523  0.7961933

Tuning parameter 'gamma' was held constant at a value of 0
Tuning parameter 'colsample_bytree' was held constant at a value of 0.8
Tuning parameter 'min_child_weight' was held constant at a value of 1
RMSE was used to select the optimal model using the smallest value.
The final values used for the model were nrounds = 50, max_depth = 6, eta = 0.4, gamma = 0, colsample_bytree = 0.8 and min_child_weight = 1.
RMSE: 4.658 R2: 0.8
XGBoost variable importance:
    Feature                        Gain          Cover        Frequency
 1: M43BSALT                       0.4222968049  0.105887863  0.044444444
 2: af_agg_30cm_PWP__M_1km         0.1334552376  0.044714832  0.098412698
 3: N_M_agg30cm_AF_1km             0.1290987922  0.028966242  0.012698413
 4: af_agg_30cm_AWCpF23__M_1km     0.1201419672  0.040778867  0.069841270
 5: NCluster_16_AF_1km             0.0541516325  0.017508421  0.015873016
 6: M13NDVIA08                     0.0405435400  0.015365401  0.012698413
 7: VBFMRG5                        0.0378096861  0.013454188  0.009523810
 8: GPMIMERGALT                    0.0211034028  0.022593952  0.012698413
 9: BIO12ALT                       0.0121183086  0.026113613  0.025396825
10: B04CHE3                        0.0095991869  0.001305681  0.003174603
11: AAIavg_GYGA                    0.0063683155  0.047444461  0.180952381
12: EACKCL_M_agg30cm_AF_1km        0.0040912079  0.003746736  0.003174603
13: af_agg_30cm_TAWCpF23mm__M_1km  0.0038461873  0.004446883  0.019047619
14: af_BDRICM_T__M_1km             0.0031560159  0.026487341  0.034920635
15: ASSDAC3                        0.0007223894  0.002431594  0.006349206
Ensemble validation RMSE: 4.771 R2: 0.79
--------------------------------------
Variable: fNRec
Ranger result

Call:
 ranger(formulaString.lst[[j]], data = dfs, importance = "impurity", write.forest = TRUE, mtry = t.mrfX$bestTune$mtry, num.trees = 500)

Type:                             Regression
Number of trees:                  500
Sample size:                      342
Number of independent variables:  377
Mtry:                             6
Target node size:                 5
Variable importance mode:         impurity
OOB prediction error:             183.3043
R squared:                        0.5624974
OOB RMSE: 13.539
Variable importance:
                              [,1]
MY2LSTNALT_200207_201609  1665.048
NMSD3avg                  1344.188
M43BNALT                  1340.693
EXBX_M_agg30cm_AF_1km     1304.160
ESMOD5avg                 1295.111
NIRL00                    1262.276
M43WNALT                  1251.487
Mg_M_agg30cm_AF_1km       1171.685
MAXENV3                   1168.526
PCHE3avg                  1146.929
PHIHOXagg0_30             1142.492
PET                       1102.080
Zn_M_agg30cm_AF_1km       1101.973
M43BSALT                  1084.598
SLTPPT_M_agg30cm_AF_1km   1076.490

eXtreme Gradient Boosting

342 samples
377 predictors

No pre-processing
Resampling: Cross-Validated (3 fold, repeated 1 times)
Summary of sample sizes: 228, 229, 227
Resampling results across tuning parameters:

  eta  max_depth  nrounds  RMSE      Rsquared
  0.3  2           50      15.69304  0.4408878
  0.3  2          100      16.04145  0.4303199
  0.3  2          150      16.09381  0.4284283
  0.3  3           50      15.92755  0.4394346
  0.3  3          100      15.99617  0.4375204
  0.3  3          150      15.99784  0.4374841
  0.3  4           50      16.63264  0.4045840
  0.3  4          100      16.63700  0.4046203
  0.3  4          150      16.63700  0.4046203
  0.3  5           50      16.19296  0.4251565
  0.3  5          100      16.19637  0.4250350
  0.3  5          150      16.19637  0.4250350
  0.3  6           50      15.86100  0.4438588
  0.3  6          100      15.86167  0.4438441
  0.3  6          150      15.86167  0.4438441
  0.3  7           50      16.48920  0.4116423
  0.3  7          100      16.48986  0.4116309
  0.3  7          150      16.48986  0.4116309
  0.3  8           50      16.24394  0.4286595
  0.3  8          100      16.24432  0.4286500
  0.3  8          150      16.24432  0.4286499
  0.4  2           50      16.14368  0.4277594
  0.4  2          100      16.30988  0.4232482
  0.4  2          150      16.31896  0.4230695
  0.4  3           50      16.48213  0.4112734
  0.4  3          100      16.49722  0.4110062
  0.4  3          150      16.49732  0.4110029
  0.4  4           50      16.38465  0.4252411
  0.4  4          100      16.38574  0.4252274
  0.4  4          150      16.38574  0.4252274
  0.4  5           50      16.32065  0.4246755
  0.4  5          100      16.32073  0.4246733
  0.4  5          150      16.32073  0.4246734
  0.4  6           50      16.70585  0.4052524
  0.4  6          100      16.70585  0.4052524
  0.4  6          150      16.70585  0.4052524
  0.4  7           50      16.57581  0.4143662
  0.4  7          100      16.57581  0.4143661
  0.4  7          150      16.57581  0.4143661
  0.4  8           50      16.29582  0.4253926
  0.4  8          100      16.29582  0.4253926
  0.4  8          150      16.29582  0.4253926
  0.5  2           50      16.54235  0.4084082
  0.5  2          100      16.59956  0.4079367
  0.5  2          150      16.60105  0.4079399
  0.5  3           50      16.93139  0.4082677
  0.5  3          100      16.93280  0.4082564
  0.5  3          150      16.93280  0.4082564
  0.5  4           50      16.68947  0.4067704
  0.5  4          100      16.68952  0.4067686
  0.5  4          150      16.68952  0.4067686
  0.5  5           50      16.59215  0.4084111
  0.5  5          100      16.59215  0.4084111
  0.5  5          150      16.59215  0.4084111
  0.5  6           50      16.91075  0.3957884
  0.5  6          100      16.91075  0.3957884
  0.5  6          150      16.91075  0.3957884
  0.5  7           50      16.60684  0.4111097
  0.5  7          100      16.60684  0.4111097
  0.5  7          150      16.60684  0.4111096
  0.5  8           50      16.61913  0.4103924
  0.5  8          100      16.61913  0.4103923
  0.5  8          150      16.61913  0.4103923

Tuning parameter 'gamma' was held constant at a value of 0
Tuning parameter 'colsample_bytree' was held constant at a value of 0.8
Tuning parameter 'min_child_weight' was held constant at a value of 1
RMSE was used to select the optimal model using the smallest value.
The final values used for the model were nrounds = 50, max_depth = 2, eta = 0.3, gamma = 0, colsample_bytree = 0.8 and min_child_weight = 1.
RMSE: 15.693 R2: 0.441
XGBoost variable importance:
    Feature                   Gain        Cover        Frequency
 1: NMSD3avg                  0.18389410  0.020042781  0.01459854
 2: EXBX_M_agg30cm_AF_1km     0.08731109  0.099071117  0.08029197
 3: M17GPPALTfill             0.07452251  0.029683242  0.02189781
 4: M43BNALT                  0.06748287  0.042693469  0.03649635
 5: MY2LSTNALT_200207_201609  0.05373655  0.009728368  0.00729927
 6: N_M_agg30cm_AF_1km        0.05084794  0.020013479  0.01459854
 7: Mg_M_agg30cm_AF_1km       0.04601983  0.031060451  0.02919708
 8: SNDPPT_M_agg30cm_AF_1km   0.04539648  0.075541360  0.05839416
 9: af_BDRICM_T__M_1km        0.04230436  0.010226507  0.01459854
10: PCHE3avg                  0.03502657  0.006417206  0.00729927
11: af_agg_30cm_PWP__M_1km    0.03140553  0.002666510  0.02919708
12: M17NPPALTfill             0.03011324  0.017991620  0.01459854
13: BIO1ALT                   0.02942492  0.003985114  0.01459854
14: B07CHE3                   0.02139770  0.009318135  0.00729927
15: OC_M_agg30cm_AF_1km       0.01791262  0.010021391  0.00729927
Ensemble validation RMSE: 14.202 R2: 0.521
--------------------------------------
Variable: fPRec
Ranger result

Call:
 ranger(formulaString.lst[[j]], data = dfs, importance = "impurity", write.forest = TRUE, mtry = t.mrfX$bestTune$mtry, num.trees = 500)

Type:                             Regression
Number of trees:                  500
Sample size:                      329
Number of independent variables:  377
Mtry:                             4
Target node size:                 5
Variable importance mode:         impurity
OOB prediction error:             239.3877
R squared:                        0.01702034
OOB RMSE: 15.472
Variable importance:
                                   [,1]
NCluster_1_AF_1km              799.9804
yFertilised_MaizeTrials        732.8413
Fapar                          627.9467
Maize_intermed                 615.2249
NCluster_19_AF_1km             607.8395
Temperature                    496.7651
M13NDVIALT                     494.7394
GAEZ_ET                        414.1944
M43WVALT                       402.3529
af_agg_ERZD_TAWCpF23mm__M_1km  399.1500
B04CHE3                        382.3367
SLTPPT_M_agg30cm_AF_1km        379.1923
OC_M_agg30cm_AF_1km            370.8545
M13RB1ALT                      358.2985
GAEZ_ratioP_PETan              356.3919

eXtreme Gradient Boosting

329 samples
377 predictors

No pre-processing
Resampling: Cross-Validated (3 fold, repeated 1 times)
Summary of sample sizes: 220, 218, 220
Resampling results across tuning parameters:

  eta  max_depth  nrounds  RMSE      Rsquared
  0.3  2           50      12.93657  0.2957510
  0.3  2          100      12.95362  0.3009842
  0.3  2          150      12.95581  0.3015603
  0.3  3           50      12.95640  0.2909469
  0.3  3          100      12.95925  0.2914828
  0.3  3          150      12.95929  0.2914911
  0.3  4           50      12.96118  0.2920697
  0.3  4          100      12.96136  0.2922351
  0.3  4          150      12.96136  0.2922351
  0.3  5           50      12.97260  0.2905443
  0.3  5          100      12.97289  0.2905601
  0.3  5          150      12.97289  0.2905601
  0.3  6           50      13.01740  0.2916135
  0.3  6          100      13.01753  0.2916337
  0.3  6          150      13.01753  0.2916336
  0.3  7           50      12.94765  0.2924194
  0.3  7          100      12.94772  0.2924356
  0.3  7          150      12.94772  0.2924355
  0.3  8           50      12.88189  0.3007675
  0.3  8          100      12.88195  0.3007798
  0.3  8          150      12.88195  0.3007798
  0.4  2           50      12.97717  0.2892065
  0.4  2          100      12.98009  0.2922065
  0.4  2          150      12.98002  0.2923555
  0.4  3           50      12.95421  0.2872261
  0.4  3          100      12.95503  0.2874135
  0.4  3          150      12.95503  0.2874136
  0.4  4           50      13.09618  0.2752327
  0.4  4          100      13.09623  0.2752394
  0.4  4          150      13.09623  0.2752394
  0.4  5           50      12.92401  0.2988236
  0.4  5          100      12.92401  0.2988236
  0.4  5          150      12.92401  0.2988236
  0.4  6           50      13.00787  0.2884688
  0.4  6          100      13.00787  0.2884688
  0.4  6          150      13.00787  0.2884688
  0.4  7           50      13.12959  0.2822993
  0.4  7          100      13.12959  0.2822992
  0.4  7          150      13.12959  0.2822992
  0.4  8           50      12.97540  0.2903171
  0.4  8          100      12.97540  0.2903171
  0.4  8          150      12.97540  0.2903171
  0.5  2           50      12.99867  0.2851138
  0.5  2          100      13.00817  0.2860452
  0.5  2          150      13.00835  0.2860827
  0.5  3           50      12.86684  0.3044711
  0.5  3          100      12.86677  0.3045260
  0.5  3          150      12.86677  0.3045260
  0.5  4           50      13.06966  0.2849273
  0.5  4          100      13.06966  0.2849274
  0.5  4          150      13.06966  0.2849274
  0.5  5           50      12.96874  0.2918732
  0.5  5          100      12.96874  0.2918732
  0.5  5          150      12.96874  0.2918732
  0.5  6           50      12.87235  0.3042481
  0.5  6          100      12.87235  0.3042481
  0.5  6          150      12.87235  0.3042481
  0.5  7           50      12.95144  0.2928584
  0.5  7          100      12.95144  0.2928584
  0.5  7          150      12.95144  0.2928584
  0.5  8           50      12.90123  0.3018186
  0.5  8          100      12.90123  0.3018186
  0.5  8          150      12.90123  0.3018186

Tuning parameter 'gamma' was held constant at a value of 0
Tuning parameter 'colsample_bytree' was held constant at a value of 0.8
Tuning parameter 'min_child_weight' was held constant at a value of 1
RMSE was used to select the optimal model using the smallest value.
The final values used for the model were nrounds = 100, max_depth = 3, eta = 0.5, gamma = 0, colsample_bytree = 0.8 and min_child_weight = 1.
RMSE: 12.867 R2: 0.305
XGBoost variable importance:
    Feature                                         Gain        Cover        Frequency
 1: NCluster_4_AF_1km                               0.18063646  0.007791931  0.004889976
 2: BIO1ALT                                         0.16140795  0.002348253  0.009779951
 3: MY2LSTNALT_200207_201609                        0.15122729  0.010899215  0.007334963
 4: N_M_agg30cm_AF_1km                              0.09999914  0.009736948  0.007334963
 5: Lai_avg                                         0.04738343  0.004518608  0.007334963
 6: af_agg_30cm_AWCpF23__M_1km                      0.03826478  0.001956878  0.031784841
 7: af_agg_ERZD_TAWCpF23mm__M_1km                   0.03790422  0.014931569  0.026894866
 8: SNDPPT_M_agg30cm_AF_1km                         0.03608848  0.016615669  0.012224939
 9: Na_M_agg30cm_AF_1km                             0.03351725  0.032626485  0.022004890
10: C04GLC5                                         0.02470448  0.011895443  0.009779951
11: PHIHOXagg0_30                                   0.02043643  0.015121326  0.012224939
12: B02CHE3                                         0.01523718  0.001707820  0.004889976
13: LSTD_avgIRI_Jul2002_Sep2016_mosaicLAEA_celsius  0.01496596  0.006285728  0.007334963
14: Mg_M_agg30cm_AF_1km                             0.01432932  0.008349344  0.007334963
15: Zn_M_agg30cm_AF_1km                             0.01238282  0.019367158  0.012224939
Ensemble validation RMSE: 15.12 R2: 0.092
--------------------------------------
Variable: fKRec
Ranger result

Call:
 ranger(formulaString.lst[[j]], data = dfs, importance = "impurity", write.forest = TRUE, mtry = t.mrfX$bestTune$mtry, num.trees = 500)

Type:                             Regression
Number of trees:                  500
Sample size:                      321
Number of independent variables:  377
Mtry:                             18
Target node size:                 5
Variable importance mode:         impurity
OOB prediction error:             78.60111
R squared:                        0.4466915
OOB RMSE: 8.866
Variable importance:
                              [,1]
M17NPPALTfill             900.2897
BIO12ALT                  646.4957
Water_balance             597.8414
M17GPPALTfill             586.9508
M13RB3A04                 525.8474
C03GLC5                   499.1097
Na_M_agg30cm_AF_1km       486.2628
fNR_MaizeTrials           477.5874
TMOD3avg                  477.3197
Maize_actualbaseline      474.3822
M13RB1A04                 470.1359
rElev                     431.3406
PCHE3avg                  423.7902
MY2LSTNALT_200207_201609  413.7709
N_M_agg30cm_AF_1km        408.0438

eXtreme Gradient Boosting

321 samples
377 predictors

No pre-processing
Resampling: Cross-Validated (3 fold, repeated 1 times)
Summary of sample sizes: 214, 214, 214
Resampling results across tuning parameters:

  eta  max_depth  nrounds  RMSE      Rsquared
  0.3  2           50      9.582457  0.3879425
  0.3  2          100      9.672625  0.3870234
  0.3  2          150      9.687093  0.3867424
  0.3  3           50      9.713569  0.3849472
  0.3  3          100      9.731967  0.3844241
  0.3  3          150      9.732232  0.3844264
  0.3  4           50      9.614932  0.3958569
  0.3  4          100      9.619883  0.3957049
  0.3  4          150      9.619884  0.3957048
  0.3  5           50      9.904408  0.3674064
  0.3  5          100      9.905296  0.3673981
  0.3  5          150      9.905296  0.3673982
  0.3  6           50      9.600527  0.3983632
  0.3  6          100      9.600927  0.3983603
  0.3  6          150      9.600926  0.3983603
  0.3  7           50      9.668462  0.3933408
  0.3  7          100      9.668694  0.3933459
  0.3  7          150      9.668693  0.3933460
  0.3  8           50      9.712647  0.3881293
  0.3  8          100      9.712792  0.3881291
  0.3  8          150      9.712791  0.3881291
  0.4  2           50      9.707711  0.3791795
  0.4  2          100      9.764485  0.3792656
  0.4  2          150      9.768063  0.3792799
  0.4  3           50      9.878692  0.3720280
  0.4  3          100      9.882199  0.3720057
  0.4  3          150      9.882199  0.3720057
  0.4  4           50      9.713264  0.3883246
  0.4  4          100      9.714218  0.3882830
  0.4  4          150      9.714219  0.3882830
  0.4  5           50      9.745172  0.3843058
  0.4  5          100      9.745219  0.3843025
  0.4  5          150      9.745219  0.3843025
  0.4  6           50      9.804873  0.3768881
  0.4  6          100      9.804873  0.3768882
  0.4  6          150      9.804872  0.3768882
  0.4  7           50      9.704907  0.3863036
  0.4  7          100      9.704906  0.3863036
  0.4  7          150      9.704905  0.3863036
  0.4  8           50      9.548711  0.4054544
  0.4  8          100      9.548710  0.4054544
  0.4  8          150      9.548709  0.4054544
  0.5  2           50      9.512299  0.4028670
  0.5  2          100      9.546995  0.4014161
  0.5  2          150      9.547842  0.4014020
  0.5  3           50      9.950823  0.3629515
  0.5  3          100      9.951542  0.3629724
  0.5  3          150      9.951541  0.3629724
  0.5  4           50      9.714963  0.3865735
  0.5  4          100      9.714966  0.3865734
  0.5  4          150      9.714965  0.3865734
  0.5  5           50      9.622641  0.3976325
  0.5  5          100      9.622640  0.3976325
  0.5  5          150      9.622639  0.3976326
  0.5  6           50      9.951698  0.3653638
  0.5  6          100      9.951699  0.3653638
  0.5  6          150      9.951699  0.3653638
  0.5  7           50      9.745037  0.3879025
  0.5  7          100      9.745036  0.3879025
  0.5  7          150      9.745035  0.3879026
  0.5  8           50      9.764969  0.3858351
  0.5  8          100      9.764968  0.3858352
  0.5  8          150      9.764968  0.3858352

Tuning parameter 'gamma' was held constant at a value of 0
Tuning parameter 'colsample_bytree' was held constant at a value of 0.8
Tuning parameter 'min_child_weight' was held constant at a value of 1
RMSE was used to select the optimal model using the smallest value.
The final values used for the model were nrounds = 50, max_depth = 2, eta = 0.5, gamma = 0, colsample_bytree = 0.8 and min_child_weight = 1.
RMSE: 9.512 R2: 0.403
XGBoost variable importance:
    Feature                  Gain        Cover        Frequency
 1: M17NPPALTfill            0.21888948  0.020028702  0.015151515
 2: M13RB3A04                0.13435355  0.029887066  0.030303030
 3: PCHE3avg                 0.11156861  0.046047295  0.037878788
 4: Al_M_agg30cm_AF_1km      0.06690715  0.082423410  0.068181818
 5: NCluster_13_AF_1km       0.05432376  0.002994946  0.007575758
 6: BLDFIE_M_agg30cm_AF_1km  0.04529630  0.008922443  0.022727273
 7: BIO1ALT                  0.04304335  0.026517751  0.022727273
 8: EXMOD5avg                0.03482040  0.022025332  0.022727273
 9: SNDPPT_M_agg30cm_AF_1km  0.02769834  0.008828851  0.007575758
10: B02CHE3                  0.02651861  0.001185499  0.007575758
11: SW1L00                   0.02644664  0.010014351  0.007575758
12: RANENV3                  0.02532324  0.036344918  0.030303030
13: B_M_agg30cm_AF_1km       0.02164119  0.010419916  0.015151515
14: TMOD3avg                 0.01403482  0.037779996  0.030303030
15: EXBX_M_agg30cm_AF_1km    0.01213258  0.027297685  0.022727273
Ensemble validation RMSE: 9.044 R2: 0.438
--------------------------------------
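For regression forests, ranger reports the OOB prediction error as a mean squared error, so each "OOB RMSE" line in this log should equal the square root of the corresponding "OOB prediction error". A short R sketch that checks this for the six response variables, using values copied from the ranger summaries above (the vector names are illustrative and do not come from the original script):

```r
# OOB prediction errors (MSE) copied from the ranger summaries in this log.
oob_mse <- c(fNR = 52.92935, fPR = 6.298889, fKR = 23.37602,
             fNRec = 183.3043, fPRec = 239.3877, fKRec = 78.60111)

# OOB RMSE values as printed in the log.
oob_rmse_logged <- c(fNR = 7.275, fPR = 2.51, fKR = 4.835,
                     fNRec = 13.539, fPRec = 15.472, fKRec = 8.866)

# RMSE is the square root of the MSE.
oob_rmse <- sqrt(oob_mse)
print(round(oob_rmse, 3))

# Largest gap between recomputed and logged values (rounding only).
max_gap <- max(abs(oob_rmse - oob_rmse_logged))
print(max_gap)
```

The recomputed values agree with the logged ones to within rounding, which suggests the OOB RMSE line was derived from the ranger object rather than from a separate validation run.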