Variable: fNR
Ranger result

Call:
 ranger(formulaString.lst[[j]], data = dfs, importance = "impurity", write.forest = TRUE, mtry = t.mrfX$bestTune$mtry, num.trees = 500)

Type:                             Regression
Number of trees:                  500
Sample size:                      1609
Number of independent variables:  377
Mtry:                             20
Target node size:                 5
Variable importance mode:         impurity
OOB prediction error:             52.9601
R squared:                        0.4776231
OOB RMSE: 7.277

Variable importance:
                                     [,1]
Maize_actualbaseline             4046.119
yGapMaize                        3907.206
N_M_agg30cm_AF_1km               3860.338
Temperature                      3391.359
TREL10                           2907.284
CRFVOL_M_agg30cm_AF_1km          2541.250
Maize_rainfed_low_baseline       2283.322
M13NDVIALT                       2035.137
NCluster_19_AF_1km               2035.077
B13CHE3                          1996.356
NMOD3avg                         1727.890
BARL10                           1424.586
EXMOD5avg                        1326.962
NCluster_14_AF_1km               1112.412
SSI_NCluster_AF_1km              1092.409

eXtreme Gradient Boosting

1609 samples
 377 predictors

No pre-processing
Resampling: Cross-Validated (3 fold, repeated 1 times)
Summary of sample sizes: 1072, 1073, 1073
Resampling results across tuning parameters:

  eta  max_depth  nrounds  RMSE      Rsquared
  0.3  2           50      7.119501  0.5088237
  0.3  2          100      7.119395  0.5089251
  0.3  2          150      7.119390  0.5089316
  0.3  3           50      7.116428  0.5093983
  0.3  3          100      7.116556  0.5093907
  0.3  3          150      7.116556  0.5093907
  0.3  4           50      7.115729  0.5095099
  0.3  4          100      7.115732  0.5095105
  0.3  4          150      7.115732  0.5095105
  0.3  5           50      7.119014  0.5089949
  0.3  5          100      7.119015  0.5089949
  0.3  5          150      7.119015  0.5089949
  0.3  6           50      7.115729  0.5095110
  0.3  6          100      7.115729  0.5095110
  0.3  6          150      7.115729  0.5095110
  0.3  7           50      7.116524  0.5093957
  0.3  7          100      7.116524  0.5093957
  0.3  7          150      7.116524  0.5093957
  0.3  8           50      7.118873  0.5090183
  0.3  8          100      7.118873  0.5090183
  0.3  8          150      7.118873  0.5090183
  0.4  2           50      7.117714  0.5091430
  0.4  2          100      7.117773  0.5091971
  0.4  2          150      7.117776  0.5091982
  0.4  3           50      7.119203  0.5089619
  0.4  3          100      7.119215  0.5089617
  0.4  3          150      7.119215  0.5089617
  0.4  4           50      7.115222  0.5095529
  0.4  4          100      7.115222  0.5095529
  0.4  4          150      7.115222  0.5095529
  0.4  5           50      7.115309  0.5095577
  0.4  5          100      7.115309  0.5095577
  0.4  5          150      7.115309  0.5095577
  0.4  6           50      7.115225  0.5095557
  0.4  6          100      7.115225  0.5095557
  0.4  6          150      7.115225  0.5095557
  0.4  7           50      7.115242  0.5095583
  0.4  7          100      7.115242  0.5095583
  0.4  7          150      7.115242  0.5095583
  0.4  8           50      7.115224  0.5095517
  0.4  8          100      7.115224  0.5095517
  0.4  8          150      7.115224  0.5095517
  0.5  2           50      7.127463  0.5075446
  0.5  2          100      7.127439  0.5075561
  0.5  2          150      7.127439  0.5075561
  0.5  3           50      7.122742  0.5083652
  0.5  3          100      7.122742  0.5083652
  0.5  3          150      7.122742  0.5083652
  0.5  4           50      7.115225  0.5095525
  0.5  4          100      7.115225  0.5095525
  0.5  4          150      7.115225  0.5095525
  0.5  5           50      7.126601  0.5077009
  0.5  5          100      7.126601  0.5077009
  0.5  5          150      7.126601  0.5077009
  0.5  6           50      7.115223  0.5095534
  0.5  6          100      7.115223  0.5095534
  0.5  6          150      7.115223  0.5095534
  0.5  7           50      7.118141  0.5091388
  0.5  7          100      7.118141  0.5091388
  0.5  7          150      7.118141  0.5091388
  0.5  8           50      7.115221  0.5095521
  0.5  8          100      7.115221  0.5095521
  0.5  8          150      7.115221  0.5095521

Tuning parameter 'gamma' was held constant at a value of 0
Tuning parameter 'colsample_bytree' was held constant at a value of 0.8
Tuning parameter 'min_child_weight' was held constant at a value of 1
RMSE was used to select the optimal model using the smallest value.
The final values used for the model were nrounds = 50, max_depth = 8, eta = 0.5, gamma = 0, colsample_bytree = 0.8 and min_child_weight = 1.
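
The selection rule stated above ("RMSE was used to select the optimal model using the smallest value") can be illustrated with a minimal sketch. It uses only a handful of rows copied from the fNR tuning table, restricted to nrounds = 50; the variable names are illustrative and are not part of the original script. Several rows in the log tie at the printed precision across nrounds, so this sketch does not try to reproduce how such ties are resolved.

```python
# A few (eta, max_depth, nrounds, RMSE) rows copied from the fNR tuning table.
# This is a subset for illustration, not the full 63-row grid.
rows = [
    (0.3, 4, 50, 7.115729),
    (0.4, 4, 50, 7.115222),
    (0.5, 2, 50, 7.127463),
    (0.5, 6, 50, 7.115223),
    (0.5, 8, 50, 7.115221),
]

# Pick the row with the smallest cross-validated RMSE.
best = min(rows, key=lambda r: r[3])
eta, max_depth, nrounds, rmse = best
print(f"eta={eta}, max_depth={max_depth}, nrounds={nrounds}, RMSE={rmse}")
```

The selected row (eta = 0.5, max_depth = 8, nrounds = 50) agrees with the final values reported for fNR. The spread between the best and worst rows of the full table is about 0.012 RMSE units, so the choice of hyperparameters matters very little for this target.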
RMSE: 7.115 R2: 0.51

XGBoost variable importance:
    Feature                        Gain        Cover       Frequency
 1: Maize_actualbaseline           0.559776229 0.074979634 0.051546392
 2: yGapMaize                      0.233751785 0.028629165 0.025773196
 3: af_agg_30cm_PWP__M_1km         0.042300787 0.005734297 0.051546392
 4: VBFMRG5                        0.033307084 0.007818534 0.005154639
 5: Al_M_agg30cm_AF_1km            0.027749563 0.011574392 0.015463918
 6: GAEZ_LGP                       0.024734196 0.007173161 0.005154639
 7: N_M_agg30cm_AF_1km             0.021944801 0.050841630 0.030927835
 8: BIO12ALT                       0.016856114 0.019583364 0.015463918
 9: ENAX_M_agg30cm_AF_1km          0.014300036 0.049868280 0.030927835
10: af_agg_30cm_TAWCpF23__M_1km    0.011352279 0.001274876 0.010309278
11: Ca_M_agg30cm_AF_1km            0.003666706 0.009225658 0.010309278
12: GAEZ_NPP                       0.003314226 0.041430823 0.025773196
13: NCluster_1_AF_1km              0.002015622 0.033342503 0.025773196
14: M13RB3ALT                      0.001454620 0.008188830 0.005154639
15: rElevIndex                     0.001162061 0.008162380 0.005154639

Ensemble validation RMSE: 7.205 R2: 0.488
--------------------------------------
Variable: fPR
Ranger result

Call:
 ranger(formulaString.lst[[j]], data = dfs, importance = "impurity", write.forest = TRUE, mtry = t.mrfX$bestTune$mtry, num.trees = 500)

Type:                             Regression
Number of trees:                  500
Sample size:                      1609
Number of independent variables:  377
Mtry:                             16
Target node size:                 5
Variable importance mode:         impurity
OOB prediction error:             6.277497
R squared:                        0.8964517
OOB RMSE: 2.505

Variable importance:
                                     [,1]
Maize_intermed                   3926.730
Maize_rainfed_low_baseline       3503.204
Maize_rainfed_intermed_baseline  2693.993
yFertilised_MaizeT2              2666.661
Maize_rainfed_high_baseline      2443.650
NMOD3avg                         2153.018
Temperature                      2058.690
Slopeclassc3                     1912.595
TMNMOD3                          1772.939
M13RB3A04                        1449.315
B13CHE3                          1389.378
Water_balance                    1379.317
BIO12ALT                         1323.633
Riclassc2                        1314.384
GAEZ_LGP                         1289.299

eXtreme Gradient Boosting

1609 samples
 377 predictors

No pre-processing
Resampling: Cross-Validated (3 fold, repeated 1 times)
Summary of sample sizes: 1072, 1073, 1073
Resampling results across tuning parameters:

  eta  max_depth  nrounds  RMSE      Rsquared
  0.3  2           50      2.418718  0.9034400
  0.3  2          100      2.418655  0.9034349
  0.3  2          150      2.418655  0.9034348
  0.3  3           50      2.418871  0.9034181
  0.3  3          100      2.418884  0.9034170
  0.3  3          150      2.418884  0.9034170
  0.3  4           50      2.420240  0.9033095
  0.3  4          100      2.420240  0.9033096
  0.3  4          150      2.420240  0.9033096
  0.3  5           50      2.418342  0.9034579
  0.3  5          100      2.418342  0.9034579
  0.3  5          150      2.418342  0.9034579
  0.3  6           50      2.419404  0.9033760
  0.3  6          100      2.419404  0.9033760
  0.3  6          150      2.419404  0.9033760
  0.3  7           50      2.419068  0.9034025
  0.3  7          100      2.419068  0.9034025
  0.3  7          150      2.419068  0.9034025
  0.3  8           50      2.418525  0.9034448
  0.3  8          100      2.418525  0.9034448
  0.3  8          150      2.418525  0.9034448
  0.4  2           50      2.418291  0.9034643
  0.4  2          100      2.418344  0.9034578
  0.4  2          150      2.418344  0.9034578
  0.4  3           50      2.418342  0.9034579
  0.4  3          100      2.418342  0.9034579
  0.4  3          150      2.418342  0.9034579
  0.4  4           50      2.419682  0.9033540
  0.4  4          100      2.419682  0.9033540
  0.4  4          150      2.419682  0.9033540
  0.4  5           50      2.419647  0.9033567
  0.4  5          100      2.419647  0.9033567
  0.4  5          150      2.419647  0.9033568
  0.4  6           50      2.419165  0.9033949
  0.4  6          100      2.419165  0.9033949
  0.4  6          150      2.419165  0.9033949
  0.4  7           50      2.418402  0.9034540
  0.4  7          100      2.418401  0.9034540
  0.4  7          150      2.418401  0.9034540
  0.4  8           50      2.419247  0.9033885
  0.4  8          100      2.419247  0.9033885
  0.4  8          150      2.419247  0.9033885
  0.5  2           50      2.418701  0.9034317
  0.5  2          100      2.418684  0.9034325
  0.5  2          150      2.418684  0.9034325
  0.5  3           50      2.418341  0.9034578
  0.5  3          100      2.418341  0.9034578
  0.5  3          150      2.418342  0.9034578
  0.5  4           50      2.425197  0.9029107
  0.5  4          100      2.425197  0.9029107
  0.5  4          150      2.425197  0.9029107
  0.5  5           50      2.418429  0.9034520
  0.5  5          100      2.418429  0.9034520
  0.5  5          150      2.418429  0.9034520
  0.5  6           50      2.419998  0.9033289
  0.5  6          100      2.419998  0.9033289
  0.5  6          150      2.419998  0.9033289
  0.5  7           50      2.420209  0.9033120
  0.5  7          100      2.420209  0.9033120
  0.5  7          150      2.420209  0.9033120
  0.5  8           50      2.419566  0.9033632
  0.5  8          100      2.419566  0.9033632
  0.5  8          150      2.419566  0.9033632

Tuning parameter 'gamma' was held constant at a value of 0
Tuning parameter 'colsample_bytree' was held constant at a value of 0.8
Tuning parameter 'min_child_weight' was held constant at a value of 1
RMSE was used to select the optimal model using the smallest value.
The final values used for the model were nrounds = 50, max_depth = 2, eta = 0.4, gamma = 0, colsample_bytree = 0.8 and min_child_weight = 1.
RMSE: 2.418 R2: 0.903

XGBoost variable importance:
    Feature                        Gain        Cover       Frequency
 1: Maize_intermed                 0.594186986 0.040045048 0.029850746
 2: NMOD3avg                       0.183420638 0.023475756 0.022388060
 3: af_agg_30cm_PWP__M_1km         0.055686377 0.006365147 0.037313433
 4: Al_M_agg30cm_AF_1km            0.039110696 0.058294290 0.059701493
 5: VBFMRG5                        0.023668593 0.023550420 0.022388060
 6: yFertilised_MaizeT2            0.020743346 0.020022524 0.014925373
 7: af_agg_30cm_AWCpF23__M_1km     0.020192906 0.004504757 0.037313433
 8: AAIavg_GYGA                    0.011111694 0.003372346 0.014925373
 9: CEC_M_agg30cm_AF_1km           0.010435935 0.012412969 0.014925373
10: P.B_M_agg30cm_AF_1km           0.009743959 0.009762380 0.007462687
11: Maize_rainfed_low_baseline     0.006786371 0.009924153 0.007462687
12: M13RB1A01                      0.005838440 0.003241683 0.007462687
13: B14CHE3                        0.003931088 0.010011262 0.007462687
14: EACKCL_M_agg30cm_AF_1km        0.003524324 0.019767420 0.014925373
15: Fcover                         0.002966293 0.018118580 0.014925373

Ensemble validation RMSE: 2.459 R2: 0.9
--------------------------------------
Variable: fKR
Ranger result

Call:
 ranger(formulaString.lst[[j]], data = dfs, importance = "impurity", write.forest = TRUE, mtry = t.mrfX$bestTune$mtry, num.trees = 500)

Type:                             Regression
Number of trees:                  500
Sample size:                      1609
Number of independent variables:  377
Mtry:                             12
Target node size:                 5
Variable importance mode:         impurity
OOB prediction error:             23.30512
R squared:                        0.7850955
OOB RMSE: 4.828

Variable importance:
                                     [,1]
GAEZ_ratioP_PETan                4559.534
M43BSALT                         4062.930
N_M_agg30cm_AF_1km               3835.904
M43WSALT                         3712.870
NCluster_15_AF_1km               3011.164
BIO12ALT                         2892.304
M13RB3A04                        2740.815
Water_balance                    2639.733
IMOD4avg                         2420.402
GAEZ_ET                          2317.942
GTDHYS3                          2136.651
NCluster_4_AF_1km                2039.069
M13RB1A04                        1966.811
Maize_actualbaseline             1928.389
NCluster_14_AF_1km               1914.043
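
The "OOB RMSE" line printed after each ranger summary appears to be the square root of the "OOB prediction error", which ranger reports as a mean squared error for regression forests. The sketch below checks that relationship against the six pairs of values printed in this log; the dictionary and variable names are illustrative only.

```python
import math

# (OOB prediction error, reported OOB RMSE) copied from the six ranger summaries.
oob = {
    "fNR":   (52.9601, 7.277),
    "fPR":   (6.277497, 2.505),
    "fKR":   (23.30512, 4.828),
    "fNRec": (186.1742, 13.645),
    "fPRec": (236.0367, 15.363),
    "fKRec": (79.23831, 8.902),
}

# Compare sqrt(MSE) with the RMSE printed in the log.
for name, (mse, reported) in oob.items():
    derived = math.sqrt(mse)
    print(f"{name}: sqrt({mse}) = {derived:.4f}, reported {reported}")

# Largest absolute difference; should be within rounding of a 3-decimal value.
max_gap = max(abs(math.sqrt(mse) - reported) for mse, reported in oob.values())
```

All six derived values agree with the reported RMSE to within rounding at three decimals, which supports reading the two lines as the same quantity on different scales.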
eXtreme Gradient Boosting

1609 samples
 377 predictors

No pre-processing
Resampling: Cross-Validated (3 fold, repeated 1 times)
Summary of sample sizes: 1073, 1072, 1073
Resampling results across tuning parameters:

  eta  max_depth  nrounds  RMSE      Rsquared
  0.3  2           50      4.842535  0.7807777
  0.3  2          100      4.845074  0.7805750
  0.3  2          150      4.845246  0.7805605
  0.3  3           50      4.844796  0.7806083
  0.3  3          100      4.845082  0.7805900
  0.3  3          150      4.845082  0.7805900
  0.3  4           50      4.845099  0.7805841
  0.3  4          100      4.845100  0.7805840
  0.3  4          150      4.845100  0.7805840
  0.3  5           50      4.845094  0.7805852
  0.3  5          100      4.845095  0.7805851
  0.3  5          150      4.845095  0.7805851
  0.3  6           50      4.845080  0.7805894
  0.3  6          100      4.845084  0.7805891
  0.3  6          150      4.845085  0.7805890
  0.3  7           50      4.845095  0.7805842
  0.3  7          100      4.845101  0.7805838
  0.3  7          150      4.845100  0.7805838
  0.3  8           50      4.845182  0.7805700
  0.3  8          100      4.845182  0.7805699
  0.3  8          150      4.845182  0.7805699
  0.4  2           50      4.844865  0.7806214
  0.4  2          100      4.845075  0.7805884
  0.4  2          150      4.845088  0.7805874
  0.4  3           50      4.846621  0.7804132
  0.4  3          100      4.846638  0.7804118
  0.4  3          150      4.846638  0.7804118
  0.4  4           50      4.845095  0.7805851
  0.4  4          100      4.845095  0.7805851
  0.4  4          150      4.845095  0.7805851
  0.4  5           50      4.845085  0.7805899
  0.4  5          100      4.845085  0.7805899
  0.4  5          150      4.845085  0.7805899
  0.4  6           50      4.846890  0.7803868
  0.4  6          100      4.846890  0.7803868
  0.4  6          150      4.846890  0.7803868
  0.4  7           50      4.845084  0.7805900
  0.4  7          100      4.845084  0.7805900
  0.4  7          150      4.845084  0.7805900
  0.4  8           50      4.845082  0.7805896
  0.4  8          100      4.845081  0.7805896
  0.4  8          150      4.845082  0.7805896
  0.5  2           50      4.845050  0.7805920
  0.5  2          100      4.845084  0.7805900
  0.5  2          150      4.845084  0.7805900
  0.5  3           50      4.845212  0.7805655
  0.5  3          100      4.845212  0.7805654
  0.5  3          150      4.845212  0.7805655
  0.5  4           50      4.845097  0.7805846
  0.5  4          100      4.845097  0.7805846
  0.5  4          150      4.845097  0.7805846
  0.5  5           50      4.845367  0.7805460
  0.5  5          100      4.845367  0.7805460
  0.5  5          150      4.845367  0.7805460
  0.5  6           50      4.845102  0.7805835
  0.5  6          100      4.845102  0.7805835
  0.5  6          150      4.845102  0.7805835
  0.5  7           50      4.845090  0.7805861
  0.5  7          100      4.845090  0.7805862
  0.5  7          150      4.845092  0.7805861
  0.5  8           50      4.845090  0.7805861
  0.5  8          100      4.845090  0.7805861
  0.5  8          150      4.845090  0.7805861

Tuning parameter 'gamma' was held constant at a value of 0
Tuning parameter 'colsample_bytree' was held constant at a value of 0.8
Tuning parameter 'min_child_weight' was held constant at a value of 1
RMSE was used to select the optimal model using the smallest value.
The final values used for the model were nrounds = 50, max_depth = 2, eta = 0.3, gamma = 0, colsample_bytree = 0.8 and min_child_weight = 1.
RMSE: 4.843 R2: 0.781

XGBoost variable importance:
    Feature                        Gain        Cover       Frequency
 1: N_M_agg30cm_AF_1km             0.296677245 0.036318844 0.029850746
 2: M43BSALT                       0.202632526 0.020019534 0.014925373
 3: af_agg_30cm_PWP__M_1km         0.097315765 0.010401697 0.037313433
 4: BIO12ALT                       0.085444563 0.014781359 0.022388060
 5: Al_M_agg30cm_AF_1km            0.055009639 0.066764960 0.059701493
 6: GPMIMERGALT                    0.044346473 0.007552428 0.007462687
 7: VBFMRG5                        0.033509029 0.043479343 0.037313433
 8: M13NDVIA08                     0.033047990 0.005468356 0.007462687
 9: GAEZ_ratioP_PETan              0.028531012 0.010009767 0.007462687
10: C03GLC5                        0.027694785 0.004541411 0.007462687
11: af_BDRICM_T__M_1km             0.019221489 0.010544783 0.022388060
12: AAIavg_GYGA                    0.015786832 0.011297537 0.059701493
13: CHIRPSA                        0.013554600 0.021126892 0.022388060
14: Usgs_lithologyc3               0.010622129 0.020019534 0.014925373
15: NCluster_12_AF_1km             0.006627228 0.008199424 0.007462687

Ensemble validation RMSE: 4.869 R2: 0.781
--------------------------------------
Variable: fNRec
Ranger result

Call:
 ranger(formulaString.lst[[j]], data = dfs, importance = "impurity", write.forest = TRUE, mtry = t.mrfX$bestTune$mtry, num.trees = 500)

Type:                             Regression
Number of trees:                  500
Sample size:                      342
Number of independent variables:  377
Mtry:                             6
Target node size:                 5
Variable importance mode:         impurity
OOB prediction error:             186.1742
R squared:                        0.5556476
OOB RMSE: 13.645

Variable importance:
                                     [,1]
NMSD3avg                         1600.246
MY2LSTNALT_200207_201609         1396.236
PCHE3avg                         1287.760
CHIRPSA                          1266.126
fPR_MaizeT2                      1249.811
MAXENV3                          1174.367
M43WNALT                         1157.679
Water_balance                    1141.194
M43BNALT                         1138.697
Na_M_agg30cm_AF_1km              1120.171
SLTPPT_M_agg30cm_AF_1km          1102.071
af_agg_30cm_TAWCpF23mm__M_1km    1087.983
ECN_M_agg30cm_AF_1km             1081.573
af_BDRICM_T__M_1km               1071.111
PET                              1065.389

eXtreme Gradient Boosting

342 samples
377 predictors

No pre-processing
Resampling: Cross-Validated (3 fold, repeated 1 times)
Summary of sample sizes: 228, 227, 229
Resampling results across tuning parameters:

  eta  max_depth  nrounds  RMSE      Rsquared
  0.3  2           50      14.61595  0.5101834
  0.3  2          100      14.96003  0.4971105
  0.3  2          150      15.01199  0.4950768
  0.3  3           50      14.83675  0.5008060
  0.3  3          100      14.89941  0.4987678
  0.3  3          150      14.90193  0.4986854
  0.3  4           50      14.95704  0.4962476
  0.3  4          100      14.96271  0.4960394
  0.3  4          150      14.96271  0.4960394
  0.3  5           50      14.85766  0.5026871
  0.3  5          100      14.85938  0.5026211
  0.3  5          150      14.85938  0.5026211
  0.3  6           50      14.81093  0.5060536
  0.3  6          100      14.81181  0.5060258
  0.3  6          150      14.81181  0.5060258
  0.3  7           50      14.65921  0.5128894
  0.3  7          100      14.65995  0.5128568
  0.3  7          150      14.65995  0.5128568
  0.3  8           50      14.77451  0.5087376
  0.3  8          100      14.77499  0.5087160
  0.3  8          150      14.77499  0.5087160
  0.4  2           50      14.74583  0.5070481
  0.4  2          100      14.88873  0.5019755
  0.4  2          150      14.90364  0.5013902
  0.4  3           50      14.83723  0.5026487
  0.4  3          100      14.84717  0.5023007
  0.4  3          150      14.84726  0.5022980
  0.4  4           50      15.13346  0.4897437
  0.4  4          100      15.13379  0.4897412
  0.4  4          150      15.13379  0.4897412
  0.4  5           50      15.12407  0.4916515
  0.4  5          100      15.12416  0.4916481
  0.4  5          150      15.12416  0.4916481
  0.4  6           50      15.07674  0.4899195
  0.4  6          100      15.07675  0.4899189
  0.4  6          150      15.07675  0.4899189
  0.4  7           50      14.66355  0.5138190
  0.4  7          100      14.66355  0.5138191
  0.4  7          150      14.66355  0.5138191
  0.4  8           50      14.97624  0.4942444
  0.4  8          100      14.97625  0.4942438
  0.4  8          150      14.97625  0.4942438
  0.5  2           50      14.96619  0.4946096
  0.5  2          100      15.03856  0.4921333
  0.5  2          150      15.04263  0.4919765
  0.5  3           50      14.63154  0.5166167
  0.5  3          100      14.63473  0.5164780
  0.5  3          150      14.63473  0.5164780
  0.5  4           50      14.97300  0.4969722
  0.5  4          100      14.97308  0.4969688
  0.5  4          150      14.97308  0.4969688
  0.5  5           50      14.75333  0.5083300
  0.5  5          100      14.75333  0.5083300
  0.5  5          150      14.75333  0.5083300
  0.5  6           50      14.79477  0.5073470
  0.5  6          100      14.79477  0.5073470
  0.5  6          150      14.79477  0.5073470
  0.5  7           50      14.76019  0.5070665
  0.5  7          100      14.76019  0.5070665
  0.5  7          150      14.76019  0.5070665
  0.5  8           50      15.03415  0.4976923
  0.5  8          100      15.03414  0.4976924
  0.5  8          150      15.03414  0.4976924

Tuning parameter 'gamma' was held constant at a value of 0
Tuning parameter 'colsample_bytree' was held constant at a value of 0.8
Tuning parameter 'min_child_weight' was held constant at a value of 1
RMSE was used to select the optimal model using the smallest value.
The final values used for the model were nrounds = 50, max_depth = 2, eta = 0.3, gamma = 0, colsample_bytree = 0.8 and min_child_weight = 1.
RMSE: 14.616 R2: 0.51

XGBoost variable importance:
    Feature                        Gain       Cover       Frequency
 1: NMSD3avg                       0.18438625 0.020042781 0.014705882
 2: EXBX_M_agg30cm_AF_1km          0.11140195 0.102147859 0.088235294
 3: af_BDRICM_T__M_1km             0.06790195 0.047264629 0.044117647
 4: M17GPPALTfill                  0.06294146 0.037155332 0.029411765
 5: N_M_agg30cm_AF_1km             0.05464527 0.026459988 0.022058824
 6: af_agg_30cm_TAWCpF23mm__M_1km  0.05269553 0.010226507 0.014705882
 7: Mg_M_agg30cm_AF_1km            0.03956883 0.015969760 0.014705882
 8: M13NDVIA04                     0.03346730 0.039440912 0.036764706
 9: GAEZ_NPP                       0.03075145 0.004014417 0.007352941
10: M43WNALT                       0.02978413 0.002783720 0.007352941
11: NIRL00                         0.02630397 0.010021391 0.007352941
12: Na_M_agg30cm_AF_1km            0.02447785 0.009347438 0.007352941
13: MAXENV3                        0.02317990 0.021537199 0.022058824
14: AAIavg_GYGA                    0.02168600 0.001553023 0.036764706
15: Fe_M_agg30cm_AF_1km            0.02161478 0.034488821 0.029411765

Ensemble validation RMSE: 13.828 R2: 0.545
--------------------------------------
Variable: fPRec
Ranger result

Call:
 ranger(formulaString.lst[[j]], data = dfs, importance = "impurity", write.forest = TRUE, mtry = t.mrfX$bestTune$mtry, num.trees = 500)

Type:                             Regression
Number of trees:                  500
Sample size:                      329
Number of independent variables:  377
Mtry:                             4
Target node size:                 5
Variable importance mode:         impurity
OOB prediction error:             236.0367
R squared:                        0.03078028
OOB RMSE: 15.363

Variable importance:
                                     [,1]
TMSD3avg                         570.4339
M13RB3A01                        551.1673
Wdvi                             471.3502
Cu_M_agg30cm_AF_1km              454.5660
M43WNALT                         446.4449
MY2LSTNALT_200207_201609         439.3799
NCluster_1_AF_1km                435.5829
B02CHE3                          432.7723
M17GPPALTfill                    429.7262
Lai_avg                          417.2153
MMOD4avg                         411.6490
M13RB1A01                        399.7160
Water_balance                    385.3072
C03GLC5                          379.0770
M13RB1ALT                        374.7210

eXtreme Gradient Boosting

329 samples
377 predictors

No pre-processing
Resampling: Cross-Validated (3 fold, repeated 1 times)
Summary of sample sizes: 219, 219, 220
Resampling results across tuning parameters:

  eta  max_depth  nrounds  RMSE      Rsquared
  0.3  2           50      14.60986  0.2024870
  0.3  2          100      14.65026  0.2068839
  0.3  2          150      14.65277  0.2076786
  0.3  3           50      14.51787  0.2156386
  0.3  3          100      14.52400  0.2162076
  0.3  3          150      14.52400  0.2162329
  0.3  4           50      14.35240  0.2248929
  0.3  4          100      14.35349  0.2249945
  0.3  4          150      14.35349  0.2249945
  0.3  5           50      14.52500  0.2187562
  0.3  5          100      14.52527  0.2187841
  0.3  5          150      14.52527  0.2187841
  0.3  6           50      14.51712  0.2193483
  0.3  6          100      14.51731  0.2193574
  0.3  6          150      14.51731  0.2193574
  0.3  7           50      14.57930  0.2135989
  0.3  7          100      14.57946  0.2135991
  0.3  7          150      14.57946  0.2135991
  0.3  8           50      14.62636  0.2113013
  0.3  8          100      14.62654  0.2113009
  0.3  8          150      14.62654  0.2113009
  0.4  2           50      14.58692  0.2113074
  0.4  2          100      14.60341  0.2137068
  0.4  2          150      14.60260  0.2139888
  0.4  3           50      14.63479  0.2132926
  0.4  3          100      14.63553  0.2135190
  0.4  3          150      14.63553  0.2135190
  0.4  4           50      14.67953  0.2140596
  0.4  4          100      14.67958  0.2140690
  0.4  4          150      14.67958  0.2140690
  0.4  5           50      14.62106  0.2161085
  0.4  5          100      14.62106  0.2161085
  0.4  5          150      14.62106  0.2161085
  0.4  6           50      14.65291  0.2156541
  0.4  6          100      14.65291  0.2156541
  0.4  6          150      14.65291  0.2156541
  0.4  7           50      14.65821  0.2118928
  0.4  7          100      14.65821  0.2118928
  0.4  7          150      14.65821  0.2118928
  0.4  8           50      14.46459  0.2189074
  0.4  8          100      14.46459  0.2189074
  0.4  8          150      14.46459  0.2189074
  0.5  2           50      14.79480  0.2079629
  0.5  2          100      14.79636  0.2088305
  0.5  2          150      14.79621  0.2088753
  0.5  3           50      14.97230  0.1900806
  0.5  3          100      14.97230  0.1901189
  0.5  3          150      14.97230  0.1901189
  0.5  4           50      14.44378  0.2169671
  0.5  4          100      14.44377  0.2169673
  0.5  4          150      14.44377  0.2169673
  0.5  5           50      14.80470  0.2101713
  0.5  5          100      14.80470  0.2101712
  0.5  5          150      14.80470  0.2101712
  0.5  6           50      14.78095  0.2088845
  0.5  6          100      14.78095  0.2088845
  0.5  6          150      14.78095  0.2088845
  0.5  7           50      14.66493  0.2150284
  0.5  7          100      14.66493  0.2150283
  0.5  7          150      14.66493  0.2150283
  0.5  8           50      14.81479  0.2072304
  0.5  8          100      14.81479  0.2072303
  0.5  8          150      14.81479  0.2072303

Tuning parameter 'gamma' was held constant at a value of 0
Tuning parameter 'colsample_bytree' was held constant at a value of 0.8
Tuning parameter 'min_child_weight' was held constant at a value of 1
RMSE was used to select the optimal model using the smallest value.
The final values used for the model were nrounds = 50, max_depth = 4, eta = 0.3, gamma = 0, colsample_bytree = 0.8 and min_child_weight = 1.
RMSE: 14.352 R2: 0.225

XGBoost variable importance:
    Feature                        Gain       Cover       Frequency
 1: NCluster_4_AF_1km              0.27918328 0.011658594 0.006976744
 2: NMOD3avg                       0.10020924 0.008864257 0.006976744
 3: PCHE3avg                       0.07503620 0.026654868 0.018604651
 4: Lai_avg                        0.06772984 0.004564083 0.006976744
 5: MY2LSTNALT_200207_201609       0.05763106 0.008600348 0.006976744
 6: af_agg_ERZD_TAWCpF23mm__M_1km  0.05416562 0.008848733 0.034883721
 7: C04GLC5                        0.04113628 0.027757079 0.013953488
 8: Mg_M_agg30cm_AF_1km            0.04088511 0.014794461 0.016279070
 9: SNDPPT_M_agg30cm_AF_1km        0.03627265 0.022634128 0.013953488
10: Water_balance                  0.03427363 0.005107427 0.002325581
11: B13CHE3                        0.02076473 0.011006582 0.013953488
12: Fe_M_agg30cm_AF_1km            0.01554035 0.007125559 0.009302326
13: M17GPPALTfill                  0.01461690 0.041030179 0.020930233
14: GPMIMERGALT                    0.01290313 0.007653378 0.013953488
15: B04CHE3                        0.01074575 0.012326130 0.011627907

Ensemble validation RMSE: 15.493 R2: 0.078
--------------------------------------
Variable: fKRec
Ranger result

Call:
 ranger(formulaString.lst[[j]], data = dfs, importance = "impurity", write.forest = TRUE, mtry = t.mrfX$bestTune$mtry, num.trees = 500)

Type:                             Regression
Number of trees:                  500
Sample size:                      321
Number of independent variables:  377
Mtry:                             4
Target node size:                 5
Variable importance mode:         impurity
OOB prediction error:             79.23831
R squared:                        0.442206
OOB RMSE: 8.902

Variable importance:
                                     [,1]
BIO12ALT                         382.1494
fNR_MaizeT2                      357.5438
NCluster_1_AF_1km                351.1373
M17NPPALTfill                    320.2978
M17GPPALTfill                    313.2447
Water_balance                    293.6336
Na_M_agg30cm_AF_1km              283.7402
M13RB1ALT                        280.5583
C03GLC5                          275.5281
PRSCHE3                          275.3184
Fapar                            274.5340
M13RB3A01                        269.0853
ESMOD5avg                        267.8652
Fcover                           263.9524
M13RB1A04                        254.7153

eXtreme Gradient Boosting

321 samples
377 predictors

No pre-processing
Resampling: Cross-Validated (3 fold, repeated 1 times)
Summary of sample sizes: 214, 214, 214
Resampling results across tuning parameters:

  eta  max_depth  nrounds  RMSE      Rsquared
  0.3  2           50      9.246048  0.4226762
  0.3  2          100      9.342895  0.4198199
  0.3  2          150      9.359781  0.4189601
  0.3  3           50      9.172986  0.4316180
  0.3  3          100      9.201086  0.4299720
  0.3  3          150      9.201763  0.4299372
  0.3  4           50      9.147825  0.4368623
  0.3  4          100      9.150547  0.4367794
  0.3  4          150      9.150547  0.4367794
  0.3  5           50      9.046238  0.4458910
  0.3  5          100      9.046936  0.4458668
  0.3  5          150      9.046936  0.4458668
  0.3  6           50      9.148507  0.4334563
  0.3  6          100      9.148998  0.4334320
  0.3  6          150      9.148998  0.4334320
  0.3  7           50      9.119063  0.4388055
  0.3  7          100      9.119345  0.4387970
  0.3  7          150      9.119345  0.4387970
  0.3  8           50      8.990025  0.4499688
  0.3  8          100      8.990097  0.4499702
  0.3  8          150      8.990097  0.4499702
  0.4  2           50      9.164750  0.4327252
  0.4  2          100      9.223775  0.4304119
  0.4  2          150      9.228109  0.4302396
  0.4  3           50      9.295671  0.4200692
  0.4  3          100      9.300062  0.4199356
  0.4  3          150      9.300071  0.4199351
  0.4  4           50      9.065070  0.4435587
  0.4  4          100      9.065866  0.4435131
  0.4  4          150      9.065866  0.4435131
  0.4  5           50      9.155705  0.4371725
  0.4  5          100      9.155712  0.4371720
  0.4  5          150      9.155712  0.4371720
  0.4  6           50      9.114859  0.4407591
  0.4  6          100      9.114859  0.4407591
  0.4  6          150      9.114859  0.4407591
  0.4  7           50      9.027455  0.4492480
  0.4  7          100      9.027455  0.4492479
  0.4  7          150      9.027455  0.4492479
  0.4  8           50      8.985514  0.4501704
  0.4  8          100      8.985514  0.4501704
  0.4  8          150      8.985514  0.4501704
  0.5  2           50      9.280995  0.4199565
  0.5  2          100      9.308783  0.4186051
  0.5  2          150      9.310944  0.4184613
  0.5  3           50      9.089595  0.4398378
  0.5  3          100      9.090574  0.4397838
  0.5  3          150      9.090574  0.4397838
  0.5  4           50      9.211803  0.4256328
  0.5  4          100      9.211816  0.4256318
  0.5  4          150      9.211816  0.4256318
  0.5  5           50      9.241837  0.4307352
  0.5  5          100      9.241837  0.4307352
  0.5  5          150      9.241837  0.4307352
  0.5  6           50      9.370408  0.4169990
  0.5  6          100      9.370407  0.4169990
  0.5  6          150      9.370407  0.4169990
  0.5  7           50      8.874402  0.4638143
  0.5  7          100      8.874402  0.4638143
  0.5  7          150      8.874402  0.4638143
  0.5  8           50      9.036373  0.4501752
  0.5  8          100      9.036373  0.4501752
  0.5  8          150      9.036373  0.4501752

Tuning parameter 'gamma' was held constant at a value of 0
Tuning parameter 'colsample_bytree' was held constant at a value of 0.8
Tuning parameter 'min_child_weight' was held constant at a value of 1
RMSE was used to select the optimal model using the smallest value.
The final values used for the model were nrounds = 150, max_depth = 7, eta = 0.5, gamma = 0, colsample_bytree = 0.8 and min_child_weight = 1.
RMSE: 8.874 R2: 0.464

XGBoost variable importance:
    Feature                        Gain       Cover       Frequency
 1: M13RB3A04                      0.24424635 0.004143914 0.002252252
 2: PCHE3avg                       0.11965500 0.016975847 0.013513514
 3: yFertilised_MaizeT2            0.09290477 0.005267031 0.004504505
 4: M17GPPALTfill                  0.08731489 0.007616540 0.004504505
 5: ECN_M_agg30cm_AF_1km           0.07037200 0.014819979 0.015765766
 6: RANENV3                        0.04606269 0.008584744 0.011261261
 7: M13RB3A08                      0.03989240 0.010818068 0.006756757
 8: B14CHE3                        0.03324920 0.006841976 0.011261261
 9: M17NPPALTfill                  0.03090672 0.008507287 0.006756757
10: Fapar                          0.03066925 0.002297871 0.002252252
11: Cu_M_agg30cm_AF_1km            0.02999046 0.011024618 0.011261261
12: Na_M_agg30cm_AF_1km            0.02197387 0.011644269 0.020270270
13: C04GLC5                        0.01746421 0.005357396 0.006756757
14: BIO12ALT                       0.01203713 0.006041594 0.006756757
15: EXBX_M_agg30cm_AF_1km          0.01082965 0.010559880 0.011261261

Ensemble validation RMSE: 8.713 R2: 0.465
--------------------------------------
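
To compare the three reported RMSE values per target at a glance, the sketch below collects the figures printed in this log (ranger OOB RMSE, XGBoost cross-validated RMSE, ensemble validation RMSE) and reports which is lowest for each variable. The names `rmse`, `labels` and `lowest` are illustrative. Note the assumption behind any such comparison: the three numbers come from different resampling schemes (out-of-bag, 3-fold CV, and the ensemble validation step, whose procedure is not shown in the log), so small differences should not be over-interpreted.

```python
# Per target: (ranger OOB RMSE, XGBoost CV RMSE, ensemble validation RMSE),
# all copied from the log above.
rmse = {
    "fNR":   (7.277, 7.115, 7.205),
    "fPR":   (2.505, 2.418, 2.459),
    "fKR":   (4.828, 4.843, 4.869),
    "fNRec": (13.645, 14.616, 13.828),
    "fPRec": (15.363, 14.352, 15.493),
    "fKRec": (8.902, 8.874, 8.713),
}
labels = ("ranger", "xgboost", "ensemble")

# For each target, find which of the three reported RMSE values is smallest.
lowest = {name: labels[vals.index(min(vals))] for name, vals in rmse.items()}

for name, vals in rmse.items():
    cells = "  ".join(f"{label}={v:<7}" for label, v in zip(labels, vals))
    print(f"{name:<6} {cells} lowest: {lowest[name]}")
```

On these figures the ensemble has the lowest reported RMSE only for fKRec; for the other targets one of the two base learners is marginally lower, and for fPRec all three models explain little variance (R2 between 0.03 and 0.23).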