Multiple linear regression and Random Forest model to estimate soil bulk density in mountainous regions

Authors

  • Waldir Carvalho Junior, Embrapa Solos, Rua Jardim Botânico, no 1.024, Jardim Botânico, CEP 22460-000 Rio de Janeiro, RJ.
  • Braz Calderano Filho, Embrapa Solos, Rua Jardim Botânico, no 1.024, Jardim Botânico, CEP 22460-000 Rio de Janeiro, RJ.
  • Cesar da Silva Chagas, Embrapa Solos, Rua Jardim Botânico, no 1.024, Jardim Botânico, CEP 22460-000 Rio de Janeiro, RJ.
  • Silvio Barge Bhering, Embrapa Solos, Rua Jardim Botânico, no 1.024, Jardim Botânico, CEP 22460-000 Rio de Janeiro, RJ.
  • Nilson Rendeiro Pereira, Embrapa Solos, Rua Jardim Botânico, no 1.024, Jardim Botânico, CEP 22460-000 Rio de Janeiro, RJ.
  • Helena Saraiva Koenow Pinheiro, Universidade Federal Rural do Rio de Janeiro, Departamento de Solos, BR-465, Km 47, CEP 23890-000 Seropédica, RJ.

DOI:

https://doi.org/10.1590/S1678-3921.pab2016.v51.22426

Keywords:

carbon stock, pedotransfer functions, data-driven models, stepwise

Abstract

The objective of this work was to develop models, using different datasets, for estimating soil bulk density in tropical mountainous regions from soil attributes commonly reported in the analyses of soil profiles described in regional surveys. The complete dataset comprises 163 samples and was divided into six groups: three with 73 samples and a maximum of 32 covariables, and three with 163 samples and a maximum of 18 covariables. Multiple linear regression (RLM) and Random Forest (RF) models were tested. The lowest uncertainty among the models was achieved by RLM2, with an R² of 0.56, 13 covariables, and 73 samples. Among the groups with 163 samples, the best models were the RFs, with a mean R² of 0.48. The root mean squared error ranged from 0.09 to 0.14. The most important covariables in the RF models were organic carbon, hydrogen, fine and coarse sand, base saturation, and cation exchange capacity. In the stepwise backward regression, the main covariables were the silt/clay ratio, fine and coarse sand, organic carbon, base saturation, and potassium.
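
The two accuracy measures reported in the abstract can be illustrated with a minimal sketch. It assumes R² is the usual coefficient of determination (1 minus the ratio of residual to total sum of squares) and that the root mean squared error is expressed in the same units as bulk density; the abstract does not state which definitions or software the authors used. The function names and the sample values below are hypothetical and are not taken from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error, in the same units as the observations."""
    n = len(observed)
    squared_errors = ((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(sum(squared_errors) / n)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical bulk density values (illustrative only, not from the study).
observed = [1.0, 1.2, 1.4, 1.6]
predicted = [1.1, 1.1, 1.5, 1.5]

print("RMSE:", round(rmse(observed, predicted), 3))
print("R2:", round(r_squared(observed, predicted), 3))
```

Under these definitions, a lower RMSE and a higher R² both indicate predictions closer to the observed values, which is the sense in which the abstract ranks the RLM and RF models.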

Author Biographies

Waldir Carvalho Junior, Embrapa Solos, Rua Jardim Botânico, no 1.024, Jardim Botânico, CEP 22460-000 Rio de Janeiro, RJ.

http://lattes.cnpq.br/7992394393174495

Braz Calderano Filho, Embrapa Solos, Rua Jardim Botânico, no 1.024, Jardim Botânico, CEP 22460-000 Rio de Janeiro, RJ.

http://lattes.cnpq.br/1022003330179698

Cesar da Silva Chagas, Embrapa Solos, Rua Jardim Botânico, no 1.024, Jardim Botânico, CEP 22460-000 Rio de Janeiro, RJ.

http://lattes.cnpq.br/2023294299618632

Silvio Barge Bhering, Embrapa Solos, Rua Jardim Botânico, no 1.024, Jardim Botânico, CEP 22460-000 Rio de Janeiro, RJ.

http://lattes.cnpq.br/7591583531224450

Nilson Rendeiro Pereira, Embrapa Solos, Rua Jardim Botânico, no 1.024, Jardim Botânico, CEP 22460-000 Rio de Janeiro, RJ.

http://lattes.cnpq.br/7827914057878445

Helena Saraiva Koenow Pinheiro, Universidade Federal Rural do Rio de Janeiro, Departamento de Solos, BR-465, Km 47, CEP 23890-000 Seropédica, RJ.

http://lattes.cnpq.br/6947091664236298

Published

2016-10-17

How to Cite

Carvalho Junior, W., Calderano Filho, B., Chagas, C. da S., Bhering, S. B., Pereira, N. R., & Pinheiro, H. S. K. (2016). Multiple linear regression and Random Forest model to estimate soil bulk density in mountainous regions. Pesquisa Agropecuária Brasileira, 51(9), 1428–1437. https://doi.org/10.1590/S1678-3921.pab2016.v51.22426