Soil class prediction by data mining in an area of the sedimentary São Francisco basin

Authors

  • Laura Milani da Silva Dias, Universidade Estadual de Campinas, Rua Pandiá Calógeras, no 51, Cidade Universitária, CEP 13083-870 Campinas, SP.
  • Ricardo Marques Coelho, Instituto Agronômico, Avenida Barão de Itapura, no 1.481, Jardim Guanabara, CEP 13012-970 Campinas, SP.
  • Gustavo Souza Valladares, Universidade Federal do Piauí, Campus Universitário Ministro Petrônio Portella, Ininga, CEP 64049-550 Teresina, PI.
  • Ana Carolina Cunha de Assis, Instituto Agronômico, Avenida Barão de Itapura, no 1.481, Jardim Guanabara, CEP 13012-970 Campinas, SP.
  • Edilene Pereira Ferreira, Instituto Agronômico, Avenida Barão de Itapura, no 1.481, Jardim Guanabara, CEP 13012-970 Campinas, SP.
  • Rafael Cipriano da Silva, Universidade de São Paulo, Escola Superior de Agricultura Luiz de Queiroz, Avenida Pádua Dias, no 11, Vila Independência, CEP 13418-260 Piracicaba, SP.

DOI:

https://doi.org/10.1590/S1678-3921.pab2016.v51.22491

Keywords:

soil map accuracy, classification algorithms, digital soil map, predictive variables of the terrain

Abstract

The objective of this work was to evaluate different strategies for predicting soil class distribution on digital soil maps of areas without reference data, in the São Francisco sedimentary basin, in the north of the state of Minas Gerais, Brazil. The strategies included taxonomic generalization, training on field observations, training set expansion, and the use of different data mining algorithms. Four data matrices were built, differing in the volume of data available for machine learning and in the soil taxonomic level to be predicted. The performance of three machine learning algorithms (Random Forest, J48, and MLP, the multilayer perceptron) was evaluated in combination with discretization, class balancing, variable selection, and expansion of the training set. Class balancing, variable discretization by equal frequencies, and the Random Forest algorithm showed the best performances. Extending the representativeness of field observations, which assumes a larger training area around each observation, brought no predictive gain. Soil taxonomic generalization to the suborder level reduced the fragmentation of mapped polygons and improved the accuracy of the digital soil maps. When generated by training on in situ soil observations within the mapping area, digital soil maps were as accurate as those trained on preexisting maps.
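
As an illustration of the "discretization by equal frequencies" mentioned in the abstract, the following minimal Python sketch bins a continuous terrain variable so that each bin receives roughly the same number of observations. It is not the authors' implementation: the function names, the slope values, and the choice of four bins are assumptions made only for this example.

```python
from bisect import bisect_right


def equal_frequency_cuts(values, n_bins):
    """Return n_bins - 1 cut points that split the data into bins of near-equal size."""
    ordered = sorted(values)
    n = len(ordered)
    # The k-th cut is the value found at the k/n_bins position of the sorted data.
    return [ordered[(k * n) // n_bins] for k in range(1, n_bins)]


def discretize(values, cuts):
    """Map each value to the index of the bin it falls into (0 .. len(cuts))."""
    return [bisect_right(cuts, v) for v in values]


# Hypothetical slope values (percent) for eight map cells; not data from the study.
slope = [2.0, 15.0, 7.5, 30.0, 1.0, 12.0, 4.0, 22.0]
cuts = equal_frequency_cuts(slope, n_bins=4)
bins = discretize(slope, cuts)
```

With distinct values, as here, each of the four bins ends up holding two of the eight cells. Heavily tied data can leave bins unequal, which a production discretizer would need to handle.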

Author Biographies

Laura Milani da Silva Dias, Universidade Estadual de Campinas, Rua Pandiá Calógeras, no 51, Cidade Universitária, CEP 13083-870 Campinas, SP.

http://lattes.cnpq.br/7411365793554118

Ricardo Marques Coelho, Instituto Agronômico, Avenida Barão de Itapura, no 1.481, Jardim Guanabara, CEP 13012-970 Campinas, SP.

http://lattes.cnpq.br/1190769214913092

Gustavo Souza Valladares, Universidade Federal do Piauí, Campus Universitário Ministro Petrônio Portella, Ininga, CEP 64049-550 Teresina, PI.

http://lattes.cnpq.br/7710601501267719

Ana Carolina Cunha de Assis, Instituto Agronômico, Avenida Barão de Itapura, no 1.481, Jardim Guanabara, CEP 13012-970 Campinas, SP.

http://lattes.cnpq.br/7000004568898795

Edilene Pereira Ferreira, Instituto Agronômico, Avenida Barão de Itapura, no 1.481, Jardim Guanabara, CEP 13012-970 Campinas, SP.

http://lattes.cnpq.br/8581279312251165

Published

2016-10-17

How to Cite

Dias, L. M. da S., Coelho, R. M., Valladares, G. S., Assis, A. C. C. de, Ferreira, E. P., & Silva, R. C. da. (2016). Soil class prediction by data mining in an area of the sedimentary São Francisco basin. Pesquisa Agropecuária Brasileira, 51(9), 1396–1404. https://doi.org/10.1590/S1678-3921.pab2016.v51.22491