Prediction of agricultural production in Colombia using regression models : a comparative analysis.

dc.contributor.advisor: Galpin, Ixent
dc.creator: Mancilla Medina, Karol Nicole
dc.date.accessioned: 2026-01-14T14:58:19Z
dc.date.created: 2025-11-27
dc.description.abstract: This study presents a data-driven methodology for predicting agricultural yield in Colombia, using semiannual datasets segmented by crop and municipality between 2006 and 2023. Following the CRISP-DM methodology, the project includes the phases of domain understanding, data preparation, modeling, evaluation and deployment. Two modeling strategies were implemented: a manual path, which allowed detailed control over feature engineering and data preprocessing, and an automated path using the PyCaret framework, which facilitated algorithm comparison and hyperparameter optimization. Several regression techniques were evaluated, including XGBoost, LightGBM, Random Forest, Gradient Boosting and Support Vector Regression. Model evaluation was based on the MAE, RMSE and R² metrics. The results show that automated modeling with PyCaret achieved higher predictive performance than the manual approaches. This work offers a reproducible framework for deploying predictive tools in agricultural planning and highlights opportunities for future research through the integration of external data and deep learning methods.
dc.description.abstractenglish: This study introduces a data-driven methodology for forecasting agricultural yields in Colombia, based on semiannual datasets segmented by crop and municipality between 2006 and 2023. Adopting the CRISP-DM process model, the project encompasses domain exploration, data preparation, modeling, evaluation and deployment. Two modeling paths were pursued: a manual pipeline that enabled precise handling of feature engineering and preprocessing, and an automated pipeline using the PyCaret framework to facilitate algorithm benchmarking and hyperparameter optimization. Several regression techniques were applied, including XGBoost, LightGBM, Random Forest, Gradient Boosting and Support Vector Regression. Evaluation relied on conventional metrics such as MAE, RMSE and R². The outcomes reveal that automated regression modeling with PyCaret achieves higher predictive performance than manual modeling approaches. This work establishes a reproducible framework for deploying predictive tools in agricultural planning and underscores opportunities for future research through the integration of external data sources and deep learning methods.
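The abstract names MAE, RMSE and R² as the evaluation metrics for semiannual yield series. The sketch below is a minimal, dependency-free illustration of how those three metrics can be computed on out-of-sample predictions, assuming the records for one crop and municipality are sorted chronologically and validated with an expanding-window split (the default layout of scikit-learn's `TimeSeriesSplit`). The abstract does not state the exact validation scheme, so the split is an assumption. The yield values and the persistence baseline are hypothetical stand-ins, not the study's data or its XGBoost, LightGBM or PyCaret models, and the helper names `mae`, `rmse`, `r2` and `expanding_window_splits` are illustrative.

```python
import math
from typing import Iterator, List, Sequence, Tuple


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error: the average of |y - y_hat|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error: the square root of the mean squared residual."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def expanding_window_splits(n_samples: int, n_splits: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists in which every test block follows its training data.

    Hypothetical helper mirroring the default layout of scikit-learn's TimeSeriesSplit
    (no gap, equal-sized test blocks, training window grows with each fold).
    """
    test_size = n_samples // (n_splits + 1)
    if test_size == 0:
        raise ValueError("need at least n_splits + 1 samples")
    first_test = n_samples - n_splits * test_size
    for start in range(first_test, n_samples, test_size):
        yield list(range(start)), list(range(start, start + test_size))


# Hypothetical semiannual yields (t/ha) for one crop in one municipality, oldest first.
yields = [2.0, 2.4, 2.2, 2.6, 2.5, 2.9, 2.7, 3.1]

all_true: List[float] = []
all_pred: List[float] = []
for train_idx, test_idx in expanding_window_splits(len(yields), n_splits=3):
    # Persistence baseline (illustrative stand-in for a trained regressor):
    # forecast every test semester with the last yield seen in the training window.
    last_seen = yields[train_idx[-1]]
    all_true.extend(yields[i] for i in test_idx)
    all_pred.extend(last_seen for _ in test_idx)

# Score the pooled out-of-sample predictions from all folds.
print(f"MAE={mae(all_true, all_pred):.3f}  RMSE={rmse(all_true, all_pred):.3f}  R2={r2(all_true, all_pred):.3f}")
```

In practice these values would come from `sklearn.metrics` (`mean_absolute_error`, `mean_squared_error`, `r2_score`); they are written out here only to make the formulas explicit. Pooling predictions across folds before scoring, rather than averaging per-fold R², avoids unstable R² values on very short test blocks such as the two-semester blocks above.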
dc.format.extent: 11 pages
dc.format.mimetype: application/pdf
dc.identifier.uri: https://hdl.handle.net/20.500.12010/38766
dc.language.iso: en_US
dc.subject: Agricultural yield prediction
dc.subject: Evaluaciones Agropecuarias Municipales (EVA)
dc.subject: CRISP-DM
dc.subject: Time series
dc.subject: Machine learning
dc.subject: Regression
dc.subject: Agricultural sustainability
dc.subject.keyword: Agricultural yield prediction
dc.subject.keyword: Evaluaciones Agropecuarias Municipales (EVA)
dc.subject.keyword: Time series
dc.subject.keyword: Machine learning
dc.subject.keyword: Agricultural sustainability
dc.subject.keyword: Regression
dc.subject.lemb: Agriculture - Data analysis
dc.subject.lemb: Machine learning - Applications in agriculture
dc.subject.lemb: Statistical prediction
dc.title: Prediction of agricultural production in Colombia using regression models : a comparative analysis.
dc.type.coar: http://purl.org/coar/resource_type/c_bdcc

Files

Original bundle

Name: PREDICTION_OF_AGRICULTURAL_PRODUCTION_IN_COLOMBIA_USING_REGRESSION_MODELS.pdf
Size: 900.35 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 3.28 KB
Format: Item-specific license agreed upon to submission

Name: FOR-EFE-GDB-008_AUTORIZACION_DE_PUBLICACION_DE_TESIS_O_TRABAJO_DE_GRADO_DE_FORMA_CONFIDENCIAL_Mancilla_IG.pdf
Size: 102.69 KB
Format: Adobe Portable Document Format
Description: Authorization letter