Application of natural language processing (NLP) to predict the impact of Colombian news on the COLCAP index (Colombian Stock Exchange)

Rico Gaviria, David

dc.contributor.advisor	Romero Gelvez, Jorge Ivan
dc.creator	Rico Gaviria, David
dc.date.accessioned	2025-06-24T14:00:00Z
dc.date.available	2025-06-24T14:00:00Z
dc.date.created	2025-06-10
dc.description.abstract	This study analyzes how the language used in economic news can anticipate daily variations in the COLCAP index, Colombia's main stock market benchmark. Building on previous research in financial linguistics and sentiment analysis, it proposes a deep learning approach that integrates semantic representations of news with traditional financial indicators. It uses a curated corpus of Colombian economic news, in both full-text form and as automatically generated summaries, from which features such as polarity, subjectivity, topic distribution, and term frequencies are extracted. More than 700 multilayer perceptron (MLP) models were trained with grid search over different configurations of depth, regularization, PCA dimensionality reduction, and semantic filtering. The results show that summary-based models outperform full-text models on every metric (precision, F1, recall, and accuracy), reaching a maximum precision of 61.4%. The results also show that including too many lexical terms reduces performance, whereas a selective use of semantic variables improves generalization. This research demonstrates the feasibility of applying Natural Language Processing (NLP) techniques to financial analysis in emerging markets and highlights the predictive value of news summaries.	spa
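The abstract lists polarity, subjectivity and term-frequency features extracted from the news and related to daily COLCAP movements. The record does not say which sentiment resource, vocabulary size or aggregation the thesis used, so the following is only a minimal standard-library sketch under stated assumptions: a tiny hypothetical Spanish lexicon (`POSITIVE`, `NEGATIVE`) and stop list, per-day averaging of article scores, a capped vocabulary (`top_terms(news, k)`) standing in for the lexical-term selection the abstract discusses, and a binary label that is 1 when the next trading day closes higher. Day keys are assumed to be ISO date strings so that sorting is chronological. Topic distributions (for example from LDA) are omitted.

```python
from collections import Counter, defaultdict

# Hypothetical toy resources; the thesis does not state which lexicon or stop list it used.
POSITIVE = {"sube", "crece", "ganancia", "recuperación", "récord"}
NEGATIVE = {"cae", "baja", "pérdida", "crisis", "déficit"}
STOPWORDS = {"el", "la", "de", "por", "en", "y", "los", "las"}
PUNCT = ".,;:!?¡¿\"'()"


def tokenize(text):
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    tokens = (t.strip(PUNCT).lower() for t in text.split())
    return [t for t in tokens if t]


def sentiment(tokens):
    """Lexicon polarity in [-1, 1] and subjectivity (share of opinion words) in [0, 1]."""
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    hits = pos + neg
    polarity = (pos - neg) / hits if hits else 0.0
    subjectivity = hits / len(tokens) if tokens else 0.0
    return polarity, subjectivity


def top_terms(news, k):
    """The k most frequent non-stopword terms; k caps how many lexical features enter."""
    counts = Counter(t for _, text in news for t in tokenize(text) if t not in STOPWORDS)
    return [term for term, _ in counts.most_common(k)]


def daily_features(news, vocabulary):
    """news: iterable of (day, text). Returns {day: [polarity, subjectivity, tf_1..tf_k]}."""
    by_day = defaultdict(list)
    for day, text in news:
        by_day[day].append(tokenize(text))
    features = {}
    for day, docs in sorted(by_day.items()):
        scores = [sentiment(doc) for doc in docs]
        counts = Counter(t for doc in docs for t in doc)
        total = sum(counts.values()) or 1
        features[day] = [
            sum(p for p, _ in scores) / len(scores),  # mean polarity of the day's news
            sum(s for _, s in scores) / len(scores),  # mean subjectivity of the day's news
            *(counts[term] / total for term in vocabulary),  # relative term frequencies
        ]
    return features


def next_day_labels(closes):
    """closes: {day: COLCAP close}. Label 1 if the next trading day closes higher, else 0."""
    days = sorted(closes)
    return {d: int(closes[nxt] > closes[d]) for d, nxt in zip(days, days[1:])}
```

Calling `daily_features(news, top_terms(news, k))` for several values of `k` is one way to probe the abstract's observation that adding too many lexical terms hurts performance; in a real setup the vocabulary would be chosen on training days only, to avoid leaking information from the evaluation period.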
dc.description.abstractenglish	This research examines the potential of economic news language to forecast daily movements of Colombia’s COLCAP stock index. Drawing on foundations in financial sentiment analysis and NLP, the study introduces a deep learning pipeline that combines semantic features from news texts with conventional market metrics. A refined dataset of Colombian financial news—both full articles and automatically generated summaries—is used to extract sentiment polarity, subjectivity, topic themes, and term frequency patterns. Over 700 Multilayer Perceptron (MLP) models were trained using grid search across combinations of network depth, dropout rates, PCA dimensionality reduction, and semantic filtering thresholds. The evaluation reveals that summary-based models consistently outperform full-text versions in all major metrics (accuracy, precision, recall, F1), achieving up to 61.4% accuracy. Notably, the inclusion of too many lexical features tends to degrade performance, while more abstract and targeted semantic features improve model generalization. This work highlights the applicability of NLP techniques in the financial analysis of emerging markets and underscores the value of news summaries as efficient predictors of stock index trends.	spa
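The abstract describes a grid search over MLP depth, regularization (dropout), PCA dimensionality reduction and semantic filtering, evaluated with accuracy, precision, recall and F1. The following is a hedged scikit-learn sketch of that kind of search, not the thesis code: the grid values in `PARAM_GRID` are hypothetical, the chronological `TimeSeriesSplit` folds are an assumption (the record does not describe the validation scheme), the semantic-filtering threshold is left out, and because scikit-learn's `MLPClassifier` has no dropout, its L2 penalty `alpha` stands in for the regularization axis. Reproducing dropout itself would require a framework such as PyTorch or Keras.

```python
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative grid only; the thesis's actual ranges (700+ configurations) are not given in the record.
PARAM_GRID = {
    "pca__n_components": [2, 5],                   # PCA dimensionality reduction
    "mlp__hidden_layer_sizes": [(16,), (32, 16)],  # network depth and width
    "mlp__alpha": [1e-4, 1e-2],                    # L2 penalty (MLPClassifier has no dropout)
}
METRICS = ["accuracy", "precision", "recall", "f1"]


def grid_search_mlp(X, y, n_splits=3, seed=0):
    """Grid-search a scaler -> PCA -> MLP pipeline with chronological CV folds.

    X: (n_days, n_features) array ordered by date; y: 0/1 next-day direction labels.
    """
    pipeline = Pipeline([
        ("scale", StandardScaler()),
        ("pca", PCA()),
        ("mlp", MLPClassifier(max_iter=500, random_state=seed)),
    ])
    search = GridSearchCV(
        pipeline,
        PARAM_GRID,
        scoring=METRICS,                        # record all four metrics per configuration
        refit="accuracy",                       # select the final configuration by accuracy
        cv=TimeSeriesSplit(n_splits=n_splits),  # train on earlier days, validate on later ones
    )
    return search.fit(X, y)
```

Refitting on accuracy while keeping all four metrics in `search.cv_results_` (keys such as `mean_test_f1`) lets configurations be compared metric by metric, in the way the abstract compares summary-based and full-text models. Chronological folds keep later days out of each training window, which matters for daily market data.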
dc.format.extent	93 pages	spa
dc.format.mimetype	application/pdf	spa
dc.identifier.uri	https://hdl.handle.net/20.500.12010/36936
dc.language.iso	spa	spa
dc.relation.references	Acemoglu, D., & Robinson, J. A. (2012). Why nations fail: The origins of power, prosperity, and poverty. Crown Business.
dc.relation.references	Aggarwal, C. C., & Zhai, C. (2012). Mining text data. Springer Science & Business Media.
dc.relation.references	Antweiler, W., & Frank, M. Z. (2004). Is all that talk just noise? The information content of internet stock message boards. The Journal of Finance, 59(3), 1259-1294.
dc.relation.references	Awajan, A., Alsaade, F., & Jararweh, Y. (2020). Stock market prediction using news sentiment analysis and deep learning. Information Processing & Management, 57(6), 102348.
dc.relation.references	Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.
dc.relation.references	Blanchard, O. (2017). Macroeconomics (7th ed.). Pearson Education.
dc.relation.references	Bollen, J., Mao, H., & Zeng, X. (2011). Twitter mood predicts the stock market. Journal of Computational Science, 2(1), 1-8.
dc.relation.references	Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., ... & Amodei, D. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877-1901.
dc.relation.references	Cambria, E. (2016). Affective computing and sentiment analysis. IEEE Intelligent Systems, 31(2), 102-107.
dc.relation.references	Chen, H., De, P., Hu, Y. J., & Hwang, B. H. (2014). Wisdom of crowds: The value of stock opinions transmitted through social media. Decision Support Systems, 57, 103-111.
dc.relation.references	Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014). Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078.
dc.relation.references	Das, S. R., & Chen, M. Y. (2007). Yahoo! for Amazon: Sentiment extraction from small talk on the web. Management Science, 53(9), 1375-1388.
dc.relation.references	Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
dc.relation.references	Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14(2), 179-211.
dc.relation.references	Fama, E. F. (1970). Efficient capital markets: A review of theory and empirical work. The Journal of Finance, 25(2), 383-417.
dc.relation.references	Feldman, R. (2013). Techniques and applications for sentiment analysis. Communications of the ACM, 56(4), 76-84.
dc.relation.references	Gatt, A., & Krahmer, E. (2018). Survey of the state of the art in natural language generation: Core tasks, applications and evaluation. Journal of Artificial Intelligence Research, 61, 65-170.
dc.relation.references	Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT Press.
dc.relation.references	Hagenau, M., Wohlfahrt, R., & Knappe, R. (2013). Automated news reading: Stock price prediction based on financial news using context capturing recurrent neural networks. Decision Support Systems, 55(3), 685-693.
dc.relation.references	Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735-1780.
dc.relation.references	Hutto, C. J., & Gilbert, E. E. (2014). VADER: A parsimonious rule-based model for sentiment analysis of social media text. Proceedings of the AAAI Conference on Web and Social Media, 8(1), 216-225.
dc.relation.references	Jurafsky, D., & Martin, J. H. (2023). Speech and language processing (3rd ed. draft).
dc.relation.references	Liu, B. (2012). Sentiment analysis and opinion mining. Synthesis Lectures on Human Language Technologies, 5(1), 1-167.
dc.relation.references	Loughran, T., & McDonald, B. (2011). When is a liability not a liability? Textual analysis, dictionaries, and 10-Ks. The Journal of Finance, 66(1), 35-65.
dc.relation.references	Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
dc.relation.references	Nassirtoussi, A. K., Aghabozorgi, S., Wah, T. Y., & Ngo, D. C. L. (2014). Text mining for market prediction: A systematic review. Expert Systems with Applications, 41(15), 7653-7670.
dc.relation.references	Pang, B., & Lee, L. (2008). Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1-2), 1-135.
dc.relation.references	Pennington, J., Socher, R., & Manning, C. D. (2014). GloVe: Global vectors for word representation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 1532-1543.
dc.relation.references	Schumaker, R. P., & Chen, H. (2009). Textual analysis of stock market prediction using breaking financial news: The AZFinText system. Information Systems Frontiers, 11(1), 115-126.
dc.relation.references	Sebastiani, F. (2002). Machine learning in automated text categorization. ACM Computing Surveys (CSUR), 34(1), 1-47.
dc.relation.references	Tetlock, P. C. (2007). Giving content to investor sentiment: The role of media in the stock market. The Journal of Finance, 62(3), 1139-1168.
dc.relation.references	Vargas, D., Coronado, C., Melo, J., & Gelvez, J. (2023). Análisis de la influencia de noticias en el mercado accionario colombiano mediante técnicas de procesamiento del lenguaje natural y aprendizaje automático. Research in Computing Science, 152(3), 12-25.
dc.relation.references	Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30.
dc.subject	Sentiment Analysis
dc.subject	COLCAP
dc.subject	Deep Learning
dc.subject	Automatic News Summarization
dc.subject	Semantic Features
dc.subject	Stock Market Prediction
dc.subject	Financial NLP	spa
dc.subject.keyword	Sentiment Analysis
dc.subject.keyword	COLCAP
dc.subject.keyword	Deep Learning
dc.subject.keyword	News Summarization
dc.subject.keyword	Semantic Features
dc.subject.keyword	Stock Market Prediction
dc.subject.keyword	Financial NLP	spa
dc.subject.lemb	Financial language – Semantic analysis
dc.subject.lemb	Natural language processing – Applications in finance
dc.subject.lemb	Stock exchanges – Predictive models
dc.title	Application of natural language processing (NLP) to predict the impact of Colombian news on the COLCAP index (Colombian Stock Exchange)	spa
dc.type.coar	http://purl.org/coar/resource_type/c_bdcc	spa

Files

Original bundle

Name: Aplicacion de PLN sobre noticias colombianas y prediccion de su impacto en la bolsa de valores colombiana (1).pdf
Size: 3.64 MB
Format: Adobe Portable Document Format
Description: Restricted document

License bundle

Name: license.txt
Size: 2.87 KB
Format: Item-specific license agreed upon to submission
Description:

Name: FOR-EFE-GDB-008_AUTORIZACION_DE_PUBLICACION_DE_TESIS_O_TRABAJO_DE_GRADO_DE_FORMA_CONFIDENCIAL.docx 1.pdf
Size: 407.33 KB
Format: Adobe Portable Document Format
Description: Authorization letter

Collections

Maestría en Ingeniería y Analítica de Datos (Master's in Data Engineering and Analytics)