Application of natural language processing (NLP) to predict the impact of Colombian news on the COLCAP index (Colombian Stock Exchange)
| dc.contributor.advisor | Romero Gelvez, Jorge Ivan | |
| dc.creator | Rico Gaviria, David | |
| dc.date.accessioned | 2025-06-24T14:00:00Z | |
| dc.date.available | 2025-06-24T14:00:00Z | |
| dc.date.created | 2025-06-10 | |
| dc.description.abstract | This study analyzes how the language used in economic news can anticipate daily variations in the COLCAP index, Colombia's main stock market benchmark. Building on prior research in financial linguistics and sentiment analysis, it proposes a deep learning approach that integrates semantic representations of news with traditional financial indicators. It uses a curated corpus of Colombian economic news, in both full-text form and as automatically generated summaries, from which features such as polarity, subjectivity, topic distribution, and term frequencies are extracted. More than 700 multilayer perceptron (MLP) models were trained using grid search over different configurations of depth, regularization, PCA dimensionality reduction, and semantic filtering. The results show that summary-based models outperform full-text models on all metrics (precision, F1, recall, and accuracy), reaching a maximum precision of 61.4%. In addition, including too many lexical terms reduces performance, while selective use of semantic variables improves generalization. This research demonstrates the feasibility of applying Natural Language Processing (NLP) techniques to financial analysis in emerging markets and highlights the predictive value of news summaries. | spa |
| dc.description.abstractenglish | This research examines the potential of economic news language to forecast daily movements of Colombia’s COLCAP stock index. Drawing on foundations in financial sentiment analysis and NLP, the study introduces a deep learning pipeline that combines semantic features from news texts with conventional market metrics. A refined dataset of Colombian financial news—both full articles and automatically generated summaries—is used to extract sentiment polarity, subjectivity, topic themes, and term frequency patterns. Over 700 Multilayer Perceptron (MLP) models were trained using grid search across combinations of network depth, dropout rates, PCA dimensionality reduction, and semantic filtering thresholds. The evaluation reveals that summary-based models consistently outperform full-text versions in all major metrics (accuracy, precision, recall, F1), achieving up to 61.4% accuracy. Notably, the inclusion of too many lexical features tends to degrade performance, while more abstract and targeted semantic features improve model generalization. This work highlights the applicability of NLP techniques in the financial analysis of emerging markets and underscores the value of news summaries as efficient predictors of stock index trends. | spa |
| dc.format.extent | 93 pages | spa |
| dc.format.mimetype | application/pdf | spa |
| dc.identifier.uri | https://hdl.handle.net/20.500.12010/36936 | |
| dc.language.iso | spa | spa |
| dc.relation.references | Acemoglu, D., & Robinson, J. A. (2012). Why nations fail: The origins of power, prosperity, and poverty. Crown Business. | |
| dc.relation.references | Aggarwal, C. C., & Zhai, C. (2012). Mining text data. Springer Science & Business Media. | |
| dc.relation.references | Antweiler, W., & Frank, M. Z. (2004). Is all that talk just noise? The information content of internet stock message boards. The Journal of Finance, 59(3), 1259-1294. | |
| dc.relation.references | Awajan, A., Alsaade, F., & Jararweh, Y. (2020). Stock market prediction using news sentiment analysis and deep learning. Information Processing & Management, 57(6), 102348. | |
| dc.relation.references | Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473. | |
| dc.relation.references | Blanchard, O. (2017). Macroeconomics (7th ed.). Pearson Education. | |
| dc.relation.references | Bollen, J., Mao, H., & Zeng, X. (2011). Twitter mood predicts the stock market. Journal of Computational Science, 2(1), 1-8. | |
| dc.relation.references | Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., ... & Amodei, D. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877-1901. | |
| dc.relation.references | Cambria, E. (2016). Affective computing and sentiment analysis. IEEE Intelligent Systems, 31(2), 102-107. | |
| dc.relation.references | Chen, H., De, P., Hu, Y. J., & Hwang, B. H. (2014). Wisdom of crowds: The value of stock opinions transmitted through social media. Decision Support Systems, 57, 103-111. | |
| dc.relation.references | Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014). Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078. | |
| dc.relation.references | Das, S. R., & Chen, M. Y. (2007). Yahoo! for Amazon: Sentiment extraction from small talk on the web. Management Science, 53(9), 1375-1388. | |
| dc.relation.references | Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. | |
| dc.relation.references | Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14(2), 179-211. | |
| dc.relation.references | Fama, E. F. (1970). Efficient capital markets: A review of theory and empirical work. The Journal of Finance, 25(2), 383-417. | |
| dc.relation.references | Feldman, R. (2013). Techniques and applications for sentiment analysis. Communications of the ACM, 56(4), 76-84. | |
| dc.relation.references | Gatt, A., & Krahmer, E. (2018). Survey of the state of the art in natural language generation: Core tasks, applications and evaluation. Journal of Artificial Intelligence Research, 61, 65-170. | |
| dc.relation.references | Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT Press. | |
| dc.relation.references | Hagenau, M., Wohlfahrt, R., & Knappe, R. (2013). Automated news reading: Stock price prediction based on financial news using context capturing recurrent neural networks. Decision Support Systems, 55(3), 685-693. | |
| dc.relation.references | Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735-1780. | |
| dc.relation.references | Hutto, C. J., & Gilbert, E. E. (2014). VADER: A parsimonious rule-based model for sentiment analysis of social media text. Proceedings of the AAAI Conference on Web and Social Media, 8(1), 216-225. | |
| dc.relation.references | Jurafsky, D., & Martin, J. H. (2023). Speech and language processing (3rd ed. draft). | |
| dc.relation.references | Liu, B. (2012). Sentiment analysis and opinion mining. Synthesis Lectures on Human Language Technologies, 5(1), 1-167. | |
| dc.relation.references | Loughran, T., & McDonald, B. (2011). When is a liability not a liability? Textual analysis, dictionaries, and 10-Ks. The Journal of Finance, 66(1), 35-65. | |
| dc.relation.references | Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781. | |
| dc.relation.references | Nassirtoussi, A. K., Aghabozorgi, S., Wah, T. Y., & Ngo, D. C. L. (2014). Text mining for market prediction: A systematic review. Expert Systems with Applications, 41(15), 7653-7670. | |
| dc.relation.references | Pang, B., & Lee, L. (2008). Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1-2), 1-135. | |
| dc.relation.references | Pennington, J., Socher, R., & Manning, C. D. (2014). GloVe: Global vectors for word representation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 1532-1543. | |
| dc.relation.references | Schumaker, R. P., & Chen, H. (2009). Textual analysis of stock market prediction using breaking financial news: The AZFinText system. Information Systems Frontiers, 11(1), 115-126. | |
| dc.relation.references | Sebastiani, F. (2002). Machine learning in automated text categorization. ACM Computing Surveys (CSUR), 34(1), 1-47. | |
| dc.relation.references | Tetlock, P. C. (2007). Giving content to investor sentiment: The role of media in the stock market. The Journal of Finance, 62(3), 1139-1168. | |
| dc.relation.references | Vargas, D., Coronado, C., Melo, J., & Gelvez, J. (2023). Análisis de la influencia de noticias en el mercado accionario colombiano mediante técnicas de procesamiento del lenguaje natural y aprendizaje automático [Analysis of the influence of news on the Colombian stock market using natural language processing and machine learning techniques]. Research in Computing Science, 152(3), 12-25. | |
| dc.relation.references | Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30. | |
| dc.subject | Sentiment Analysis | |
| dc.subject | COLCAP | |
| dc.subject | Deep Learning | |
| dc.subject | Automatic News Summarization | |
| dc.subject | Semantic Features | |
| dc.subject | Stock Market Prediction | |
| dc.subject | Financial NLP | spa |
| dc.subject.keyword | Sentiment Analysis | |
| dc.subject.keyword | COLCAP | |
| dc.subject.keyword | Deep Learning | |
| dc.subject.keyword | News Summarization | |
| dc.subject.keyword | Semantic Features | |
| dc.subject.keyword | Stock Market Prediction | |
| dc.subject.keyword | Financial NLP | spa |
| dc.subject.lemb | Financial language – Semantic analysis | |
| dc.subject.lemb | Natural language processing – Applications in finance | |
| dc.subject.lemb | Stock exchanges – Predictive models | |
| dc.title | Application of natural language processing (NLP) to predict the impact of Colombian news on the COLCAP index (Colombian Stock Exchange) | spa |
| dc.type.coar | http://purl.org/coar/resource_type/c_bdcc | spa |
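The abstracts evaluate the models with four classification metrics (accuracy, precision, recall, F1) on daily COLCAP movements. The sketch below shows how these metrics are computed from predictions. It is a minimal standard-library illustration, not the thesis's evaluation code. It assumes a binary direction label (1 = the index closes higher, 0 = otherwise), and the helper name `direction_metrics` is hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def direction_metrics(y_true: list[int], y_pred: list[int]) -> Metrics:
    """Binary metrics for daily direction labels.

    Hypothetical helper; the encoding 1 = "up", 0 = "not up" is an assumption.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    # Confusion-matrix counts, treating "up" (1) as the positive class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Accuracy: share of days whose direction was called correctly, up or down.
    accuracy = (tp + tn) / len(y_true)
    # Precision: of the days predicted "up", how many actually went up.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: of the days that actually went up, how many were predicted "up".
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1: harmonic mean of precision and recall (0.0 when both are zero).
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return Metrics(accuracy, precision, recall, f1)
```

For example, with `y_true = [1, 0, 1, 1, 0, 1, 0, 0]` and `y_pred = [1, 1, 1, 0, 0, 1, 0, 1]`, there are 3 true positives, 2 false positives, 1 false negative, and 2 true negatives. This gives accuracy 0.625, precision 0.6, recall 0.75, and F1 of about 0.667. Returning 0.0 when a denominator is zero is one common convention. scikit-learn exposes the same choice through the `zero_division` parameter of its precision and recall functions.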
Files
Original bundle
- Name: Aplicacion de PLN sobre noticias colombianas y prediccion de su impacto en la bolsa de valores colombiana (1).pdf
- Size: 3.64 MB
- Format: Adobe Portable Document Format
- Description: Restricted document
License bundle
- Name: license.txt
- Size: 2.87 KB
- Format: Item-specific license agreed upon to submission

- Name: FOR-EFE-GDB-008_AUTORIZACION_DE_PUBLICACION_DE_TESIS_O_TRABAJO_DE_GRADO_DE_FORMA_CONFIDENCIAL.docx 1.pdf
- Size: 407.33 KB
- Format: Adobe Portable Document Format
- Description: Authorization letter
