
dc.coverage.spatialColombiaspa
dc.creatorAlbañil Sánchez, Misael Andrey
dc.creatorGalpin, Ixent
dc.date.accessioned2022-07-22T19:15:19Z
dc.date.available2022-07-22T19:15:19Z
dc.date.created2022
dc.identifier.urihttp://hdl.handle.net/20.500.12010/27756
dc.description.abstractThroughout the world, the provision of online goods and services has increased significantly over the last few years. We consider the case of Tango Discos, a small company in Colombia that sells entertainment products through an e-commerce website and receives customer messages through various channels, including a web form, email, Facebook and Twitter. The dataset comprises 29,970 messages collected from 2019 to 2021. Each message can be categorized as a sale, a request or a complaint. In this work we evaluate different supervised classification models to automate the task of classifying the messages, viz. decision trees, Naive Bayes, linear Support Vector Machines and logistic regression. As the dataset is unbalanced, the models are evaluated in combination with various data balancing approaches to obtain the best performance. In order to maximize revenue, the management is interested in prioritizing messages that may result in potential sales. As such, the best model for deployment is one that minimizes false positives in the sales category, so that these are processed in a timely fashion. The best performing model is found to be the linear Support Vector Machine using the Random Over Sampler balancing technique. This model is deployed in the cloud and exposed using a RESTful interface.spa
dc.format.extent15 pagesspa
dc.format.mimetypeapplication/pdfspa
dc.language.isoengspa
dc.publisherUniversidad de Bogotá Jorge Tadeo Lozanospa
dc.sourceinstname:Universidad de Bogotá Jorge Tadeo Lozanospa
dc.sourcereponame:Expeditio Repositorio Institucional UJTLspa
dc.subjectE-Commercespa
dc.titleClassifying incoming customer messages for an e-commerce site using supervised learningspa
dc.type.localMaster's thesisspa
dc.subject.lembElectronic commerce -- Academic theses and dissertationsspa
dc.subject.lembElectronic commerce -- Security measures -- Academic theses and dissertationsspa
dc.subject.lembData mining -- Academic theses and dissertationsspa
dc.rights.accessrightsinfo:eu-repo/semantics/openAccessspa
dc.type.hasversioninfo:eu-repo/semantics/acceptedVersionspa
dc.rights.localOpen (Full Text)spa
dc.identifier.repourlhttp://expeditio.utadeo.edu.cospa
dc.creator.degreeMagíster en Ingeniería y Analítica de Datosspa
dc.publisher.programMaestría en Ingeniería y Analítica de Datosspa
dc.relation.referencesAdaji, I., Kiron, N., Vassileva, J.: Evaluating the susceptibility of e-commerce shoppers to persuasive strategies. a game-based approach. In: International Conference on Persuasive Technology. pp. 58–72. Springer (2020)spa
dc.relation.referencesAlghoul, A., Al Ajrami, S., Al Jarousha, G., Harb, G., Abu-Naser, S.S.: Email classification using artificial neural network (2018)spa
dc.relation.referencesBlackSip, Vtex, Nielsen, PayU, Credibanco, MercadoLibre, Rappi, emBlue, Icommkt: BlackIndex: reporte del ecommerce en Colombia. BlackSip (2019)spa
dc.relation.referencesBusemann, S., Schmeier, S., Arens, R.G.: Message classification in the call center. arXiv preprint cs/0003060 (2000)spa
dc.relation.referencesConfecamaras: https://confecamaras.org.co (13 January 2022)spa
dc.relation.referencesDuan, L., Li, A., Huang, L.: A new spam short message classification. In: 2009 First International Workshop on Education Technology and Computer Science. vol. 2, pp. 168–171. IEEE (2009)spa
dc.relation.referencesFang, W., Luo, H., Xu, S., Love, P.E., Lu, Z., Ye, C.: Automated text classification of near-misses from safety reports: An improved deep learning approach. Advanced Engineering Informatics 44, 101060 (2020)spa
dc.relation.referencesManning, C., Raghavan, P., Schütze, H.: Introduction to information retrieval. Natural Language Engineering 16(1), 100–103 (2010)spa
dc.relation.referencesMansoor, R., Jayasinghe, N.D., Muslam, M.M.A.: A comprehensive review on email spam classification using machine learning algorithms. In: 2021 International Conference on Information Networking (ICOIN). pp. 327–332. IEEE (2021)spa
dc.relation.referencesMasterov, D.V., Mayer, U.F., Tadelis, S.: Canary in the e-commerce coal mine: Detecting and predicting poor experiences using buyer-to-seller messages. In: Proceedings of the Sixteenth ACM Conference on Economics and Computation. pp. 81–93 (2015)spa
dc.relation.referencesMenini, S., Moretti, G., Corazza, M., Cabrio, E., Tonelli, S., Villata, S.: A system to monitor cyberbullying based on message classification and social network analysis. In: Proceedings of the third workshop on abusive language online. pp. 105–110 (2019)spa
dc.relation.referencesMohammed, R., Rawashdeh, J., Abdullah, M.: Machine learning with oversampling and undersampling techniques: overview study and experimental results. In: 2020 11th international conference on information and communication systems (ICICS). pp. 243–248. IEEE (2020)spa
dc.relation.referencesNkansah, E.A.: Kayayo: An e-commerce site with recommendations and text messaging (2013)spa
dc.relation.referencesOzel, S.A., Saraç, E., Akdemir, S., Aksu, H.: Detection of cyberbullying on social media messages in Turkish. In: 2017 International Conference on Computer Science and Engineering (UBMK). pp. 366–370. IEEE (2017)spa
dc.relation.referencesWebster, J.J., Kit, C.: Tokenization as the initial phase in NLP. In: COLING 1992 volume 4: The 14th international conference on computational linguistics (1992)spa
dc.relation.referencesWirth, R., Hipp, J.: CRISP-DM: Towards a standard process model for data mining. In: Proceedings of the 4th international conference on the practical applications of knowledge discovery and data mining. vol. 1, pp. 29–39. Manchester (2000)spa
dc.relation.referencesZois, D.S., Kapodistria, A., Yao, M., Chelmis, C.: Optimal online cyberbullying detection. In: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). pp. 2017–2021. IEEE (2018)spa
dc.description.abstractenglishThroughout the world, the acquisition of goods and services online has increased significantly in recent years. We consider the case of Tango Discos, a small company in Colombia that sells entertainment products through an e-commerce website and receives customer messages through several channels, including a web form, email, Facebook and Twitter. The dataset comprises 29,970 messages collected between 2019 and 2021. Each message can be classified as a sale, a request or a complaint. In this work we evaluate different supervised classification models to automate the task of classifying the messages, namely decision trees, Naive Bayes, linear Support Vector Machines and logistic regression. As the dataset is unbalanced, the models are evaluated in combination with various data balancing techniques to obtain the best performance. As a business requirement, the management is interested in prioritizing messages that may result in potential sales. As such, the best model for deployment is one that minimizes false positives in the sales category, so that these are processed in a timely fashion. The best performing model is thus found to be the linear Support Vector Machine using the Random Over Sampler balancing technique. This model is deployed in the cloud and exposed through a RESTful API.spa
dc.type.driverinfo:eu-repo/semantics/masterThesisspa
dc.type.coarhttp://purl.org/coar/resource_type/c_2df8fbb1spa

