Azerbaijan Text Clustering using Machine Learning Methods

dc.contributor.author	Bashirov, Sokrat
dc.date.accessioned	2025-11-04T06:52:07Z
dc.date.available	2025-11-04T06:52:07Z
dc.date.issued	2024
dc.description.abstract	In this digital era, the rapid growth of textual data calls for sophisticated text mining and clustering methods. Although the state of the art has improved for most well-resourced languages, relatively little research has been carried out on lower-resourced languages such as Azerbaijani. This thesis investigates the use of clustering algorithms to improve access to information and communication in the Azerbaijani-speaking community. A corpus of 15,500 news articles compiled from oxu.az was used. K-means, Fuzzy K-means, Agglomerative Hierarchical Clustering, Spectral Clustering, the Gaussian Mixture Model (GMM), and Latent Dirichlet Allocation (LDA) were applied and evaluated using the Silhouette Score (SS) and the Davies-Bouldin Index. Word2Vec embeddings scored better than TF-IDF on these measures, while Spectral Clustering and LDA achieved the best scores, owing to their ability to capture complex structure in the data. Future work will improve pre-processing and explore hybrid clustering and deep learning embeddings. Applications to real-world problems, such as recommendation systems and content categorization, will build practical experience with the models.	en_US
dc.identifier.uri	http://hdl.handle.net/20.500.12181/1525
dc.language.iso	en	en_US
dc.publisher	ADA University	en_US
dc.rights	Attribution-NonCommercial-NoDerivs 3.0 United States	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/us/	*
dc.subject	Text mining -- Azerbaijan.	en_US
dc.subject	Natural language processing (Computer science) -- Azerbaijan.	en_US
dc.subject	Machine learning -- Azerbaijan.	en_US
dc.subject	Data mining -- Azerbaijan.	en_US
dc.title	Azerbaijan Text Clustering using Machine Learning Methods	en_US
dc.type	Thesis	en_US
dcterms.accessRights	Absolute Embargo Only Bibliographic Record and Abstract

Collections

School of Information Technologies and Engineering