Azerbaijan Text Clustering using Machine Learning Methods

DC Field | Value | Language
dc.contributor.author | Bashirov, Sokrat |
dc.date.accessioned | 2025-11-04T06:52:07Z |
dc.date.available | 2025-11-04T06:52:07Z |
dc.date.issued | 2024 |
dc.description.abstract | In this digital era, the explosion of textual data calls for sophisticated text mining and clustering methods. Although the state of the art has advanced for well-resourced languages, relatively little research has been carried out on lower-resource languages such as Azerbaijani. In this thesis, I investigated the use of clustering algorithms to improve access to information and communication for the Azerbaijani-speaking community. A corpus of 15,500 news articles compiled from oxu.az was used. K-means, Fuzzy K-means, Agglomerative Hierarchical Clustering, Spectral Clustering, the Gaussian Mixture Model (GMM), and Latent Dirichlet Allocation (LDA) were applied and evaluated using the Silhouette Score (SS) and the Davies-Bouldin Index (DBI). Word2Vec embeddings yield a higher Adjusted Rand Index (ARI) than TF-IDF, while Spectral Clustering and LDA achieve superior scores owing to their ability to map complex structures in the data. Future work will focus on improved preprocessing, hybrid clustering, and deep learning embeddings. Applications to real-world problems, such as recommendation systems and content categorization, will build further experience with the models. | en_US
dc.identifier.uri | http://hdl.handle.net/20.500.12181/1525 |
dc.language.iso | en | en_US
dc.publisher | ADA University | en_US
dc.rights | Attribution-NonCommercial-NoDerivs 3.0 United States | *
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/3.0/us/ | *
dc.subject | Text mining -- Azerbaijan. | en_US
dc.subject | Natural language processing (Computer science) -- Azerbaijan. | en_US
dc.subject | Machine learning -- Azerbaijan. | en_US
dc.subject | Data mining -- Azerbaijan. | en_US
dc.title | Azerbaijan Text Clustering using Machine Learning Methods | en_US
dc.type | Thesis | en_US
dcterms.accessRights | Absolute Embargo: Only Bibliographic Record and Abstract |
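The abstract describes clustering news text and scoring the result with the Silhouette Score and the Davies-Bouldin Index. That evaluation step can be illustrated with a minimal, dependency-free Python sketch.

The sketch makes several assumptions:
- documents are already tokenised;
- text is represented with TF-IDF using smoothed IDF;
- K-means uses deterministic farthest-first seeding, chosen only so the example is reproducible.

The function names, the seeding choice, and the toy Azerbaijani tokens are hypothetical illustrations. They are not the thesis's actual preprocessing, parameters, or oxu.az data. For interpretation, a Silhouette value near 1 and a Davies-Bouldin value near 0 both indicate compact, well-separated clusters.

```python
import math
from collections import Counter


def tfidf(docs):
    """Dense TF-IDF rows for pre-tokenised docs.

    Smoothed IDF ln((1 + n) / (1 + df)) + 1, then L2 row normalisation
    (the same formula as scikit-learn's TfidfVectorizer defaults).
    """
    vocab = sorted({t for d in docs for t in d})
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for d in docs:
        tf = Counter(d)
        v = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        rows.append([x / norm for x in v])
    return rows


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(X, k, max_iter=100):
    """Lloyd's k-means; farthest-first seeding (an assumption) keeps it deterministic."""
    centroids = [X[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing seed.
        centroids.append(max(X, key=lambda x: min(dist(x, c) for c in centroids)))
    labels = None
    for _ in range(max_iter):
        new = [min(range(k), key=lambda c: dist(x, centroids[c])) for x in X]
        if new == labels:  # no assignment changed: converged
            break
        labels = new
        for c in range(k):
            members = [x for x, lab in zip(X, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[c] = mean(members)
    return labels


def silhouette(X, labels):
    """Mean of s = (b - a) / max(a, b) over samples; needs at least two clusters."""
    scores = []
    for i, x in enumerate(X):
        same = [dist(x, y) for j, y in enumerate(X) if j != i and labels[j] == labels[i]]
        if not same:  # singleton cluster: score 0 by convention
            scores.append(0.0)
            continue
        a = sum(same) / len(same)  # mean intra-cluster distance
        b = min(  # mean distance to the nearest other cluster
            sum(dist(x, y) for y, lab in zip(X, labels) if lab == c) / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def davies_bouldin(X, labels):
    """Mean over clusters of max_j (S_i + S_j) / d(c_i, c_j); lower is better.

    Assumes distinct centroids (otherwise the ratio divides by zero).
    """
    clusters = sorted(set(labels))
    cent, scatter = {}, {}
    for c in clusters:
        members = [x for x, lab in zip(X, labels) if lab == c]
        cent[c] = mean(members)
        scatter[c] = sum(dist(x, cent[c]) for x in members) / len(members)
    worst = [max((scatter[i] + scatter[j]) / dist(cent[i], cent[j])
                 for j in clusters if j != i) for i in clusters]
    return sum(worst) / len(worst)


if __name__ == "__main__":
    # Hypothetical toy corpus (sport vs. economy tokens), NOT data from oxu.az.
    docs = [
        ["futbol", "komanda", "qol", "oyun"],
        ["futbol", "qol", "oyun", "komanda", "qol"],
        ["komanda", "oyun", "futbol"],
        ["manat", "bank", "neft", "iqtisadiyyat"],
        ["bank", "manat", "iqtisadiyyat"],
        ["neft", "manat", "bank", "neft"],
    ]
    X = tfidf(docs)
    labels = kmeans(X, k=2)
    print("labels:", labels)  # labels: [0, 0, 0, 1, 1, 1]
    print("silhouette:", round(silhouette(X, labels), 3))
    print("Davies-Bouldin:", round(davies_bouldin(X, labels), 3))
```

A pipeline over a corpus the size of the one in the abstract would more likely use scikit-learn's `TfidfVectorizer`, `KMeans`, `silhouette_score`, and `davies_bouldin_score`. The hand-written versions above only make the two evaluation formulas explicit.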
