Azerbaijan Text Clustering using Machine Learning Methods

Loading...
Thumbnail Image

Date

Journal Title

Journal ISSN

Volume Title

Publisher

ADA University

Abstract

In this digital era, the explosion of textual data is causing us to develop sophisticated text mining and clustering methods. Although the state of art has improved for most well-resourced languages, relatively little research had been carried out on a language with smaller resource like Azerbaijani. In this thesis I investigated using clustering algorithms to enhance the information and communication access in Azerbaijani speaking community. 15,500 news articles were used compiled as a part of oxu.az. So, K-means, Fuzzy-Kmeans, Agglomerative Hierarchical Clustering, Spectral Clustering along with Gaussian Mixture Model (GMM) and Latent Dirichlet Allocation were deployed. They were evaluated on the basis of Silhouette Score (SS) and Davies-Bouldin Index. Word2Vec embeddings yield higher ARI than TF-IDF, while Spectral Clustering and LDA report superior scores owing to their capability of mapping complex workout nodes. The future works will improve the Pre-processing, hybrid Clustering and Deep Learning Embeddings. Applications to real-world problems ranging from recommendation systems and content categorization, all of which will build experience with the models.

Description

Keywords

Text mining -- Azerbaijan., Natural language processing (Computer science) -- Azerbaijan., Machine learning -- Azerbaijan., Data mining -- Azerbaijan.

Citation

Endorsement

Review

Supplemented By

Referenced By

Creative Commons license

Except where otherwised noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 United States