Research on adjacent matrix for K-means clustering : a thesis presented for the degree of Master of Computer Science in School of Natural and Computational Sciences at Massey University, Auckland, New Zealand

dc.contributor.author: Zhou, Jukai
dc.date.accessioned: 2020-06-25T01:58:16Z
dc.date.available: 2020-06-25T01:58:16Z
dc.date.issued: 2019
dc.description.abstract: Machine learning plays a vital role in the modern world. Depending on whether the data are labelled, machine learning mainly falls into three categories: unsupervised learning, supervised learning, and semi-supervised learning. Because labels are usually difficult and expensive to obtain, unsupervised learning is more widely applicable than supervised and semi-supervised learning, and k-means clustering is one of the most popular methods in unsupervised learning. Hence, this thesis focuses on improving k-means clustering. K-means clustering has been widely applied in real applications due to its linear time complexity and ease of implementation. However, its applicability is limited by several issues, such as identifying the cluster number k, initialising the centroids, and defining a similarity measure for evaluating the similarity between two data points. K-means clustering therefore remains an active research topic in unsupervised learning. In this thesis, we propose to improve traditional k-means clustering by designing two different similarity matrices to represent the original data points. The first method constructs a new representation (i.e., an adjacent matrix) to replace the original representation of the data points, and then runs k-means clustering on the resulting adjacent matrix. In this way, the proposed method exploits the high-order similarity among data points to capture the complex structure inherent in the data, while avoiding the time-consuming eigendecomposition required by spectral clustering. The second method takes the weights of the features into account to improve the first method, based on the assumption that different features contribute differently to the construction of the clustering model. As a result, the clustering model is more robust than both the first method and previous clustering methods. Finally, we tested the proposed clustering methods on public UCI datasets. Experimental results showed that the clustering results of the proposed methods significantly outperformed the comparison methods in terms of three evaluation metrics.
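For intuition, the following is a minimal Python sketch of the first idea described in the abstract: re-represent each data point as a row of an adjacency (similarity) matrix and run ordinary k-means on that matrix instead of on the raw features. The Gaussian-kernel similarity, the sigma parameter, the optional feature_weights argument (a hypothetical stand-in for the second, feature-weighted method), and the use of scikit-learn's KMeans are all assumptions for illustration; the thesis's exact constructions are not specified in the abstract.

```python
import numpy as np
from sklearn.cluster import KMeans

def adjacency_kmeans(X, k, sigma=1.0, feature_weights=None):
    """Run k-means on an adjacency (similarity) matrix built from X.

    feature_weights is a hypothetical stand-in for the thesis's second,
    feature-weighted method; the actual weighting scheme is not given
    in the abstract.
    """
    X = np.asarray(X, dtype=float)
    if feature_weights is not None:
        # Scale each feature by its weight before measuring similarity.
        X = X * np.asarray(feature_weights, dtype=float)
    # Pairwise squared Euclidean distances between all data points.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # Gaussian-kernel adjacency matrix: row i re-represents point i by its
    # similarity to every other point (a high-order representation).
    A = np.exp(-sq_dists / (2.0 * sigma ** 2))
    # Plain k-means on the rows of A; no eigendecomposition is needed,
    # which is the step this approach avoids relative to spectral clustering.
    return KMeans(n_clusters=k, n_init=10).fit_predict(A)

# Example usage on a small synthetic dataset with two well-separated groups.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (20, 5)), rng.normal(5, 1, (20, 5))])
    print(adjacency_kmeans(X, k=2))
```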
dc.identifier.uri: http://hdl.handle.net/10179/15421
dc.language.iso: en
dc.publisher: Massey University
dc.rights: The Author
dc.subject: Cluster analysis
dc.subject: Data processing
dc.subject: Computer algorithms
dc.subject: Machine learning
dc.subject: k-means clustering
dc.subject: similarity measurement
dc.subject: adjacent matrix
dc.subject: unsupervised learning
dc.title: Research on adjacent matrix for K-means clustering : a thesis presented for the degree of Master of Computer Science in School of Natural and Computational Sciences at Massey University, Auckland, New Zealand
dc.type: Thesis
massey.contributor.author: Zhou, Jukai
thesis.degree.discipline: Computer Science
thesis.degree.level: Masters
thesis.degree.name: Master of Science (MSc)
Files
Original bundle
Name: ZhouMScThesis.pdf
Size: 3.89 MB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 3.32 KB
Format: Item-specific license agreed upon to submission