Improved K-means clustering algorithms : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Computer Science, Massey University, New Zealand

Liu, Tong

dc.confidential	Embargo : No	en_US
dc.contributor.advisor	Zhu, Xiaofeng
dc.contributor.author	Liu, Tong
dc.date.accessioned	2020-04-02T22:54:10Z
dc.date.accessioned	2020-06-04T02:48:20Z
dc.date.available	2020-04-02T22:54:10Z
dc.date.available	2020-06-04T02:48:20Z
dc.date.issued	2020
dc.description.abstract	The 𝘒-means clustering algorithm divides samples into subsets so as to maximize intra-subset similarity and inter-subset dissimilarity, where similarity measures the relationship between two samples. As an unsupervised learning technique, 𝘒-means is one of the most widely used clustering algorithms and has been applied in areas such as artificial intelligence, data mining, biology, psychology, marketing, and medicine. However, 𝘒-means is not robust, and its clustering result depends on the initialization, the similarity measure, and the predefined cluster number. Previous research has addressed some of these issues individually but not within a unified framework, and fixing one issue alone does not guarantee the best performance. Improving 𝘒-means by solving its issues simultaneously is therefore both challenging and significant. This thesis conducts extensive research on the 𝘒-means clustering algorithm with the aim of improving it. First, we propose the Initialization-Similarity (IS) clustering algorithm, which addresses the initialization and the similarity measure of 𝘒-means in a unified way. Specifically, we fix the initialization of the clustering by using sum-of-norms (SON), which outputs a new representation of the original samples, and we learn the similarity matrix from the data distribution. The derived new representation is then used to conduct 𝘒-means clustering. Second, we propose a Joint Feature Selection with Dynamic Spectral (FSDS) clustering algorithm, which addresses cluster number determination, the similarity measure, and the robustness of the clustering by selecting effective features and reducing the influence of outliers simultaneously. 
Specifically, we learn the similarity matrix from the data distribution and add a rank constraint on the Laplacian matrix of the learned similarity matrix so that the cluster number is output automatically. Furthermore, the proposed algorithm employs the L2,1-norm as a sparsity constraint on both the regularization term and the loss function, to remove redundant features and reduce the influence of outliers, respectively. Third, we propose a Joint Robust Multi-view (JRM) spectral clustering algorithm that clusters multi-view data while addressing the initialization issue, cluster number determination, similarity measure learning, removal of redundant features, and reduction of outlier influence in a unified way. Finally, the proposed algorithms outperformed state-of-the-art clustering algorithms on real data sets. Moreover, we theoretically prove the convergence of the proposed optimization methods for the proposed objective functions.	en_US
dc.identifier.uri	http://hdl.handle.net/10179/15384
dc.identifier.wikidata	Q111965567
dc.identifier.wikidata-uri	https://www.wikidata.org/wiki/Q111965567
dc.publisher	Massey University	en_US
dc.rights	The Author	en_US
dc.subject	Cluster analysis	en
dc.subject	Data processing	en
dc.subject	Computer algorithms	en
dc.subject	Machine learning	en
dc.title	Improved K-means clustering algorithms : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Computer Science, Massey University, New Zealand	en_US
dc.type	Thesis	en_US
massey.contributor.author	Liu, Tong	en_US
thesis.degree.discipline	Computer Science	en_US
thesis.degree.grantor	Massey University	en_US
thesis.degree.level	Doctoral	en_US
thesis.degree.name	Doctor of Philosophy	en_US
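
Illustrative note on the abstract: the rank constraint on the Laplacian matrix relies on a standard graph-theory fact, namely that the number of zero eigenvalues of the Laplacian L = D - S equals the number of connected components of the similarity graph S. The sketch below is not taken from the thesis. It only demonstrates that fact, together with the usual definition of the L2,1-norm (the sum of the Euclidean norms of a matrix's rows), on toy data. The function names, the tolerance, and the toy matrix are assumptions made for illustration.

```python
import numpy as np


def l21_norm(M):
    """L2,1-norm: sum of the Euclidean (L2) norms of the rows of M."""
    return float(np.sum(np.linalg.norm(M, axis=1)))


def count_components(S, tol=1e-8):
    """Count connected components of the graph with similarity matrix S.

    The count is the number of (near-)zero eigenvalues of the unnormalized
    Laplacian L = D - S, i.e. n - rank(L). `tol` is an assumed numerical
    threshold, not a value from the thesis.
    """
    S = (S + S.T) / 2.0                   # symmetrize, in case S is not symmetric
    L = np.diag(S.sum(axis=1)) - S        # unnormalized graph Laplacian
    eigvals = np.linalg.eigvalsh(L)       # real eigenvalues, ascending order
    return int(np.sum(eigvals < tol))


# Toy similarity matrix (hypothetical data): samples 0-2 form one group,
# samples 3-4 form another, and there are no links between the groups.
S = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
], dtype=float)

n_clusters = count_components(S)
print("components found:", n_clusters)
```

When the learned similarity matrix is block-structured like this, the cluster number can be read directly from the Laplacian spectrum, which is the intuition behind constraining its rank instead of fixing the number of clusters in advance.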

Files

Original bundle

Name: LiuPhDThesis.pdf
Size: 1.89 MB
Format: Adobe Portable Document Format

Collections

Theses and Dissertations