A multi-view K-multiple-means clustering method


ZHANG Nini, GE Hongwei

(1. School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China; 2. Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence, Jiangnan University, Wuxi 214122, China)

Abstract: The K-multiple-means (KMM) algorithm retains the simplicity and efficiency of the K-means algorithm by setting multiple subclasses, and improves the clustering effect on non-convex data sets. Aiming at the problem that KMM cannot be applied to multi-view data sets, a multi-view K-multiple-means (MKMM) clustering method is proposed in this paper. The new algorithm introduces view weight parameters, retains the design of setting multiple subclasses, takes the number of clusters as a constraint and obtains the clusters by solving an optimization problem. The new algorithm is compared with some popular multi-view clustering algorithms, and its effectiveness is demonstrated through the analysis of the experimental results.

Key words: K-multiple-means (KMM) clustering; weight parameters; multi-view K-multiple-means (MKMM) method

0 Introduction

Clustering is a common technique for pattern recognition and is widely applied to machine learning tasks such as image segmentation and user profiling. K-means[1] is the most classical method among the large number of existing clustering algorithms, and it is widely used due to its high efficiency and intuitive principle. However, the K-means algorithm does not perform well on non-spherical datasets. To remedy this defect, many variants of the K-means algorithm have been proposed[2-5]. Setting multiple subclasses for each class is one strategy for improving the K-means algorithm: this design is more in line with practical application scenarios and performs better on non-convex datasets. K-multiple-means (KMM), a multiple-means clustering method with a specified cluster number k proposed by Nie et al. at ACM SIGKDD in 2019[6], is a typical example, and it has attracted attention because of its excellent performance[5-7].

KMM is a traditional clustering method that studies samples through a single set of features. When the samples have multiple groups of features, multi-view clustering can partition the samples by integrating and processing all of the groups. In the era of big data, data with multiple sets of features is very common in real scenarios[7]. For example, in multimedia content understanding, the content contains both the video signal from the camera and the audio signal from the microphone. In some image recognition tasks, image features include colour, texture and shape features. The proliferation of multi-view data has aroused strong interest in multi-view learning among many scholars. The data in different views of a multi-view dataset is heterogeneous but potentially related. In other words, each individual view has specific attributes for a specific knowledge discovery task, but different views often contain complementary information. Therefore, how to use this information to reveal the potential value of multi-view data is very important in big data research. In terms of unsupervised learning, clustering methods based on a single view often cannot exploit multi-view information as effectively as multi-view clustering. Multi-view clustering needs to judge the relationships among samples in each view respectively and complete the clustering task by using the complementary and consensus information of multiple views. It is difficult to obtain good results by integrating different views into a single view and then applying an advanced traditional clustering algorithm, because each view has its own specific attributes: in the process of feature fusion, a particular view may receive a much higher weight than the others, so that the clustering ends up relying on only one of the views.

Although the KMM algorithm has excellent performance in traditional clustering, like other single-view clustering methods it cannot effectively use the multiple groups of features in multi-view data to complete the clustering task. In this paper, a multi-view K-multiple-means (MKMM) clustering method is proposed. In the new algorithm, view weight parameters are introduced, a new objective function is proposed, and the optimal allocation of the weight parameters and the clustering result are obtained through an alternating optimization strategy.

For better clustering performance on multi-view data, modifying a well-performing traditional clustering method to adapt to multi-view datasets is an effective strategy. KMM has recently shown excellent performance on traditional data sets. This paper proposes a new multi-view clustering method based on KMM, which is therefore revisited in this section.

The key idea of the KMM algorithm is to set multiple prototypes for each cluster, and to formulate the placement of the clustering prototypes and the partition of the data to the prototypes as an optimization problem:

$$\min_{S}\sum_{i=1}^{n}\sum_{j=1}^{m}s_{ij}\left\|x_i-a_j\right\|_2^2+\gamma\|S\|_F^2$$

$$\text{s.t. } S\ge 0,\ S\mathbf{1}=\mathbf{1},\ S\in\Omega,$$

(1)

where S = [s_ij] ∈ R^{n×m} is the assignment matrix between the n samples x_i and the m prototypes a_j, γ is a regularization parameter, and Ω denotes the set of matrices S whose bipartite graph between samples and prototypes contains exactly k connected components.

The positions of the prototypes change as S changes: when S is updated, each prototype can be relocated. This process can be performed iteratively by solving

$$\min_{A,S}\sum_{i=1}^{n}\sum_{j=1}^{m}s_{ij}\left\|x_i-a_j\right\|_2^2+\gamma\|S\|_F^2$$

$$\text{s.t. } S\ge 0,\ S\mathbf{1}=\mathbf{1},\ S\in\Omega,\ A\in\mathbb{R}^{m\times d},$$

(2)

where A = [a_1, a_2, …, a_m]^T stacks the m prototypes.

The clustering result can be obtained by solving problem (2).
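To make the alternation concrete, the following is a minimal single-view sketch, in Python, of problem (2) without the constraint S∈Ω. It treats γ as a fixed hyperparameter rather than the adaptive, sparsity-inducing choice of ref.[6], and all function names are illustrative rather than taken from the paper.

```python
import numpy as np

def assign_simplex(d, gamma):
    """Solve min_{s >= 0, s^T 1 = 1} sum_j (s_j * d_j + gamma * s_j^2).
    Completing the square shows this is the Euclidean projection of
    -d / (2 * gamma) onto the probability simplex (gamma > 0)."""
    v = -d / (2.0 * gamma)
    u = np.sort(v)[::-1]                          # sort descending
    css = np.cumsum(u)
    idx = np.arange(1, v.size + 1)
    rho = np.nonzero(u + (1.0 - css) / idx > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1.0)
    return np.maximum(v + theta, 0.0)

def kmm_sketch(X, m, gamma=1.0, n_iter=30, seed=0):
    """Alternate between the S-update and the prototype update of
    problem (2), ignoring the S in Omega constraint."""
    rng = np.random.default_rng(seed)
    A = X[rng.choice(X.shape[0], size=m, replace=False)]       # initial prototypes
    for _ in range(n_iter):
        D = ((X[:, None, :] - A[None, :, :]) ** 2).sum(axis=-1)   # n x m squared distances
        S = np.array([assign_simplex(d, gamma) for d in D])       # row-wise S update
        A = (S.T @ X) / np.maximum(S.sum(axis=0), 1e-12)[:, None] # relocate prototypes
    return S, A
```

Completing the square in each row subproblem turns the S-update into a Euclidean projection onto the probability simplex, which is what assign_simplex computes.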

KMM performs well in traditional clustering, so a multi-view clustering method built on KMM starts from a solid foundation.

1 Proposed method

1.1 Design of objective function

As mentioned above, KMM sets multiple prototypes in each class for better performance (Fig.1), and solving problem (1) yields the assignment of the data to the neighboring prototypes.

Fig.1 Schematic of multi-view integration

Supposing that the data set has n_v views, this paper integrates the information from all views by introducing the weight parameters W = [w_1, w_2, …, w_{n_v}]^T to extend the KMM algorithm to multi-view data. Then, problem (1) becomes

$$\min_{S}\sum_{v=1}^{n_v}w_v^2\sum_{i=1}^{n}\sum_{j=1}^{m}s_{ij}\left\|x_i^{(v)}-a_j^{(v)}\right\|_2^2+\gamma\|S\|_F^2$$

$$\text{s.t. } S\ge 0,\ S\mathbf{1}=\mathbf{1},$$

(3)

where x_i^{(v)} and a_j^{(v)} denote the i-th sample and the j-th prototype in the v-th view.

Similar to the single-view case, the assignment of each data sample to the nearby prototypes is independent of the other data samples in the multi-view setting. However, when the data is partitioned to the nearby prototypes, the information from different views influences each other and jointly determines the partition of each sample. Therefore, the assignment of each sample x_i is obtained by solving

$$\min_{s_i\ge 0,\ s_i^{\mathrm T}\mathbf{1}=1}\sum_{j=1}^{m}\left(s_{ij}\sum_{v=1}^{n_v}w_v^2\left\|x_i^{(v)}-a_j^{(v)}\right\|_2^2+\gamma s_{ij}^2\right),$$

(4)

where s_i is the i-th row of S.
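A sketch of the row-wise update implied by problem (4), reusing assign_simplex from the single-view sketch above. The combination of per-view distances with squared view weights mirrors the quadratic weight form used in Section 1.2; the helper name is again illustrative.

```python
import numpy as np  # assign_simplex is defined in the earlier single-view sketch

def update_S_multiview(Xs, As, W, gamma):
    """Update every row of S by solving problem (4): combine the per-view
    squared distances with the (squared) view weights, then project each
    row onto the probability simplex."""
    n, m = Xs[0].shape[0], As[0].shape[0]
    D = np.zeros((n, m))
    for X_v, A_v, w_v in zip(Xs, As, W):
        D += (w_v ** 2) * ((X_v[:, None, :] - A_v[None, :, :]) ** 2).sum(axis=-1)
    return np.array([assign_simplex(d, gamma) for d in D])
```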

After the matrix S is updated, each prototype is relocated to the mean of its new subclass. This process is performed iteratively until the partition of the samples no longer changes, which can be expressed as

$$\min_{A,S}\sum_{v=1}^{n_v}w_v^2\sum_{i=1}^{n}\sum_{j=1}^{m}s_{ij}\left\|x_i^{(v)}-a_j^{(v)}\right\|_2^2+\gamma\|S\|_F^2$$

$$\text{s.t. } S\ge 0,\ S\mathbf{1}=\mathbf{1},\ A\in\mathbb{R}^{m\times d}.$$

(5)

As with single-view clustering, in most cases connecting the samples to the prototypes belonging to the same cluster results in a connected graph. In order to make the partition of the samples more reasonable, and considering that the matrix S should have k connected components, MKMM introduces the constraint S∈Ω into the objective function (5) to form a new objective, that is

$$\min_{A,S}\sum_{v=1}^{n_v}w_v^2\sum_{i=1}^{n}\sum_{j=1}^{m}s_{ij}\left\|x_i^{(v)}-a_j^{(v)}\right\|_2^2+\gamma\|S\|_F^2$$

$$\text{s.t. } S\ge 0,\ S\mathbf{1}=\mathbf{1},\ S\in\Omega,\ A\in\mathbb{R}^{m\times d}.$$

(6)
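The constraint S∈Ω can be stated operationally: the bipartite graph whose affinity matrix stacks S and S^T must have exactly k connected components. The following illustrative helper (not part of the paper) checks this property with scipy:

```python
import numpy as np
from scipy.sparse import bmat, csr_matrix
from scipy.sparse.csgraph import connected_components

def in_omega(S, k, tol=1e-12):
    """Return True if the bipartite sample-prototype graph induced by S
    has exactly k connected components, i.e. S is in Omega."""
    B = csr_matrix(np.where(S > tol, S, 0.0))
    P = bmat([[None, B], [B.T, None]], format="csr")   # (n+m) x (n+m) affinity
    n_comp, _ = connected_components(P, directed=False)
    return n_comp == k
```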

In order to obtain the best distribution of the view weight parameters, the views are initially assigned equal weights. Each time the partition between the prototypes and the samples is updated, the view weights are updated accordingly so as to approach the optimal weights. Eq.(6) then becomes

$$\min_{A,S,W}\sum_{v=1}^{n_v}w_v^2\sum_{i=1}^{n}\sum_{j=1}^{m}s_{ij}\left\|x_i^{(v)}-a_j^{(v)}\right\|_2^2+\gamma\|S\|_F^2$$

$$\text{s.t. } S\ge 0,\ S\mathbf{1}=\mathbf{1},\ S\in\Omega,\ A\in\mathbb{R}^{m\times d},\ W\ge 0,\ W^{\mathrm T}\mathbf{1}=1.$$

(7)

1.2 Optimization strategy

Following ref.[6], the constraint S∈Ω is equivalent to a rank constraint on the Laplacian matrix L_S of the bipartite graph between the samples and the prototypes, so problem (7) can be rewritten as

$$\min_{A,S,W}\sum_{v=1}^{n_v}w_v^2\sum_{i=1}^{n}\sum_{j=1}^{m}s_{ij}\left\|x_i^{(v)}-a_j^{(v)}\right\|_2^2+\gamma\|S\|_F^2$$

$$\text{s.t. } S\ge 0,\ S\mathbf{1}=\mathbf{1},\ \operatorname{rank}(L_S)=n+m-k,\ A\in\mathbb{R}^{m\times d},\ W\ge 0,\ W^{\mathrm T}\mathbf{1}=1.$$

(8)

In order to solve problem (8), the rank restriction in problem (8) is relaxed into a spectral penalty term, turning problem (8) into

$$\min_{A,S,F,W}\sum_{v=1}^{n_v}w_v^2\sum_{i=1}^{n}\sum_{j=1}^{m}s_{ij}\left\|x_i^{(v)}-a_j^{(v)}\right\|_2^2+\gamma\|S\|_F^2+\lambda\operatorname{Tr}(F^{\mathrm T}L_SF)$$

$$\text{s.t. } S\ge 0,\ S\mathbf{1}=\mathbf{1},\ F\in\mathbb{R}^{(n+m)\times k},\ F^{\mathrm T}F=I,\ W\ge 0,\ W^{\mathrm T}\mathbf{1}=1.$$

(9)

Eq.(9) can be solved by updating A, S, F, W alternately. Fix A first, then update S, F, W; problem (9) turns into

$$\min_{S,F,W}\sum_{v=1}^{n_v}w_v^2\sum_{i=1}^{n}\sum_{j=1}^{m}s_{ij}d_{ij}^{(v)}+\gamma\|S\|_F^2+\lambda\operatorname{Tr}(F^{\mathrm T}L_SF)$$

$$\text{s.t. } S\ge 0,\ S\mathbf{1}=\mathbf{1},\ F\in\mathbb{R}^{(n+m)\times k},\ F^{\mathrm T}F=I,\ W\ge 0,\ W^{\mathrm T}\mathbf{1}=1,$$

(10)

where d_ij^(v) = ‖x_i^(v) − a_j^(v)‖₂² is the squared distance between sample i and prototype j in the v-th view.

For solving problem (10), fix S, W and update F: the optimal F consists of the k eigenvectors of L_S corresponding to its k smallest eigenvalues.
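A dense sketch of this F-update follows, using the unnormalized Laplacian for clarity; note that ref.[6] obtains a comparable embedding more cheaply through an SVD of a normalized version of S (with a normalized Laplacian) instead of a full eigendecomposition.

```python
import numpy as np

def update_F(S, k):
    """Update F for problem (10): build the (n+m) x (n+m) affinity matrix of
    the bipartite graph, form its Laplacian, and take the k eigenvectors with
    the smallest eigenvalues (orthonormal columns, so F^T F = I)."""
    n, m = S.shape
    P = np.zeros((n + m, n + m))
    P[:n, n:] = S
    P[n:, :n] = S.T
    L = np.diag(P.sum(axis=1)) - P        # unnormalized graph Laplacian
    _, vecs = np.linalg.eigh(L)           # eigenvalues in ascending order
    return vecs[:, :k]
```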

When F is fixed, note that

$$\operatorname{Tr}(F^{\mathrm T}L_SF)=\sum_{i=1}^{n}\sum_{j=1}^{m}s_{ij}\left\|f_i-f_{n+j}\right\|_2^2,$$

(11)

where f_i denotes the i-th row of F. Fixing W as well, problem (10) therefore becomes

$$\min_{S}\sum_{i=1}^{n}\sum_{j=1}^{m}s_{ij}\left(\sum_{v=1}^{n_v}w_v^2d_{ij}^{(v)}+\lambda\left\|f_i-f_{n+j}\right\|_2^2\right)+\gamma\|S\|_F^2$$

(12)

$$\text{s.t. } S\ge 0,\ S\mathbf{1}=\mathbf{1}.$$

(13)

Problem (12) is independent between the different rows of S, so each row s_i can be updated separately by solving

$$\min_{s_i\ge 0,\ s_i^{\mathrm T}\mathbf{1}=1}\sum_{j=1}^{m}\left(s_{ij}e_{ij}+\gamma s_{ij}^2\right),\quad e_{ij}=\sum_{v=1}^{n_v}w_v^2d_{ij}^{(v)}+\lambda\left\|f_i-f_{n+j}\right\|_2^2.$$

(14)

The solution of problem (14) is similar to that of problem (4).

When S, F are fixed, W needs to be updated by solving the corresponding subproblem of (10), expressed as

$$\min_{W}\sum_{v=1}^{n_v}w_v^2h_v$$

$$\text{s.t. } W\ge 0,\ W^{\mathrm T}\mathbf{1}=1,$$

(15)

where h_v = Σ_{i=1}^{n} Σ_{j=1}^{m} s_ij d_ij^(v) is the distortion of the v-th view under the current assignment. The objective of problem (15) can be written in matrix form as

$$\sum_{v=1}^{n_v}w_v^2h_v=W^{\mathrm T}\operatorname{diag}(h_1,h_2,\dots,h_{n_v})W.$$

(16)

Denoting Y = diag(h_1, h_2, …, h_{n_v}) ∈ R^{n_v×n_v}, problem (16) can be regarded as

$$\min_{W}W^{\mathrm T}YW\quad \text{s.t. } W\ge 0,\ W^{\mathrm T}\mathbf{1}=1.$$

(17)

Removing the constraint W≥0, the Lagrange function of Eq.(17) can be expressed as

$$L(W,\eta)=W^{\mathrm T}YW-\eta(W^{\mathrm T}\mathbf{1}-1),$$

(18)

where η is the Lagrange multiplier. Setting the derivative with respect to W to zero gives

$$\frac{\partial L}{\partial W}=2YW-\eta\mathbf{1}=0\ \Rightarrow\ W=\frac{\eta}{2}Y^{-1}\mathbf{1}.$$

(19)

Substituting Eq.(19) into the constraint W^T 1 = 1 yields

$$W=\frac{Y^{-1}\mathbf{1}}{\mathbf{1}^{\mathrm T}Y^{-1}\mathbf{1}},\quad \text{i.e.},\quad w_v=\frac{h_v^{-1}}{\sum_{u=1}^{n_v}h_u^{-1}}.$$

(20)

Since every h_v > 0, the solution in Eq.(20) automatically satisfies W≥0, so removing the nonnegativity constraint does not change the optimum.
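A direct implementation of Eq.(20), assuming the diagonal Y derived above; the helper name is illustrative:

```python
import numpy as np

def update_W(Xs, As, S):
    """Update the view weights via Eq.(20): w_v = (1/h_v) / sum_u (1/h_u),
    where h_v = sum_ij s_ij * ||x_i^(v) - a_j^(v)||^2."""
    h = np.array([
        (S * ((X_v[:, None, :] - A_v[None, :, :]) ** 2).sum(axis=-1)).sum()
        for X_v, A_v in zip(Xs, As)
    ])
    inv = 1.0 / np.maximum(h, 1e-12)     # guard against a zero-distortion view
    return inv / inv.sum()
```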

1.3 MKMM algorithm steps

In summary, the main steps of the MKMM algorithm are described as follows.

Input: multi-view data set X_{n_v} = {X^{(1)}, X^{(2)}, …, X^{(n_v)}}, number of clusters k, number of subclasses m, parameters γ and λ.

Output: clustering results C = {C_1, C_2, …, C_k}.

Step 1: initialize the view weights W equally and calculate the initial matrix S;

Step 2: update the matrix F by solving problem (10);

Step 3: update the matrix S by solving problem (14);

Step 4: update the weight parameter W according to Eq.(20); repeat steps 2, 3 and 4 until convergence;

Step 5: calculate the position of each prototype according to the matrix S and update the matrix A until the positions of the prototypes no longer change;

Step 6: obtain the k clusters according to the bipartite graph composed of the samples and the prototypes. A sketch assembling these steps is given below.
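This loose end-to-end sketch is assembled from the illustrative helpers defined earlier (assign_simplex, update_S_multiview, update_F, update_W). It keeps λ fixed, whereas in practice λ must be adjusted until the bipartite graph has exactly k connected components, as in ref.[6]; it is a simplified reading of the steps above, not the authors' implementation.

```python
import numpy as np
from scipy.sparse import bmat, csr_matrix
from scipy.sparse.csgraph import connected_components

def update_S_spectral(Xs, As, W, F, gamma, lam):
    """Row-wise S update of problem (14): weighted multi-view distances plus
    the spectral penalty lam * ||f_i - f_{n+j}||^2."""
    n, m = Xs[0].shape[0], As[0].shape[0]
    E = np.zeros((n, m))
    for X_v, A_v, w_v in zip(Xs, As, W):
        E += (w_v ** 2) * ((X_v[:, None, :] - A_v[None, :, :]) ** 2).sum(axis=-1)
    Fx, Fa = F[:n], F[n:]
    E += lam * ((Fx[:, None, :] - Fa[None, :, :]) ** 2).sum(axis=-1)
    return np.array([assign_simplex(e, gamma) for e in E])

def mkmm_sketch(Xs, k, m, gamma=1.0, lam=1.0, n_iter=20, seed=0):
    nv, n = len(Xs), Xs[0].shape[0]
    rng = np.random.default_rng(seed)
    idx = rng.choice(n, size=m, replace=False)      # same prototype samples in every view
    As = [X_v[idx] for X_v in Xs]
    W = np.full(nv, 1.0 / nv)                       # step 1: equal initial weights
    S = update_S_multiview(Xs, As, W, gamma)        # step 1: initial S
    for _ in range(n_iter):
        F = update_F(S, k)                               # step 2: spectral embedding
        S = update_S_spectral(Xs, As, W, F, gamma, lam)  # step 3: problem (14)
        W = update_W(Xs, As, S)                          # step 4: Eq.(20)
        As = [(S.T @ X_v) / np.maximum(S.sum(0), 1e-12)[:, None] for X_v in Xs]  # step 5
    # step 6: read clusters off the bipartite graph; with a well-chosen lam the
    # graph has exactly k connected components
    B = csr_matrix(np.where(S > 1e-12, S, 0.0))
    P = bmat([[None, B], [B.T, None]], format="csr")
    _, labels = connected_components(P, directed=False)
    return labels[:n]
```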

1.4 Computational complexity

In summary, supposing that S, F and W are updated t_1 times in the inner iterations and A is updated t_2 times, the total time complexity of the MKMM algorithm is

$$O\big(((nmk+nmc+m^2n+n_v)t_1+nmd)t_2+n\log n+nd\big).$$

2 Experiments

2.1 Clustering metrics

The evaluation of clustering generally uses one or more metrics as judgment criteria to evaluate and analyze the clustering results, so as to determine the quality of the clustering algorithm. This article evaluates the algorithms through the Accuracy, Normalized Mutual Information (NMI)[11] and Purity metrics, which are introduced below.

Supposing C is the set of real labels of the dataset and W is the set of labels obtained by the algorithm, the definition of Accuracy is

$$Accuracy=\frac{1}{n}\sum_{i=1}^{n}\delta\big(c_i,\operatorname{map}(w_i)\big),$$

(21)

where map(·) is the best mapping function, which maps the labels obtained by the algorithm onto the real labels in a one-to-one relationship, and δ is the indicator function: δ(x, y) = 1 if x = y and 0 otherwise.

The definition of NMI is

$$NMI(C,W)=\frac{I(C;W)}{\sqrt{H(C)H(W)}},$$

(22)

where I represents the mutual information and H represents the information entropy.

Purity is defined as

$$Purity=\frac{1}{n}\sum_{j=1}^{k}\max_{i}\left|w_j\cap c_i\right|,$$

(23)

where w_j denotes the set of samples assigned to cluster j and c_i denotes the set of samples with real label i.

The larger the values of Accuracy, NMI and Purity are, the better the clustering performance. The value range of all three metrics is [0, 1].
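For reference, these metrics can be computed as follows: Accuracy realizes map(·) with the Hungarian algorithm, Purity is implemented directly, and NMI is available from scikit-learn. A minimal sketch assuming integer label arrays:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score

def accuracy(y_true, y_pred):
    """Clustering accuracy, Eq.(21): find the best one-to-one label mapping
    with the Hungarian algorithm, then count the matched samples."""
    D = int(max(y_pred.max(), y_true.max())) + 1
    cost = np.zeros((D, D), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        cost[p, t] += 1
    row, col = linear_sum_assignment(-cost)   # maximize the matched pairs
    return cost[row, col].sum() / len(y_true)

def purity(y_true, y_pred):
    """Purity, Eq.(23): each cluster votes for its majority true class."""
    total = 0
    for c in np.unique(y_pred):
        _, counts = np.unique(y_true[y_pred == c], return_counts=True)
        total += counts.max()
    return total / len(y_true)

# NMI, Eq.(22), is available directly:
# nmi = normalized_mutual_info_score(y_true, y_pred)
```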

2.2 Datasets

The Caltech101-7 in Table 1 is a dataset of 7 classes extracted from Caltech101. Caltech101 is a dataset containing 101 classes of images created at the California Institute of Technology in 2003. Each class contains 40 to 80 pictures, each with a size of 300×200 pixels. The seven classes extracted for Caltech101-7 are human faces, motorcycles, dollar bills, Garfield, Snoopy, stop signs and Windsor chairs. Six features are extracted from the images of these seven classes: Gabor, wavelet moments, CENTRIST, HOG, GIST and LBP.

The Caltech101-20 in Table 1 is a dataset of 20 classes extracted from Caltech101. The 20 classes are human faces, leopards, motorcycles, binoculars, brains, cameras, car sides, dollar bills, ferries, Garfield, hedgehogs, pagodas, rhinos, Snoopy, staplers, stop signs, water lilies, Windsor chairs, wrenches and yin-yang. The six features extracted from the images for Caltech101-20 are the same as for Caltech101-7.

Table 1 Real benchmark datasets

The Yale32 in Table 1 is a face dataset from Yale University. There are 15 people in the dataset, and each person has 11 pictures with different poses, expressions and lighting, giving 165 pictures in total. Each picture is 32×32 pixels.

The Wikipedia Articles in Table 1 is a collection of documents taken from Wikipedia's featured articles, a continuously growing collection. This article uses the snapshot composed of 2 669 articles in 29 classes collected in October 2009. Because some of the classes are sparse, only 10 classes are retained, and the dataset is pruned to 693 samples. The dataset has two sets of features: one set is taken from the text information in the documents and the other set is taken from the accompanying images.

2.3 Competitors

In order to evaluate the MKMM algorithm proposed in this paper, several popular multi-view clustering algorithms are selected as competitors.

Liu et al. proposed the MKKM-MR algorithm[12]. This algorithm designs a novel, effective matrix-induced regularization to reduce the redundancy and enhance the diversity of the selected kernels, and its regularization parameter needs to be set in advance. Following the recommendations in ref.[12], this article sets the parameter range from -15 to 20 with a step size of 1 for the comparison experiments and reports the average of the 26 results of the algorithm.

Zhao et al. proposed the SCMK1 algorithm[13]. This algorithm learns similarity information from the data and integrates the three subtasks of traditional spectral clustering into a unified framework. The parameters were tuned as suggested in ref.[13].

Zhao et al. proposed the SCMK2 algorithm[14]. This algorithm simultaneously learns the cluster indicator matrix and the similarity information in kernel spaces in a principled way.

Wang et al. proposed the MVC_LFA algorithm[15]. This algorithm maximally aligns the consensus partition with the weighted base partitions. Following the recommendations, the balance parameter was set to each value in the set {2^{-15}, 2^{-14}, …, 2^{15}}, and the average value of the 31 experiments is displayed.

2.4 Performance

The results of the MKMM algorithm and the competitors on the four multi-view datasets are evaluated by the Accuracy, NMI and Purity metrics, as shown in Tables 2-4.

Table 2 Accuracy values of five algorithms

Table 3 NMI values of five algorithms

Table 4 Purity values of five algorithms

In order to eliminate the influence of the initial prototype selection of the MKMM algorithm, the average value over 30 runs of the algorithm on each dataset is reported. From Tables 2-4, it can be seen that the MKMM algorithm has outstanding performance on some datasets compared with the other popular multi-view clustering algorithms. Although the MKMM algorithm does not perform best on the other datasets, it still achieves a good clustering effect. MKMM performs particularly well on Yale32, probably because pictures of the same person with different facial expressions can be placed into distinct subclasses. MKMM may be more advantageous on datasets similar to Yale32.

3 Conclusions

Considering that the KMM algorithm cannot solve the problem of multi-view clustering, this paper proposes the MKMM algorithm. The algorithm introduces view weight parameters, designs a new objective function and effectively uses multiple groups of features to achieve better clustering results. However, like the KMM algorithm, MKMM is affected by the selection of the initial prototypes, so its clustering results can be unstable. Therefore, how to make the MKMM algorithm select prototypes on multi-view datasets in a more principled manner, improving the clustering effect while stabilizing the clustering performance, will be the focus of future work.