国家重点研发计划(2018YFB1004300)
山西省重点研发计划(201903D421003)
国家自然科学基金(61672332,U1805263,61872226,61802238,61673249)
山西省海外归国人员研究项目(2017023,2016004)
[1] LeCun Y, Bengio Y, Hinton G. Deep learning. Nature, 2015, 521: 436-444 CrossRef PubMed ADS Google Scholar
[2] Jain A K, Murty M N, Flynn P J. Data clustering: a review. ACM Comput Surv, 1999, 31: 264-323 CrossRef Google Scholar
[3] Zheng L, Yang Y, Tian Q. SIFT Meets CNN: A Decade Survey of Instance Retrieval.. IEEE Trans Pattern Anal Mach Intell, 2018, 40: 1224-1244 CrossRef PubMed Google Scholar
[4] Dhillon I S, Modha D S. Concept decompositions for large sparse text data using clustering. Machine Learning, 2001, 42: 143-175 CrossRef Google Scholar
[5] Lu D, Tripodis Y, Gerstenfeld L C, et al. Clustering of temporal gene expression data with mixtures of mixed effects models with a penalized likelihood. Bioinformatics, 2018, 35: 778-786. Google Scholar
[6] Zechao Li , Jing Liu , Yi Yang . Clustering-Guided Sparse Structural Learning for Unsupervised Feature Selection. IEEE Trans Knowl Data Eng, 2014, 26: 2138-2150 CrossRef Google Scholar
[7] Erhan D, Bengio Y, Courville A, et al. Why does unsupervised pre-training help deep learning? J Mach Learn Res, 2010, 11: 625-660 DOI: 10.1145/1756006.1756025. Google Scholar
[8] Frey B J, Dueck D. Clustering by Passing Messages Between Data Points. Science, 2007, 315: 972-976 CrossRef PubMed ADS Google Scholar
[9] Rodriguez A, Laio A. Clustering by fast search and find of density peaks. Science, 2014, 344: 1492-1496 CrossRef PubMed ADS Google Scholar
[10] Otto C, Wang D, Jain A K. Clustering Millions of Faces by Identity.. IEEE Trans Pattern Anal Mach Intell, 2018, 40: 289-303 CrossRef PubMed Google Scholar
[11] Dan Zhang , Fei Wang , Luo Si . Maximum margin multiple instance clustering with applications to image and text clustering.. IEEE Trans Neural Netw, 2011, 22: 739-751 CrossRef PubMed Google Scholar
[12] Brockmeier A J, Mu T, Ananiadou S. Self-Tuned Descriptive Document Clustering Using a Predictive Network. IEEE Trans Knowl Data Eng, 2018, 30: 1929-1942 CrossRef Google Scholar
[13] Qian Y, Liang J, Pedrycz W. Positive approximation: An accelerator for attribute reduction in rough set theory. Artificial Intelligence, 2010, 174: 597-618 CrossRef Google Scholar
[14] Qian Y, Li F, Liang J. Space Structure and Clustering of Categorical Data.. IEEE Trans Neural Netw Learning Syst, 2016, 27: 2047-2059 CrossRef PubMed Google Scholar
[15] Fahy C, Yang S, Gongora M. Ant Colony Stream Clustering: A Fast Density Clustering Algorithm for Dynamic Data Streams.. IEEE Trans Cybern, 2019, 49: 2215-2228 CrossRef PubMed Google Scholar
[16] Blomstedt P, Tang J, Xiong J. A Bayesian Predictive Model for Clustering Data of Mixed Discrete and Continuous Type.. IEEE Trans Pattern Anal Mach Intell, 2015, 37: 489-498 CrossRef PubMed Google Scholar
[17] Douzas G, Bacao F. Self-Organizing Map Oversampling (SOMO) for imbalanced data set learning. Expert Syst Appl, 2017, 82: 40-52 CrossRef Google Scholar
[18] Xu J, Han J, Nie F. Re-Weighted Discriminatively Embedded K -Means for Multi-View Clustering. IEEE Trans Image Process, 2017, 26: 3016-3027 CrossRef PubMed ADS Google Scholar
[19] Yao X, Han J, Zhang D. Revisiting Co-Saliency Detection: A Novel Approach Based on Two-Stage Multi-View Spectral Rotation Co-clustering. IEEE Trans Image Process, 2017, 26: 3196-3209 CrossRef PubMed ADS Google Scholar
[20] Li F, Qian Y, Wang J. Clustering ensemble based on sample's stability. Artificial Intelligence, 2019, 273: 37-55 CrossRef Google Scholar
[21] Strehl A L, Ghosh J. Cluster ensembles---a knowledge reuse framework for combining multiple partitions. Journal ofMachine Learning Research, 2003, 3: 583-617. Google Scholar
[22] Li F, Qian Y, Wang J. Multigranulation information fusion: A Dempster-Shafer evidence theory-based clustering ensemble method. Inf Sci, 2017, 378: 389-409 CrossRef Google Scholar
[23] Shannon C E. A Mathematical Theory of Communication. Bell Syst Technical J, 1948, 27: 379-423 CrossRef Google Scholar
[24] Hu Q H, Guo M Z, Yu D R. Information entropy for ordinal classification. Sci China Inf Sci, 2010, 53: 1188-1200 CrossRef Google Scholar
[25] Otsu N. A Threshold Selection Method from Gray-Level Histograms. IEEE Trans Syst Man Cybern, 1979, 9: 62-66 CrossRef Google Scholar
[26] Chan T F, Vese L A. Active contours without edges. IEEE Trans Image Process, 2001, 10: 266-277 CrossRef PubMed ADS Google Scholar
[27] Malinen M I, Mariescu-Istodor R, Fr?nti P. K-means?: Clustering by gradual data transformation. Pattern Recognition, 2014, 47: 3376-3386 CrossRef Google Scholar
[28] Likas A, Vlassis N, J. Verbeek J. The global k-means clustering algorithm. Pattern Recognition, 2003, 36: 451-461 CrossRef Google Scholar
[29] Fan Z, Jiang J, Weng S. Adaptive density distribution inspired affinity propagation clustering. Neural Comput Applic, 2019, 31: 435-445 CrossRef Google Scholar
[30] Xu X, Ding S, Shi Z. An improved density peaks clustering algorithm with fast finding cluster centers. Knowledge-Based Syst, 2018, 158: 65-74 CrossRef Google Scholar
[31] Averbuch-Elor H, Bar N, Cohen-Or D. Border-peeling clustering. IEEE Trans Pattern Anal Mach Intell, 2019. doi: 10.1109/TPAMI.2019.2924953. Google Scholar
[32] Hubert L, Arabie P. Comparing partitions. J Classification, 1985, 2: 193-218 CrossRef Google Scholar
Figure 1
An example with $\mu=3$ and $\sigma=3$. (a) Conditional probability; (b) posterior probability
Figure 2
The cuver of (a) $H_2$ and (b) ${\rm~fe}$ ($t=0.7$)
Figure 3
Experiments on synthetic data sets. (a) 2d2k; (b) Flame; (c) Jain; (d) WingNut
Figure 4
Experiments on image segmentation data sets
Figure 5
(Color online) Nemenyi post-hoc test based on (a) Table
Number | Data set | Number of samples | Number of attributes | Number of clusters |
1 | Iris | 150 | 4 | 3 |
2 | Wine recognition data | 178 | 13 | 3 |
3 | Seeds data set | 210 | 7 | 3 |
4 | Glass identification database | 214 | 9 | 6 |
5 | Ecoli | 336 | 7 | 8 |
6 | Cardiotocography data set | 2126 | 40 | 10 |
7 | Image segmentation data | 2310 | 19 | 7 |
8 | Waveform database generator | 5000 | 21 | 3 |
9 | tr45 | 690 | 8261 | 10 |
10 | tr41 | 878 | 7454 | 10 |
11 | wap | 1560 | 8460 | 20 |
12 | re1 | 1657 | 3758 | 25 |
基于式( |
根据式( |
根据式( |
利用AP算法对稳定样本集${\rm~SS}$聚类:$C_{ss}=\{c_1^{\rm ss},c_2^{\rm ss},\ldots,c_{k'}^{\rm ss}\}\leftarrow {\rm AP}({\rm SS})$; |
根据式( |
更新稳定样本集和不稳定样本集: ${\rm~SS}={\rm~SS}\cup~{\rm~NS}^>$, ${\rm~NS}={\rm~NS}-{\rm~NS}^>$; |
根据式( |
根据式( |
$C\leftarrow~C_{{\rm~SS}}$. |
$C^d$ | $C^b$ | ||||
$c_1^b$ | $c_2^b$ | $\cdots$ | $c_k^b$ | SUM | |
$c_1^d$ | $n_{11}$ | $n_{12}$ | $\cdots$ | $n_{1k}$ | $n_{1\cdot}$ |
$c_2^d$ | $n_{21}$ | $n_{22}$ | $\cdots$ | $n_{2k}$ | $n_{2\cdot}$ |
$\vdots$ | $\vdots$ | $\vdots$ | $\ddots$ | $\vdots$ | $\vdots$ |
$c_k^d$ | $n_{k1}$ | $n_{k2}$ | $\cdots$ | $n_{kk}$ | $n_{k~\cdot}$ |
SUM | $n_{\cdot~1}$ | $n_{\cdot~2}$ | $\cdots$ | $n_{\cdot~k}$ | $n$ |
Number | K-means | K-means* | GK-means | AP | AD-AP | DP | GDPC | BPC | SSC |
1 | 0.7116$\pm$0.0622 | 0.7309 $\pm$0.0142 | 0.7419 | 0.7777 | 0.7777 | 0.6586 | 0.6586 | 0.7630 | |
2 | 0.8414$\pm$0.0118 | 0.8347 | 0.7507 | 0.7315 | 0.5885 | 0.7415 | 0.7955 | 0.8364 | |
3 | 0.6743$\pm$0.0000 | 0.6725$\pm$0.0037 | 0.6654 | 0.6873 | 0.6790 | 0.6797 | 0.5246 | 0.6959 | |
4 | 0.4107$\pm$0.0292 | 0.3409$\pm$0.0313 | 0.4008 | 0.3392 | 0.3065 | 0.3265 | 0.2757 | 0.3918 | |
5 | 0.5939$\pm$0.0283 | 0.5970$\pm$0.0185 | 0.6224 | 0.6311 | 0.5653 | 0.4985 | 0.4773 | 0.6333 | |
6 | 0.8972$\pm$0.0227 | 0.9177$\pm$0.0368 | 0.9478 | 0.9270 | 0.8118 | 0.8020 | 0.8199 | 0.9797 | |
7 | 0.6114$\pm$0.0159 | 0.6065$\pm$0.0112 | 0.6109 | 0.6116 | 0.6095 | 0.6599 | 0.6666 | 0.6664 | |
8 | 0.3642$\pm$0.0000 | 0.3642$\pm$0.0000 | 0.3642 | 0.2989 | 0.2760 | 0.2214 | 0.2128 | 0.3592 | |
9 | 0.4669$\pm$0.0406 | 0.2708$\pm$0.0244 | 0.4245 | 0.4062 | 0.3681 | 0.2673 | 0.3209 | 0.4491 | |
10 | 0.4829$\pm$0.0389 | 0.3160$\pm$0.0312 | 0.4292 | 0.4003 | 0.3917 | 0.3507 | 0.4268 | 0.4311 | |
11 | 0.4668$\pm$0.0163 | 0.4792 | 0.3603 | 0.3603 | 0.2728 | 0.2608 | 0.4523 | 0.4995 | |
12 | 0.4360$\pm$0.0170 | 0.3031$\pm$0.0148 | 0.4194 | 0.4464 | 0.4461 | 0.3628 | 0.3524 | 0.4192 | |
Average rank | 4.0000 | 5.9167 | 4.3333 | 4.6667 | 6.2500 | 7.5000 | 7.4167 | 3.5833 | 1.3333 |
Number | K-means | K-means* | GK-means | AP | AD-AP | DP | GDPC | BPC | SSC |
1 | 0.6589$\pm$0.1179 | 0.7102$\pm$0.0080 | 0.7163 | 0.7445 | 0.7445 | 0.4531 | 0.4531 | 0.7184 | |
2 | 0.8451$\pm$0.0140 | 0.8477$\pm$0.0196 | 0.8471 | 0.7262 | 0.7144 | 0.4956 | 0.7567 | 0.8025 | |
3 | 0.7049$\pm$0.0000 | 0.7026$\pm$0.0048 | 0.6934 | 0.7064 | 0.6952 | 0.7076 | 0.4726 | 0.7020 | |
4 | 0.1859$\pm$0.0278 | 0.2471 | 0.1862 | 0.1583 | 0.2267 | 0.2294 | 0.2397 | 0.2481 | |
5 | 0.4238$\pm$0.0811 | 0.4180$\pm$0.0334 | 0.4150 | 0.4551 | 0.3412 | 0.3086 | 0.2655 | 0.6142 | |
6 | 0.7779$\pm$0.0654 | 0.8049$\pm$0.0905 | 0.8521 | 0.8558 | 0.6529 | 0.6323 | 0.6372 | 0.9771 | |
7 | 0.4711 $\pm$0.0446 | 0.4736$\pm$0.0354 | 0.5034 | 0.5100 | 0.5117 | 0.5301 | 0.5159 | 0.5260 | |
8 | 0.2535$\pm$0.0000 | 0.2535$\pm$0.0000 | 0.2536 | 0.2167 | 0.1980 | 0.1897 | 0.1875 | 0.2542 | |
9 | 0.1855$\pm$0.0421 | 0.2512 | 0.2498 | 0.2217 | 0.0822 | 0.1275 | 0.2848 | 0.2354 | |
10 | 0.3024$\pm$0.0691 | 0.2698$\pm$0.0305 | 0.2493 | 0.2296 | 0.2349 | 0.1388 | 0.2725 | 0.2695 | |
11 | 0.2025$\pm$0.0857 | 0.1408$\pm$0.0390 | 0.1792 | 0.1126 | 0.1126 | 0.1756 | 0.1964 | 0.1157 | |
12 | 0.2451$\pm$0.0181 | 0.1224$\pm$0.0147 | 0.1598 | 0.2066 | 0.2178 | 0.1303 | 0.1660 | 0.2427 | |
Average rank | 3.9167 | 5.7500 | 4.9167 | 5.3333 | 6.4167 | 7.0000 | 6.2500 | 3.7500 | 1.6667 |