This work was supported by National Key Research and Development Program of China (Grant No. 2016YFC1000307) and National Natural Science Foundation of China (Grant Nos. 61971021, 61571024).
[1] Mendes R, Vilela J P. Privacy-Preserving Data Mining: Methods, Metrics, and Applications. IEEE Access, 2017, 5: 10562-10582 CrossRef Google Scholar
[2] Yu S. Big Privacy: Challenges and Opportunities of Privacy Study in the Age of Big Data. IEEE Access, 2016, 4: 2751-2763 CrossRef Google Scholar
[3] Dinur I, Nissim K. Revealing information while preserving privacy. In: Proceedings of ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, San Diego, 2003. 202--210. Google Scholar
[4] Dwork C, Nissim K. Privacy-preserving datamining on vertically partitioned databases. In: Advances in Cryptology -- CRYPTO 2004. Berlin: Springer, 2004. 528--544. Google Scholar
[5] Blum A, Dwork C, McSherry F, et al. Practical privacy: the sulq framework. In: Proceedings of ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, Baltimore, 2005. 128--138. Google Scholar
[6] Dwork C, McSherry F, Nissim K, et al. Calibrating noise to sensitivity in private data analysis. In: Theory of Cryptography. Berlin: Springer, 2006. 265--284. Google Scholar
[7] Dwork C. Differential privacy. In: Automata, Languages, and Programming. Berlin: Springer, 2006. 1--12. Google Scholar
[8] Kifer D, Machanavajjhala A. No free lunch in data privacy. In: Proceedings of ACM SIGMOD International Conference on Management of Data, Athens, 2011. 193--204. Google Scholar
[9] Cormode J. Personal privacy vs population privacy: learning to attack anonymization. In: Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, 2011. 1253--1261. Google Scholar
[10] Lee J, Clifton C. Differential identifiability. In: Proceedings of ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Beijing, 2012. 1041--1049. Google Scholar
[11] Dwork C. A firm foundation for private data analysis. Commun ACM, 2011, 54: 86-95 CrossRef Google Scholar
[12] Li L, Dong Y, Wang J. Differential privacy data protection method based on clustering. In: Proceedings of Cyber-Enabled Distributed Computing and Knowledge Discovery, Nanjing, 2017. 11--16. Google Scholar
[13] Zhao Z, Shang T, Liu J W, et al. Clustering algorithm for privacy preservation on MapReduce. In: Cloud Computing and Security, Cham, 2018. 622--632. Google Scholar
[14] Bugata P, Drotar P. On some aspects of minimum redundancy maximum relevance feature selection. Sci China Inf Sci, 2020, 63: 112103 CrossRef Google Scholar
[15] Samarati P, Sweeney L. Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression. In: Proceedings of IEEE Symposium on Security and Privacy, Oakland, 1998. 1--19. Google Scholar
[16] Samarati P. Protecting respondents identities in microdata release. IEEE Trans Knowl Data Eng, 2001, 13: 1010-1027 CrossRef Google Scholar
[17] Sweeney L. k-ANONYMITY: A MODEL FOR PROTECTING PRIVACY. Int J Unc Fuzz Knowl Based Syst, 2002, 10: 557-570 CrossRef Google Scholar
[18] Machanavajjhala A, Gehrke J, Kifer D, et al. l-diversity: privacy beyond k-anonymity. In: Proceedings of International Conference on Data Engineering, Atlanta, 2006. 24--35. Google Scholar
[19] Li N, Li T, Venkatasubramanian S. t-closeness: privacy beyond k-anonymity and l-diversity. In: Proceedings of International Conference on Data Engineering, Istanbul, 2007. 106--115. Google Scholar
[20] Ninghui Li , Tiancheng Li , Venkatasubramanian S. Closeness: A New Privacy Measure for Data Publishing. IEEE Trans Knowl Data Eng, 2010, 22: 943-956 CrossRef Google Scholar
[21] Li N H, Qardaji W, Su D, et al. Membership privacy: a unifying framework for privacy definitions. In: Proceedings of ACM SIGSAC Conference on Computer and Communications Security, New York, 2013. 889--900. Google Scholar
[22] Backes M, Berrang P, Humbert M, et al. Membership privacy in microRNA-based studies. In: Proceedings of ACM SIGSAC Conference on Computer and Communication, Vienna, 2016. 319--330. Google Scholar
[23] Aggarwal G, Feder T, Kenthapadi K, et al. Achieving anonymity via clustering. In: Proceedings of ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, Chicago, 2006. 153--162. Google Scholar
[24] Byun J W, Kamra A, Bertino E, et al. Efficient k-anonymization using clustering techniques. In: Proceedings of International Conference on Database Systems for Advanced Applications, Bangkok, 2007. 188--200. Google Scholar
[25] Nissim K, Raskhodnikova S, Smith A. Smooth sensitivity and sampling in private data analysis. In: Proceedings of ACM Symposium on Theory of Computing, San Diego, 2007. 75--84. Google Scholar
[26] Jafer Y, Matwin S, Sokolova M. Using Feature Selection to Improve the Utility of Differentially Private Data Publishing. Procedia Comput Sci, 2014, 37: 511-516 CrossRef Google Scholar
[27] Huang Z. Clustering large data sets with mixed numeric and categorical values. In: Proceedings of Pacific-Asia Conference on Knowledge Discovery and Data Mining, Singapore, 1997. 21--34. Google Scholar
[28] Huang Y. Understanding of Big Data: Big Data Processing and Programming Practices. Beijing: Machinery Industry Press, 2014. Google Scholar
[29] McSherry F, Talwar K. Mechanism design via differential privacy. In: Proceedings of IEEE Annual Symposium on Foundations of Computer Science, Providence, 2007. 94--103. Google Scholar
[30] Dwork C, Rothblum G N, Vadhan S. Boosting and differential privacy. In: Proceedings of IEEE Annual Symposium on Foundations of Computer Science, Las Vegas, 2010. 51--60. Google Scholar
[31] Bkakria A, Cuppens-Boulahia N, Cuppens F. Linking differential identifiability with differential privacy. In: Proceedings of International Conference on Information and Communications Security, Cham, 2018. 232--247. Google Scholar
[32] McSherry F. Privacy integrated queries:an extensible platform for privacy-preserving data analysis. Commun ACM, 2010, 53(9): 89--97 doi: 10.1145/1559845.1559850. Google Scholar
[33] Xiong J, Ren J, Chen L. Enhancing Privacy and Availability for Data Clustering in Intelligent Electrical Service of IoT. IEEE Internet Things J, 2019, 6: 1530-1540 CrossRef Google Scholar
[34] A differential privacy protecting K-means clustering algorithm based on contour coefficients. PLoS ONE, 2018, 13: e0206832 CrossRef ADS Google Scholar
[35] Gao Z, Sun Y, Cui X, et aL. Privacy-preserving hybrid $k$-means. Int. J. Data Warehous. Min., 2018, 14(2): 1--17 doi: 10.4018/978-1-5225-7113-1.ch049. Google Scholar
[36] Ren J, Xiong J, Cui X, et al. Dplk-means: A novel differential privacy $k$-means mechanism. In: Proceedings of IEEE International Conference on Data Science in Cyberspace, Shenzhen, 2017. 133--139. Google Scholar
[37] Su D, Cao J, Li N. Differentially Private K-Means Clustering and a Hybrid Approach to Private Optimization. ACM Trans Priv Secur, 2017, 20: 1-33 CrossRef Google Scholar
[38] Author A, Author B, Author C. Reference title. Journal, Year, Vol: Number or pages. Google Scholar
[39] Author A, Author B, Author C, et al. Reference title. In: Proceedings of Conference, Place, Year. Number or pages. Google Scholar
Figure 1
MapReduce architecture.
Figure 6
(Color online) F-measure: DI $k$-prototypes algoriTheorem.
Notation | Meaning |
$U$ | Universe |
$D~\subset~U$ | Database to be queried |
$D'=D-i\*$ | Subset of $D$ missing one individual |
$t~\in~U$ | Data associated with an individual |
$I(t)$ | Identity of an individual corresponding to $t$ |
$\mathcal{I}_D=\{I(t)|t\in~D\}$ | Set of individuals which belong to $D$ |
$\Psi$ | All possible databases |
$f$ | Query function |
$\Delta~f$ | Sensitive range of $f$ |
$sf$ | Score function |
$M$ | Randomized mechanism |
$\Gamma(t)$ | Identifiable risk for an individual $t$ |
Analysis of DP and DI $k$-means algorithms$^{\rm~a)}$
Data set characteristic | Multivariate |
Number of instances | 19020 |
Area | Physical |
Attribute characteristics | Real |
Number of attributes | 11 |
Classification | $g=$ gamma(signal): 12332, $h=$ hadron(background): 6688 |
Attribute | Characteristic | Information | |||||||
Age | Numerical | – | |||||||
Sex | Categorical | 1: male; 2: female | |||||||
Chest pain type | Categorical |
| |||||||
Resting blood pressure | Numerical | – | |||||||
Serum | Numerical | mg/dl | |||||||
Fasting blood Sugar | Categorical |
| |||||||
| Categorical |
| |||||||
| Numerical | – | |||||||
Exercise induced angina | Categorical | 0: no; 1: yes | |||||||
| Numerical | – | |||||||
| Categorical |
| |||||||
Number of major vessels | Numerical | ||||||||
Thalassemia | Categorical |
|