Re-identification Attack to Privacy-Preserving Data Analysis with Noisy Sample-Mean
Aug 13, 2020
In mining sensitive databases, access to the sensitive class attributes of individual records is often prohibited by enforcing field-level security, while only aggregate, class-specific statistics may be released. We consider a common privacy-preserving data analytics scenario in which only a noisy sample mean of the class of interest can be queried. This practice is widespread in medical research and business analytics.

This paper studies the risk of re-identifying the entire class when a noisy sample mean of that class is revealed. We formulate the re-identification attack as a generalized positive-unlabeled (PU) learning problem, and with this novel formulation we prove that the risk function of the re-identification problem is closely related to that of learning with complete data. We demonstrate that, given a one-sided noisy sample mean, an effective re-identification attack can be devised with existing PU learning algorithms. We then propose a novel algorithm, growPU, that exploits the unique properties of the sample mean and consistently outperforms existing PU learning algorithms on the re-identification task. With a noiseless sample mean, growPU achieves re-identification accuracy of 93.6% on the MNIST dataset and 88.1% on an online behavioral dataset. With noise that guarantees 0.01-differential privacy, growPU achieves 91.9% on MNIST and 84.6% on the online behavioral dataset.
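The abstract does not say how the noisy sample mean is generated. The sketch below is only a hedged illustration of the kind of statistic being attacked. It uses the standard Laplace mechanism under assumptions that are ours, not the paper's: every feature lies in [0, 1], the class size n is public, and neighbouring databases differ by replacing one in-class record. Under those assumptions the L1 sensitivity of the mean is d/n. The function name `noisy_class_mean` is hypothetical. The sketch shows the released query only and does not implement growPU or any attack.

```python
import numpy as np


def noisy_class_mean(X, in_class, epsilon, rng=None):
    """Release the mean of the in-class rows of X with Laplace noise added.

    Hypothetical illustration, not the paper's mechanism. Assumes every
    feature lies in [0, 1], the class size n is public, and neighbouring
    databases differ by replacing one in-class record, so the L1
    sensitivity of the d-dimensional mean is d / n.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = np.random.default_rng() if rng is None else rng
    members = X[in_class]              # rows selected by the boolean class mask
    n, d = members.shape
    if n == 0:
        raise ValueError("class of interest is empty")
    sensitivity = d / n                # L1 sensitivity under the assumptions above
    scale = sensitivity / epsilon      # Laplace scale b = sensitivity / epsilon
    return members.mean(axis=0) + rng.laplace(0.0, scale, size=d)
```

The function takes a boolean mask that selects the class, for example `noisy_class_mean(X, labels == 3, epsilon=0.01)`, and returns one perturbed d-dimensional vector. A smaller `epsilon` widens the Laplace noise in proportion to 1/epsilon.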