Welcome to DiSC 2002
SIGMOD 2001
PODS 2001
 = PODS'01 Website
 = SIGMOD/PODS'01 Plena
<<< = PODS'01 Papers>>>
 = Invited Talks
 = Award Talks
 = Tutorials
 SIGMOD RECORD 2001
CIKM 2001
CoopIS 2001
DASFAA 2001
DASFAA 2000
DBPL 2001
Data Engineering Bul
DEXA_EC-WEB 2001
DMKD 2001
 DPDJ 2001
HYPERTEXT 2001
ICDE 2001
ICDM 2001
ICDT 2001
JCDL 2001
KDD 2001
 KDD_EXPLORATIONS 20
KRDB 2001
MDM 2001
MIR 2001
MIS 2001
RIDE 2001
SBBD 2001
 SIGIR 2001
 SIGIR FORUM 2001
SSDBM 2001
SSTD 2001
TODS 2001
TIME 2001
VLDB 2001
VLDBJ 2001

On the Design and Quantification of Privacy Preserving Data Mining Algorithms


Dakshi Agrawal and Charu C. Aggarwal

  View Paper (PDF)  

Return to Data Mining


Abstract

The increasing ability to track and collect large amounts of data with the use of current hardware technology has lead to an interest in the development of data mining algorithms which preserve user privacy. A recently proposed technique addresses the issue of privacy preservation by perturbing the data and reconstructing distributions at an aggregate level in order to perform the mining. This method is able to retain privacy while accessing the information implicit in the original attributes. The distribution reconstruction process naturally leads to some loss of information which is acceptable in many practical situations. This paper discusses an Expectation Maximization (EM) algorithm for distribution reconstruction which is more effective than the currently available method in terms of the level of information loss. Specifically, we prove that the EM algorithm converges to the maximum likelihood estimate of the original distribution based on the perturbed data. We show that when alarge amount of data is available, the EM algorithm provides robust estimates of the original distribution. We propose metrics for quantification and measurement of privacy preserving data mining algorithms. Thus, this paper provides the foundations for measurement of the effectiveness of privacy preserving data mining algorithms. Our privacy metrics illustrate some interesting results on the relative effectiveness of different perturbing distributions.


DiSC'02 © 2003 Association for Computing Machinery