Welcome to D
SIGMOD 2005
PODS 2005
SIGMOD-RECOR
CIDR 2005
CIKM 2005
COMAD 2005
CVDB 2005
DaMoN 2005
Data Enginee
DEBS05
DMSN 2005
DOLAP 2005
GIR 2005
GIS 2005
Hypertext 20
ICDE 2005
<<< = ICDE'05 Pape>>>
ICDM 2005
IHIS 2005
IQIS 2005
JCDL 2005
KRAS 2005
MDM 2005
MIR 2005
MobiDE 2005
P2PIR 2005
RIDE 2005
SBBD 2005
SIGIR 2005
SIGIR-FORUM
SIGKDD 2005
SIGKDD-EXP
SSDBM 2005
TIME 2005
TKDE 2005
TODS 2005
VLDB 2005
VLDBJ 2005
WebDB 2005
WIDM 2005

Data privacy through optimal k-anonymization


Roberto J. Bayardo Jr. and Rakesh Agrawal

  View Paper (PDF)  

Return to Research Session 6: Privacy-Preserving Data Management


Abstract

Data de-identification reconciles the demand for release of data for research purposes and the demand for privacy from individuals. This paper proposes and evaluates an optimization algorithm for the powerful de-identification procedure known as k-anonymization. A k-anonymized dataset has the property that each record is indistinguishable from at least k - 1 others. Even simple restrictions of optimized k-anonymity are NP-hard, leading to significant computational challenges. We present a new approach to exploring the space of possible anonymizations that tames the combinatorics of the problem, and develop data-management strategies to reduce reliance on expensive operations such as sorting. Through experiments on real census data, we show the resulting algorithm can find optimal k-anonymizations under two representative cost measures and a wide range of k. We also show that the algorithm can produce good anonymizations in circumstances where the input data or input parameters preclude finding an optimal solution in reasonable time. Finally, we use the algorithm to explore the effects of different coding approaches and problem variations on anonymization quality and performance. To our knowledge, this is the first result demonstrating optimal k-anonymization of a non-trivial dataset under a general model of the problem.


©2006 Association for Computing Machinery