
























Privacy-Preserving Data Mining
Rakesh Agrawal and Ramakrishnan Srikant
Abstract
A fruitful direction for future data mining research will be the development of techniques that incorporate privacy concerns. Specifically, we address the following question. Since the primary task in data mining is the development of models about aggregated data, can we develop accurate models without access to precise information in individual data records? We consider the concrete case of building a decision-tree classifier from training data in which the values of individual records have been perturbed. The resulting data records look very different from the original records and the distribution of data values is also very different from the original distribution. While it is not possible to accurately estimate original values in individual data records, we propose a novel reconstruction procedure to accurately estimate the distribution of original data values. By using these reconstructed distributions, we are able to build classifiers whose accuracy is comparable to the accuracy of classifiers built with the original data.
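The perturb-then-reconstruct idea in the abstract can be illustrated with a small sketch. The abstract does not give the details of the procedure, so everything below is an assumption made for illustration: the noise is taken to be additive Gaussian with a known standard deviation `sigma`, the original values are summarized by a histogram over fixed bins, and the distribution is estimated with a simple iterative Bayes-style update. The function names `perturb` and `reconstruct` are hypothetical and not taken from the paper.

```python
import math
import random


def perturb(values, sigma, rng):
    """Return a copy of `values` with independent Gaussian noise added.

    Assumption: additive noise with mean 0 and standard deviation `sigma`.
    """
    return [v + rng.gauss(0.0, sigma) for v in values]


def reconstruct(perturbed, bin_edges, sigma, iterations=30):
    """Estimate the fraction of ORIGINAL values that fall in each bin.

    Only the perturbed values and the noise parameter `sigma` are used.
    Each bin is represented by its midpoint; the estimate starts uniform
    and is refined by repeatedly re-weighting bins with Bayes' rule.
    """
    k = len(bin_edges) - 1
    mids = [(bin_edges[i] + bin_edges[i + 1]) / 2.0 for i in range(k)]
    probs = [1.0 / k] * k  # initial guess: uniform over bins

    for _ in range(iterations):
        new = [0.0] * k
        for w in perturbed:
            # Likelihood of observing w if the original value sat at each
            # bin midpoint, times the current estimate for that bin.
            # The Gaussian normalizing constant cancels and is omitted.
            weights = [
                math.exp(-((w - m) ** 2) / (2.0 * sigma * sigma)) * p
                for m, p in zip(mids, probs)
            ]
            total = sum(weights)
            if total == 0.0:
                continue  # record is too far from every bin to be informative
            for i in range(k):
                new[i] += weights[i] / total
        s = sum(new)
        if s == 0.0:
            raise ValueError("no perturbed record was informative")
        probs = [x / s for x in new]
    return probs


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical attribute: two clusters of values.
    original = [rng.uniform(20, 30) for _ in range(200)]
    original += [rng.uniform(60, 70) for _ in range(200)]
    noisy = perturb(original, sigma=10.0, rng=rng)
    edges = [i * 10.0 for i in range(11)]  # bins [0,10), [10,20), ...
    for lo, p in zip(edges, reconstruct(noisy, edges, sigma=10.0)):
        print(f"[{lo:5.1f}, {lo + 10:5.1f}): {p:.3f}")
```

In this sketch no individual original value is recovered. Only the per-bin fractions are estimated, and that is the kind of aggregate information a decision-tree learner needs when it chooses split points.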
BIBTEX
@inproceedings{DBLP:conf/sigmod/AgrawalS00,
  author    = {Rakesh Agrawal and
               Ramakrishnan Srikant},
  editor    = {Weidong Chen and
               Jeffrey F. Naughton and
               Philip A. Bernstein},
  title     = {Privacy-Preserving Data Mining},
  booktitle = {Proceedings of the 2000 ACM SIGMOD International Conference on
               Management of Data, May 16-18, 2000, Dallas, Texas, USA},
  journal   = {SIGMOD Record},
  publisher = {ACM},
  volume    = {29},
  number    = {2},
  year      = {2000},
  isbn      = {1-58113-218-2},
  pages     = {439-450},
  crossref  = {DBLP:conf/sigmod/2000},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}
DiSC'01 Copyright ©2002 ACM Inc.