Welcome to D
SIGMOD 2005
PODS 2005
SIGMOD-RECOR
<<< = SIGMOD-RECOR>>>
CIDR 2005
CIKM 2005
COMAD 2005
CVDB 2005
DaMoN 2005
Data Enginee
DEBS05
DMSN 2005
DOLAP 2005
GIR 2005
GIS 2005
Hypertext 20
ICDE 2005
ICDM 2005
IHIS 2005
IQIS 2005
JCDL 2005
KRAS 2005
MDM 2005
MIR 2005
MobiDE 2005
P2PIR 2005
RIDE 2005
SBBD 2005
SIGIR 2005
SIGIR-FORUM
SIGKDD 2005
SIGKDD-EXP
SSDBM 2005
TIME 2005
TKDE 2005
TODS 2005
VLDB 2005
VLDBJ 2005
WebDB 2005
WIDM 2005

A Survey of Data Provenance in e-Science


Yogesh L. Simmhan, Beth Plale, and Dennis Gannon

  View Paper (PDF)  

Return to September 2005, Volume 34, Number 3


Abstract

Data management is growing in complexity as large-scale applications take advantage of the loosely coupled resources brought together by grid middleware and by abundant storage capacity. Metadata describing the data products used in and generated by these applications is essential to disambiguate the data and enable reuse. Data provenance, one kind of metadata, pertains to the derivation history of a data product starting from its original sources.In this paper we create a taxonomy of data provenance characteristics and apply it to current research efforts in e-science, focusing primarily on scientific workflow approaches. The main aspect of our taxonomy categorizes provenance systems based on why they record provenance, what they describe, how they represent and store provenance, and ways to disseminate it. The survey culminates with an identification of open research problems in the field.


©2006 Association for Computing Machinery