Designing and Mining Multi-Terabyte Astronomy Archives: The Sloan Digital Sky Survey


Alexander Szalay, Peter Z. Kunszt, Ani Thakar, Jim Gray, and Donald R. Slutz



Abstract

The next-generation astronomy digital archives will cover most of the sky at fine resolution in many wavelengths, from X-rays through ultraviolet, optical, and infrared. The archives will be stored at diverse geographical locations. One of the first of these projects, the Sloan Digital Sky Survey (SDSS), is creating a 5-wavelength catalog over 10,000 square degrees of the sky (see http://www.sdss.org/). The 200 million objects in the multi-terabyte database will have mostly numerical attributes in a 100+ dimensional space. Points in this space have highly correlated distributions.

The archive will enable astronomers to explore the data interactively. Data access will be aided by multidimensional spatial and attribute indices. The data will be partitioned in many ways. Small tag objects consisting of the most popular attributes will accelerate frequent searches. Splitting the data among multiple servers will allow parallel, scalable I/O and parallel data analysis. Hashing techniques will allow efficient clustering and pair-wise comparison algorithms that should parallelize nicely. Randomly sampled subsets will allow otherwise large queries to be debugged at the desktop. Central servers will operate a data pump to support sweep searches touching most of the data. The anticipated queries will require special operators related to angular distances and complex similarity tests of object properties, such as shapes, colors, velocity vectors, or temporal behaviors. These issues pose interesting data management challenges.
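To make the angular-distance operator and the hashed pair-wise comparison more concrete, here is a minimal Python sketch. It is an illustration only, not the SDSS implementation: the function names, the (id, RA, Dec) tuple layout, and the declination-zone bucketing are all assumptions chosen for simplicity. The zone scheme relies on the fact that the angular separation of two points is never smaller than their difference in declination, so any pair within the search radius falls in the same or an adjacent zone.

```python
import math
from collections import defaultdict


def angular_distance_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two sky positions.

    Inputs are right ascension and declination in degrees.
    Uses the haversine form, which stays accurate for small separations.
    """
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    h = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    # Clamp to 1.0 to guard against rounding slightly above the asin domain.
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(h))))


def find_close_pairs(objects, radius_deg):
    """Return sorted id pairs whose angular separation is <= radius_deg.

    objects: iterable of (obj_id, ra_deg, dec_deg) tuples (hypothetical layout).
    Objects are hashed into declination zones of height radius_deg, so each
    object is compared only with its own zone and the next zone up.
    """
    zones = defaultdict(list)
    for obj_id, ra, dec in objects:
        zone = math.floor((dec + 90.0) / radius_deg)
        zones[zone].append((obj_id, ra, dec))

    pairs = set()
    for zone, members in zones.items():
        # The zone below is covered when that zone is the current one.
        candidates = members + zones.get(zone + 1, [])
        for i, (id_a, ra_a, dec_a) in enumerate(members):
            for id_b, ra_b, dec_b in candidates[i + 1:]:
                if angular_distance_deg(ra_a, dec_a, ra_b, dec_b) <= radius_deg:
                    pairs.add(tuple(sorted((id_a, id_b))))
    return sorted(pairs)


if __name__ == "__main__":
    sample = [
        ("A", 10.0, 20.04),
        ("B", 10.0, 20.06),   # adjacent zone to A, 0.02 degrees away
        ("C", 190.0, -20.0),  # isolated
        ("D", 359.999, 0.0),
        ("E", 0.001, 0.0),    # close to D across the RA wrap-around
    ]
    print(find_close_pairs(sample, 0.05))
```

Because each zone can be processed independently once its neighbor is available, the zones could in principle be spread across servers, in the spirit of the parallel analysis the abstract describes.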


BIBTEX


@inproceedings{DBLP:conf/sigmod/SzalayGKT00,
  author    = {Alexander Szalay and
               Peter Z. Kunszt and
               Ani Thakar and
               Jim Gray and
               Donald R. Slutz},
  editor    = {Weidong Chen and
               Jeffrey F. Naughton and
               Philip A. Bernstein},
  title     = {Designing and Mining Multi-Terabyte Astronomy Archives: The Sloan
               Digital Sky Survey},
  booktitle = {Proceedings of the 2000 ACM SIGMOD International Conference on
               Management of Data, May 16-18, 2000, Dallas, Texas, USA},
  journal   = {SIGMOD Record},
  publisher = {ACM},
  volume    = {29},
  number    = {2},
  year      = {2000},
  isbn      = {1-58113-218-2},
  pages     = {451-462},
  crossref  = {DBLP:conf/sigmod/2000},
  bibsource = {DBLP, http://dblp.uni-trier.de}
}




DiSC'01 Copyright ©2002 ACM Inc.