Metadata & Sampling (Session C6)

Querying large numbers of data sources is gaining importance due to increasing numbers of independent data providers. One of the key challenges is executing queries on all relevant information sources in a scalable fashion and retrieving fresh results. The key to scalability is to send queries only to the relevant servers and avoid wasting resources on data sources which will not provide any results. Thus, a catalog service, which would determine the relevant data sources given a query, is an essential component in efficiently processing queries in a distributed environment. This paper proposes a catalog framework which is distributed across the data sources themselves and does not require any central infrastructure. As new data sources become available, they automatically become part of the catalog service infrastructure, which allows scalability to large numbers of nodes. Furthermore, we propose techniques for workload adaptability. Using simulation and real-world data we show that our approach is valid and can scale to thousands of data sources.

©2004 Association for Computing Machinery