CERN IT/ASD & RD45
+41 22 767 4937
This paper describes the use of Object-Database Management Systems (ODBMS) for the storage of High Energy Physics event data.
Object-Databases, Very Large Databases, High Energy Physics
The new experiments at the Large Hadron Collider (LHC) at CERN will gather an unprecedented amount of data. Starting in 2005, each of the four LHC experiments ALICE, ATLAS, CMS and LHCb will record of the order of 1 Petabyte (10^15 bytes) per year. Altogether the experiments will store and repeatedly analyse some 100 PB of data during their lifetimes. Such an enormous task can only be accomplished by large international collaborations. Thousands of physicists from hundreds of institutes world-wide will participate. This also implies that nearly every available hardware platform will be used, resulting in a truly heterogeneous and distributed system.
The computing technical proposals of the LHC experiments not only require access to the data from remote sites but also call for distribution of the data store itself to several regional centres.
Table 1: Expected Data Rates and Volumes at LHC
[only one entry of the table is recoverable: 1 PB/month (1 month per year)]
Data models in High Energy Physics (HEP) are typically very complex. The number of different data types needed to describe the event data of a large HEP experiment runs to several hundred. A single measurement (event) may contain millions of interrelated objects.
Since the object model of the event data is shared between multiple subsystems (e.g. data acquisition, event reconstruction and physics analysis) with very different access patterns, it is often difficult to fulfil all flexibility and performance requirements in a single design.
All LHC experiments exploit Object Oriented (OO) technology to implement and maintain their very large software systems. Today most software development is done in C++ with a growing interest in Java. The data store therefore has to support the main concepts of these OO languages such as abstraction, inheritance, polymorphism and parameterised types.
HEP data stores based on Object Database Management Systems (ODBMS) provide a number of important advantages over traditional systems. This approach provides the user with a coherent logical view of complex HEP object models and allows tight integration with several of today's OO languages such as C++ and Java.
The clear separation of logical and physical data model introduced by object databases allows for transparent support of physical clustering and re-clustering, which is expected to be an important tool for optimising overall system performance.
The ODBMS implementation Objectivity/DB in particular has been shown to scale to multi-PB distributed data stores and provides seamless integration with Mass Storage Systems (MSS) such as HPSS. Already today, a significant number of HEP experiments in or close to production have adopted an ODBMS-based approach.