PODS '82- Proceedings of the 1st ACM SIGACT-SIGMOD symposium on Principles of database systems

Full Citation in the ACM Digital Library

SESSION: Session 1

Session details: Session 1

Assumptions in relational database theory

Many results in relational database theory on the structure of dependencies, query languages, and databases in general have now been established. However, neither (a) the reliance of these results on various assumptions, nor (b) the desirability or reasonableness of these assumptions themselves have been closely examined. These assumptions are nontrivial: examples include the universal relation assumption and the lossless join assumption.The purpose of the present paper is to clarify many of the existing assumptions, and point out weaknesses. This is desirable both to harden the statements of previous results, and to evaluate recent suggestions that certain assumptions (such as the acyclic JD assumption) may be useful for modeling "real world" databases. Specifically, studies are made of assumptions made for (1) universal relations, (2) functional dependency inference, and (3) decomposition theory. We show that:• Some assumptions (such as uniqueness of relationships among attributes) can be more powerful than they appear;• common treatment of FDs is sometimes inappropriate, and for example FD inferences such as {AB, BC} |= AC can be incorrect;• the 'decomposition' approach to design may be hard to justify in real terms; and• Acyclic JDs may have drawbacks in eliminating ambiguity in queries and in modeling real enterprises.It is hoped that this exposition will help clarify some confusing issues in this field, and will lead to a better understanding of which assumptions are reasonable and useful in modeling the "real world".

The U. R. strikes back

In this paper, we try to put to rest many of the objections to the universal relation concept that have appeared in the literature. First, we shall taxonomize the varieties of ideas that are sometimes called the "universal relation assumption." Then, we consider some of the arguments pro and con. In some cases, the arguments against were expressed prematurely, and solutions to the problems they expose have since been found. In other cases, the arguments against are simply fallacious. In still other cases, the problems pointed out are real, but simply serve to point out that the advantages of the universal relation are not gotten for free. We shall conclude the paper with a description of the algorithm used to interpret queries in System/U, and the reasoning behind it.

SESSION: Session 2

Session details: Session 2

On equivalences of database schemes

Database relations with null values

Connections in acyclic hypergraphs: extended abstract

We demonstrate a sense in which the equivalence between blocks (subgraphs without articulation points) and biconnected components (subgraphs in which there are two edge-disjoint paths between any pair of nodes) that holds in ordinary graph theory can be generalized to hypergraphs. The result has an interpretation for relational databases that the universal relations described by acyclic join dependencies are exactly those for which the connections among attributes are defined uniquely. We also exhibit a relationship between the process of Graham reduction [6] of hypergraphs and the process of tableau reduction [1] that holds only for acyclic hypergraphs.

The tree property is fundamental for query processing

One can partition the class of relational database schemas into tree schemas and cyclic schemas. In this paper we examine query processing implications of the partitioning; other areas impacted include dependency theory, schema design and graph theory.We consider a class of queries that compute the join of all relations in the database projected onto a prescribed set of attributes. We show that solving such queries (using the join, project and semijoin operators) is tantamount to creating an "embedded" tree schema which we call a tree projection. This lends further credibility to the pivotal nature of the tree/cyclic partitioning.Using the tree projection concept we analyze the problem of determining how many joins are needed to solve a query.

Transforming cyclic schemas into trees

Compatible attributes in a universal relation

The problem of entities that play multiple roles in a relational database is studied, with an emphasis on the universal relation view of the database. A tree structure is used to describe the compatibilities among attributes. The compatibility information for an attribute encodes role playing information that cannot be described by traditional data dependencies. This structure is used to define the semantics of a QUEL-like (Stonebraker et al. [1976]) query language that uses special naming rules to permit easy use of attribute compatibility

SESSION: Session 3

Session details: Session 3

Multilevel atomicity

Sacrificing serializability to attain high availability of data in an unreliable network

We present a simple algorithm for maintaining a replicated distributed dictionary which achieves high availability of data, rapid processing of atomic actions, efficient utilization of storage, and tolerance to node or network failures including lost or duplicated messages. It does not require transaction logs, synchronized clocks, or other complicated mechanisms for its operation. It achieves consistency contraints which are considerably weaker than serial consistency but nonetheless are adequate for many dictionary applications such as electronic appointment calendars and mail systems. The degree of consistency achieved depends on the particular history of operation of the system in a way that is intuitive and easily understood. The algorithm implements a "best effort" approximation to full serial consistency, relative to whatever internode communication has successfully taken place, so the semantics are fully specified even under partial failure of the system. Both the correctness of the algorithm and the utility of such weak semantics depend heavily on special properties of the dictionary operations.

On concurrency control by multiple versions

We examine the problem of concurrency control when the database management system supports multiple versions of the data. We characterize the limit of the parallelism achievable by the multiversion approach and demonstrate the resulting space-parallelism tradeoff.

Concurrency on high-traffic data elements

In many large database applications there are certain elements mostly containing aggregate information, which are very frequently referred to (read and modified) by many transactions. If access to such fields has to obey to conventional two-phase lock protocols (1,2), transactions will be serialized in front of these "hot spots", i.e. the degree of parallelism is reduced. To avoid this kind of lock contention some improved lock protocols have been proposed, the most interesting of which is the one implemented in IMS Fast Path (3,4), where add and subtract may be performed concurrently on numerical fields, since backout is always possible with the unique inverse of each operand. A similar scheme is proposed in (10). We expand this idea to parallel readers and writers on numerical data types, proving that under certain conditions the result of such concurrent operations is consistent in the sense that it is equal to some serial schedule (2,5).

SESSION: Session 4

Session details: Session 4

An algorithm for minimizing roll back cost

Is distributed locking harder?

We examine the problem of determining whether a set of locked transactions, accessing a distributed database, is guaranteed to produce only serializable schedules. For a pair of transactions we prove that this concurrency control problem (which is polynomially solvable for centralized databases) is in general coNP-complete. We employ a new graph-theoretic technique and provide an efficient test for the special case of databases distributed between two sites only.

An optimal algorithm for testing for safety and detecting deadlocks in locked transaction systems

Various notions are introduced for the closure of a set of rectilinearly oriented rectangles on the plane. Optimal algorithms are presented for computing the closures. These algorithms imply optimal solutions to several problems related to database concurrency control.

SESSION: Session 5

Session details: Session 5

An extended relational algebra with control over duplicate elimination

In the pure relational model, duplicate tuples are automatically eliminated. Some real world languages such as DAPLEX, however, give users control over duplicate elimination. This paper extends the relational model to include multiset relations, i.e., relations with duplicate tuples. It considers three formalisms for expressing queries in this model: extended relational algebra, tableaux, and DAPLEX. It shows that, as in the original algebra, the equivalence problem for conjunctive expressions in the extended algebra can be solved using tableaux, and is NP-complete. Finally, it demonstrates that the extended algebra and DAPLEX have essentially the same expressiveness relative to conjunctive expressions.

Remarks on the algebra of non first normal form relations

Usually, the first normal form condition of the relational model of data is imposed. Presently, a broader class of data base applications like office information systems is considered where this restriction is not convenient. Therefore, an extension of the relational model is proposed consisting of Non First Normal Form (NF2) relations. The relational algebra is enriched mainly by so called nest and unnest operations which transform between NF2 relations and the usual ones. We state some properties of these operations and some rules which occur in combination with the operations of the usual relational algebra. Since we propose to use the NF2 model also for the internal data model these rules are important not only for theoretical reasons but also for a practical implementation.

Non-reversible privacy transformations

A universal relation database system implemented via the network model

We describe a network schema design algorithm that uses a well-designed relational schema as input. The resulting network database handles incomplete information, since we use only the modified foreign-key constraint and not the universal instance assumption. We believe that the representative instance is a correct representation of information stored in the database. We show how to map the representative instance to the network database. We give two algorithms for translating a relational query to a network application program. The first algorithm is confined to relational queries that express selections and projections over the universal relation scheme. The second algorithm includes all relational queries over select, project and (natural) join. Optimization techniques are discussed for both types of translation.

SESSION: Session 6

Session details: Session 6

Horn clauses and the fixpoint query hierarchy

A logic program consists of a set of Horn clauses, and can be used to express a query on relational data bases. It is shown that logic programs express precisely the queries in YE+ (the set of queries representable by a fixpoint applied to a positive existential query). Queries expressible by logic programs are thus not first order queries in general; nor are all the first order queries expressible as logic programs. A way of adding the negation operator to logic programs is suggested. The resulting set of clausal queries equals FP, the set of first order queries closed under fixpoints (as well as ¬, ∨, 3).

Testing containment of conjunctive queries under functional and inclusion dependencies

We consider the problem of optimizing conjunctive queries in the presence of inclusion and functional dependencies. We show that the problem of containment (and hence those of equivalence and non-minimality) is in NP when either (a) there are no functional dependencies or (b) the set of dependencies is what we call key-based. These results assume that infinite databases are allowed. If only finite databases are allowed, new containments may arise, as we illustrate by an example. We also prove a "compactness" theorem that shows that no such examples can exist for case (b).

The relational model of data and cylindrical algebras

SESSION: Session 7

Session details: Session 7

Inclusion dependencies and their interaction with functional dependencies

Inclusion dependencies, or INDs (which can say, for example, that every manager is an employee) are studied, including their interaction with functional dependencies, or FDs. A simple complete axiomatization for INDs is presented, and the decision problem for INDs is shown to be PSPACE-complete. (The decision problem for INDs is the problem of determining whether or not Σ logically implies σ, given a set Σ of INDs and a single IND σ). It is shown that finite implication (implication over databases with a finite number of tuples) is the same as unrestricted implications for INDs, although finite implication and unrestricted implication are distinct for FDs and INDs taken together. It is shown that, although there are simple complete axiomatizations for FDs alone and for INDs alone, there is no complete axiomatization for FDs and INDs taken together, in which every rule is k-ary for some fixed k (and in particular, there is no finite complete axiomatization.) This is true whether we consider finite implication or unrestricted implication, and is true even if no relation scheme has more than three attributes. The nonexistence of a k-ary complete axiomatization for FDs and INDs taken together is proven by giving a condition which is necessary and sufficient in general for the existence of a k-ary complete axiomatization.

Notions of dependency satisfaction

Two notions of dependency satisfaction, consistency and completeness, are introduced. Consistency is the natural generalization of weak satisfaction and seems appropriate when only equality-generating dependencies are given, but disagrees with the standard notion in the presence of tuple-generating dependencies. Completeness is based on the intuitive semantics of tuple-generating dependencies but appears unnatural for equality-generating dependencies. It is argued that neither approach is the correct one, but rather that they correspond to different policies on constraint enforcement, and each one is appropriate in different circumstances. Consistency and completeness of a state are characterized in terms of the tableau associated with the state and in terms of logical properties of a set of first-order sentences associated with the state. A close relation between the problems of testing for consistency and completeness and of testing implication of dependencies is established. The possibility of formalizing dependency satisfaction without using a universal relation scheme is examined.

A theory of data dependencies over relational expressions

A formal system is developed for reasoning about a class of dependencies that includes all classes considered in the literature. The usefulness of the system is illustrated by applying it to various database design problems. The system is shown to be sound and complete by adapting the analytic tableaux method of first-order predicate calculus to the class of dependencies adopted. Finally, the method is shown to be a decision procedure for the inference problem of a subclass of the dependencies considered.

Independent database schemas

SESSION: Session 8

Session details: Session 8

The format model: a theory of database organization

A new theory of data representation involving "formats", which are based on three recurrent and prominent data-structuring concepts, is introduced. In a mathematically rigorous way, a notion of "equivalent" information capacity is defined and shown to be natural in a wide range of contexts. A normal form is introduced, and each equivalence class of formats is shown to have a unique representative in normal form. Finally, a natural way of comparing the information capacity of (non-equivalent) formats is formalized and studied.

An operational approach to data bases

We present a model-independent treatment of some important data base problems. Our approach uses the data base update as elementary concept. The basic tool is the data base view defined by a set of data base updates rather than by the usual view definition mapping.

The inference problem for template dependencies

A template dependency is a formalized integrity constraint on a relational database, stating that whenever tuples exist in the database that agree on certain attributes, an additional tuple must also be present that agrees with the others in a specified way. It is here shown that the inference problem for template dependencies is undecidable, that is, there can be no algorithm for determining whether a given dependency is a logical consequence of a given finite set of dependencies. The undecidability result holds whether or not databases are considered to be necessarily finite.

The implication and finite implication problems for typed template dependencies

The class of typed template dependencies is a class of data dependencies that includes embedded multivalued and join dependencies. We show that the implication and the finite implication problems for this class are unsolvable. An immediate corollary is that this class has no formal system for finite implication. We also show how to construct a finite set of typed template dependencies whose implication and finite implication problems are unsolvable.The class of simple template dependencies is a proper subclass of the above class, and it generalizes slightly embedded join dependencies. It is shown that the implication and the finite implication problems for this class are also unsolvable. An immediate corollary is that this class has no formal system for either implication or finite implication.

SESSION: Session 9

Session details: Session 9

Using write-once memory for database storage

Performance evaluation of a two-phase commit based protocol for DDBs

Many concurrency control algorithms for distributed database management systems have been proposed in the last few years, but little has been done to analyse their performance. This paper presents the specification of a concurrency control algorithm based on the two-phase commit protocol for DDBs. The results of a complete performance analysis based on an analytic model are presented and discussed.

External hasing with limited internal storage

Foundations for multifile design by application partitioning

SESSION: Session 10

Session details: Session 10

Concurrency control in a dynamic search structure

Compatibility and commutativity in non-two-phase locking protocols

Research on concurrency control mechanisms for database systems has had as a primary goal the discovery of techniques for allowing increased levels of concurrent execution of transactions. In this paper, we study this problem in the context of non-two-phase locking protocols which are defined for data bases in which a directed acyclic graph structure is superimposed on the data items. We introduce a new lock mode, called INV, with properties fundamentally different from locking modes previously studied and show how this allows increased concurrency. Through the introduction of the INV mode of locking we have enunciated a new principle of the theory of data base concurrency control. This principle involves the separation of the effects of the commutativity and compatibility of data manipulation operations. We then examine how the introduction of such a lock mode affects the occurrence of deadlocks in a system. Certain conditions under which deadlock-freedom is maintained are identified, and simple methods for removing deadlocks in other situations are presented.

Theory of serializablity for a parallel model of transactions

In this paper we present a parallel program schema model of a transaction system and generalize the concept of serializability from the sequential two-step model to a parallel multi-step model. We define two classes of serializable executions, and for each class we discuss two problems: recognition and scheduling. It is shown that the results for the recognition and online scheduling problems for the sequential model generalize to the parallel model. But it is argued that online scheduling is not suitable for a parallel execution environment. Therefore, batch schedulers are defined and a minimal set of precedence constraints is derived. Finally, it is shown that any optimal batch scheduler that uses syntactic information alone cannot be efficient.