Many results in relational database theory on the structure of dependencies, query languages, and databases in general have now been established. However, neither (a) the reliance of these results on various *assumptions,* nor (b) the desirability or reasonableness of these assumptions themselves have been closely examined. These assumptions are nontrivial: examples include the universal relation assumption and the lossless join assumption.The purpose of the present paper is to clarify many of the existing assumptions, and point out weaknesses. This is desirable both to harden the statements of previous results, and to evaluate recent suggestions that certain assumptions (such as the acyclic JD assumption) may be useful for modeling "real world" databases. Specifically, studies are made of assumptions made for (1) universal relations, (2) functional dependency inference, and (3) decomposition theory. We show that:• Some assumptions (such as uniqueness of relationships among attributes) can be more powerful than they appear;• common treatment of FDs is sometimes inappropriate, and for example FD inferences such as {*A* → *B*, *B* → *C*} |= *A* → *C* can be incorrect;• the 'decomposition' approach to design may be hard to justify in real terms; and• Acyclic JDs may have drawbacks in eliminating ambiguity in queries and in modeling real enterprises.It is hoped that this exposition will help clarify some confusing issues in this field, and will lead to a better understanding of which assumptions are reasonable and useful in modeling the "real world".

In this paper, we try to put to rest many of the objections to the universal relation concept that have appeared in the literature. First, we shall taxonomize the varieties of ideas that are sometimes called the "universal relation assumption." Then, we consider some of the arguments pro and con. In some cases, the arguments against were expressed prematurely, and solutions to the problems they expose have since been found. In other cases, the arguments against are simply fallacious. In still other cases, the problems pointed out are real, but simply serve to point out that the advantages of the universal relation are not gotten for free. We shall conclude the paper with a description of the algorithm used to interpret queries in System/U, and the reasoning behind it.

We demonstrate a sense in which the equivalence between blocks (subgraphs without articulation points) and biconnected components (subgraphs in which there are two edge-disjoint paths between any pair of nodes) that holds in ordinary graph theory can be generalized to hypergraphs. The result has an interpretation for relational databases that the universal relations described by acyclic join dependencies are exactly those for which the connections among attributes are defined uniquely. We also exhibit a relationship between the process of Graham reduction [6] of hypergraphs and the process of tableau reduction [1] that holds only for acyclic hypergraphs.

One can partition the class of relational database schemas into tree schemas and cyclic schemas. In this paper we examine query processing implications of the partitioning; other areas impacted include dependency theory, schema design and graph theory.We consider a class of queries that compute the join of all relations in the database projected onto a prescribed set of attributes. We show that solving such queries (using the join, project and semijoin operators) is tantamount to creating an "embedded" tree schema which we call a tree projection. This lends further credibility to the pivotal nature of the tree/cyclic partitioning.Using the tree projection concept we analyze the problem of determining how many joins are needed to solve a query.

The problem of entities that play multiple roles in a relational database is studied, with an emphasis on the universal relation view of the database. A tree structure is used to describe the compatibilities among attributes. The compatibility information for an attribute encodes role playing information that cannot be described by traditional data dependencies. This structure is used to define the semantics of a QUEL-like (Stonebraker et al. [1976]) query language that uses special naming rules to permit easy use of attribute compatibility

We present a simple algorithm for maintaining a replicated distributed dictionary which achieves high availability of data, rapid processing of atomic actions, efficient utilization of storage, and tolerance to node or network failures including lost or duplicated messages. It does not require transaction logs, synchronized clocks, or other complicated mechanisms for its operation. It achieves consistency contraints which are considerably weaker than serial consistency but nonetheless are adequate for many dictionary applications such as electronic appointment calendars and mail systems. The degree of consistency achieved depends on the particular history of operation of the system in a way that is intuitive and easily understood. The algorithm implements a "best effort" approximation to full serial consistency, relative to whatever internode communication has successfully taken place, so the semantics are fully specified even under partial failure of the system. Both the correctness of the algorithm and the utility of such weak semantics depend heavily on special properties of the dictionary operations.

We examine the problem of concurrency control when the database management system supports multiple versions of the data. We characterize the limit of the parallelism achievable by the multiversion approach and demonstrate the resulting space-parallelism tradeoff.

In many large database applications there are certain elements mostly containing aggregate information, which are very frequently referred to (read and modified) by many transactions. If access to such fields has to obey to conventional two-phase lock protocols (1,2), transactions will be serialized in front of these "hot spots", i.e. the degree of parallelism is reduced. To avoid this kind of lock contention some improved lock protocols have been proposed, the most interesting of which is the one implemented in IMS Fast Path (3,4), where add and subtract may be performed concurrently on numerical fields, since backout is always possible with the unique inverse of each operand. A similar scheme is proposed in (10). We expand this idea to parallel readers *and* writers on numerical data types, proving that under certain conditions the result of such concurrent operations is consistent in the sense that it is equal to some serial schedule (2,5).

We examine the problem of determining whether a set of locked transactions, accessing a distributed database, is guaranteed to produce only serializable schedules. For a pair of transactions we prove that this concurrency control problem (which is polynomially solvable for centralized databases) is in general coNP-complete. We employ a new graph-theoretic technique and provide an efficient test for the special case of databases distributed between two sites only.

Various notions are introduced for the closure of a set of rectilinearly oriented rectangles on the plane. Optimal algorithms are presented for computing the closures. These algorithms imply optimal solutions to several problems related to database concurrency control.

In the pure relational model, duplicate tuples are automatically eliminated. Some real world languages such as DAPLEX, however, give users control over duplicate elimination. This paper extends the relational model to include multiset relations, i.e., relations with duplicate tuples. It considers three formalisms for expressing queries in this model: extended relational algebra, tableaux, and DAPLEX. It shows that, as in the original algebra, the equivalence problem for conjunctive expressions in the extended algebra can be solved using tableaux, and is NP-complete. Finally, it demonstrates that the extended algebra and DAPLEX have essentially the same expressiveness relative to conjunctive expressions.

Usually, the first normal form condition of the relational model of data is imposed. Presently, a broader class of data base applications like office information systems is considered where this restriction is not convenient. Therefore, an extension of the relational model is proposed consisting of Non First Normal Form (NF^{2}) relations. The relational algebra is enriched mainly by so called nest and unnest operations which transform between NF^{2} relations and the usual ones. We state some properties of these operations and some rules which occur in combination with the operations of the usual relational algebra. Since we propose to use the NF^{2} model also for the internal data model these rules are important not only for theoretical reasons but also for a practical implementation.

We describe a network schema design algorithm that uses a well-designed relational schema as input. The resulting network database handles incomplete information, since we use only the modified foreign-key constraint and not the universal instance assumption. We believe that the representative instance is a correct representation of information stored in the database. We show how to map the representative instance to the network database. We give two algorithms for translating a relational query to a network application program. The first algorithm is confined to relational queries that express selections and projections over the universal relation scheme. The second algorithm includes all relational queries over select, project and (natural) join. Optimization techniques are discussed for both types of translation.

A logic program consists of a set of Horn clauses, and can be used to express a query on relational data bases. It is shown that logic programs express precisely the queries in YE^{+} (the set of queries representable by a fixpoint applied to a positive existential query). Queries expressible by logic programs are thus not first order queries in general; nor are all the first order queries expressible as logic programs. A way of adding the *negation operator* to logic programs is suggested. The resulting set of *clausal* queries equals FP, the set of first order queries closed *under fixpoints* (as well as ¬, ∨, 3).

We consider the problem of optimizing conjunctive queries in the presence of inclusion and functional dependencies. We show that the problem of containment (and hence those of equivalence and non-minimality) is in NP when either (a) there are no functional dependencies or (b) the set of dependencies is what we call *key-based.* These results assume that infinite databases are allowed. If only *finite* databases are allowed, new containments may arise, as we illustrate by an example. We also prove a "compactness" theorem that shows that no such examples can exist for case (b).

Inclusion dependencies, or INDs (which can say, for example, that every manager is an employee) are studied, including their interaction with functional dependencies, or FDs. A simple complete axiomatization for INDs is presented, and the decision problem for INDs is shown to be PSPACE-complete. (The decision problem for INDs is the problem of determining whether or not Σ logically implies σ, given a set Σ of INDs and a single IND σ). It is shown that finite implication (implication over databases with a finite number of tuples) is the same as unrestricted implications for INDs, although finite implication and unrestricted implication are distinct for FDs and INDs taken together. It is shown that, although there are simple complete axiomatizations for FDs alone and for INDs alone, there is no complete axiomatization for FDs and INDs taken together, in which every rule is k-ary for some fixed k (and in particular, there is no finite complete axiomatization.) This is true whether we consider finite implication or unrestricted implication, and is true even if no relation scheme has more than three attributes. The nonexistence of a k-ary complete axiomatization for FDs and INDs taken together is proven by giving a condition which is necessary and sufficient in general for the existence of a k-ary complete axiomatization.

Two notions of dependency satisfaction, *consistency* and *completeness,* are introduced. Consistency is the natural generalization of weak satisfaction and seems appropriate when only equality-generating dependencies are given, but disagrees with the standard notion in the presence of tuple-generating dependencies. Completeness is based on the intuitive semantics of tuple-generating dependencies but appears unnatural for equality-generating dependencies. It is argued that neither approach is the *correct* one, but rather that they correspond to different policies on constraint enforcement, and each one is appropriate in different circumstances. Consistency and completeness of a state are characterized in terms of the tableau associated with the state and in terms of logical properties of a set of first-order sentences associated with the state. A close relation between the problems of testing for consistency and completeness and of testing implication of dependencies is established. The possibility of formalizing dependency satisfaction without using a universal relation scheme is examined.

A formal system is developed for reasoning about a class of dependencies that includes all classes considered in the literature. The usefulness of the system is illustrated by applying it to various database design problems. The system is shown to be sound and complete by adapting the analytic tableaux method of first-order predicate calculus to the class of dependencies adopted. Finally, the method is shown to be a decision procedure for the inference problem of a subclass of the dependencies considered.

A new theory of data representation involving "formats", which are based on three recurrent and prominent data-structuring concepts, is introduced. In a mathematically rigorous way, a notion of "equivalent" information capacity is defined and shown to be natural in a wide range of contexts. A normal form is introduced, and each equivalence class of formats is shown to have a unique representative in normal form. Finally, a natural way of comparing the information capacity of (non-equivalent) formats is formalized and studied.

We present a model-independent treatment of some important data base problems. Our approach uses the data base update as elementary concept. The basic tool is the data base view defined by a set of data base updates rather than by the usual view definition mapping.

A template dependency is a formalized integrity constraint on a relational database, stating that whenever tuples exist in the database that agree on certain attributes, an additional tuple must also be present that agrees with the others in a specified way. It is here shown that the inference problem for template dependencies is undecidable, that is, there can be no algorithm for determining whether a given dependency is a logical consequence of a given finite set of dependencies. The undecidability result holds whether or not databases are considered to be necessarily finite.

The class of typed template dependencies is a class of data dependencies that includes embedded multivalued and join dependencies. We show that the implication and the finite implication problems for this class are unsolvable. An immediate corollary is that this class has no formal system for finite implication. We also show how to construct a finite set of typed template dependencies whose implication and finite implication problems are unsolvable.The class of simple template dependencies is a proper subclass of the above class, and it generalizes slightly embedded join dependencies. It is shown that the implication and the finite implication problems for this class are also unsolvable. An immediate corollary is that this class has no formal system for either implication or finite implication.

Many concurrency control algorithms for distributed database management systems have been proposed in the last few years, but little has been done to analyse their performance. This paper presents the specification of a concurrency control algorithm based on the two-phase commit protocol for DDBs. The results of a complete performance analysis based on an analytic model are presented and discussed.

Research on concurrency control mechanisms for database systems has had as a primary goal the discovery of techniques for allowing increased levels of concurrent execution of transactions. In this paper, we study this problem in the context of non-two-phase locking protocols which are defined for data bases in which a directed acyclic graph structure is superimposed on the data items. We introduce a new lock mode, called INV, with properties fundamentally different from locking modes previously studied and show how this allows increased concurrency. Through the introduction of the INV mode of locking we have enunciated a new principle of the theory of data base concurrency control. This principle involves the separation of the effects of the commutativity and compatibility of data manipulation operations. We then examine how the introduction of such a lock mode affects the occurrence of deadlocks in a system. Certain conditions under which deadlock-freedom is maintained are identified, and simple methods for removing deadlocks in other situations are presented.

In this paper we present a parallel program schema model of a transaction system and generalize the concept of serializability from the sequential two-step model to a parallel multi-step model. We define two classes of serializable executions, and for each class we discuss two problems: recognition and scheduling. It is shown that the results for the recognition and online scheduling problems for the sequential model generalize to the parallel model. But it is argued that online scheduling is not suitable for a parallel execution environment. Therefore, batch schedulers are defined and a minimal set of precedence constraints is derived. Finally, it is shown that any optimal batch scheduler that uses syntactic information alone cannot be efficient.