PODS '87: Proceedings of the sixth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems

Database theory—past and future

We briefly sketch the development of the various branches of database theory. One important branch is the theory of relational databases, including such areas as dependency theory, universal-relation theory, and hypergraph theory. A second important branch is the theory of concurrency control and distributed databases. Two other branches have not in the past been given the attention they deserve. One of these is “logic and databases,” and the second is “object-oriented database systems,” which to my thinking includes systems based on the network or hierarchical data models. Both these areas are going to be more influential in the future.

Logic programming with sets

Sets and negation in a logic data base language (LDL1)

In this paper we extend LDL, a Logic Based Database Language, to include finite sets and negation. The new language is called LDL1. We define the notion of a model and show that a negation-free program need not have a model, and that it may have more than one minimal model. We impose syntactic restrictions in order to define a deterministic language. These restrictions allow only layered (stratified) programs. We prove that for any program satisfying the syntactic restrictions of layering, there is a minimal model, and that this model can be constructed in a bottom-up fashion. Extensions to the basic grouping mechanism are proposed. We show that these extensions can be translated into equivalent LDL1 programs. Finally, we show how the technique of magic sets can be extended to translate LDL1 programs into equivalent programs which can often be executed more efficiently.
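
As a concrete illustration of the stratified, bottom-up construction the abstract refers to, the following minimal Python sketch evaluates an ordinary two-stratum Datalog program with negation; the edge/reach/unreach predicates and facts are assumptions for the example and are not taken from LDL1 itself.

```python
# Stratum 1:  reach(X,Y) :- edge(X,Y).
#             reach(X,Y) :- edge(X,Z), reach(Z,Y).
# Stratum 2:  unreach(X,Y) :- node(X), node(Y), not reach(X,Y).
edge = {("a", "b"), ("b", "c")}            # assumed toy facts
node = {x for pair in edge for x in pair}

# Stratum 1: iterate to a fixpoint; this stratum is negation-free,
# so it has a unique least model.
reach = set(edge)
changed = True
while changed:
    new = {(x, y) for (x, z) in edge for (z2, y) in reach if z2 == z}
    changed = not new <= reach
    reach |= new

# Stratum 2: negation refers only to the lower stratum, whose fixpoint
# is already final -- this is what makes the construction deterministic.
unreach = {(x, y) for x in node for y in node if (x, y) not in reach}

print(sorted(reach))    # [('a', 'b'), ('a', 'c'), ('b', 'c')]
print(sorted(unreach))
```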

Logical design of relational database schemes

We define extended conflict-free dependencies in the context of functional and multivalued dependencies, and prove that there exists an acyclic, dependency-preserving, 4NF database scheme if and only if the given set of dependencies has an extended conflict-free cover. This condition can be checked in polynomial time. A polynomial-time algorithm to obtain such a scheme for a given extended conflict-free set of dependencies is also presented. The result is also applicable when the data dependencies consist of only functional dependencies, giving the necessary and sufficient condition for an acyclic, dependency-preserving BCNF database scheme.

On designing database schemes bounded or constant-time maintainable with respect to functional dependencies

Under the weak instance model, determining whether a class of database schemes is bounded with respect to dependencies is fundamental for analyzing the behavior of the class with respect to query processing and updates. However, proving that a class of database schemes is bounded with respect to dependencies seems to be very difficult even for restricted cases. To resolve this problem, we need to develop techniques for characterizing bounded database schemes. In this paper, we give a formal methodology for designing database schemes bounded with respect to functional dependencies using a new technique called extensibility. This methodology can also be used to design constant-time-maintainable database schemes.

Computing covers for embedded functional dependencies

This paper deals with the problem of computing covers for the functional dependencies embedded in a subset of a given relation schema. We show how this problem can be simplified and present a new and efficient algorithm, “Reduction By Resolution” (RBR), for its solution. Though the problem of computing covers for embedded dependencies is inherently exponential, our algorithm behaves polynomially for several classes of inputs. RBR can also be used to solve some related problems in the theory of database design, such as deciding whether a given database scheme is in Boyce-Codd Normal Form or decomposing a scheme into Boyce-Codd Normal Form.
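
For orientation, the sketch below is the textbook brute-force way to compute a (non-minimal) cover of the dependencies embedded in a subschema, via attribute closures over all subsets of the subschema; it is not the RBR algorithm, and its exponential behavior is exactly what RBR avoids on many input classes. The schema R(A,B,C,D) and its dependencies are assumed for the example.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (each FD is a (lhs, rhs) pair of frozensets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def project_fds(fds, z):
    """Naive (non-minimal) cover of the FDs embedded in subschema `z`: exponential in |z|."""
    cover = []
    for k in range(1, len(z) + 1):
        for lhs in combinations(sorted(z), k):
            lhs = frozenset(lhs)
            rhs = (frozenset(closure(lhs, fds)) & frozenset(z)) - lhs
            if rhs:
                cover.append((lhs, rhs))
    return cover

# Assumed example: R(A,B,C,D) with A->B, B->C, C->D, projected onto {A, C, D}.
fds = [(frozenset("A"), frozenset("B")),
       (frozenset("B"), frozenset("C")),
       (frozenset("C"), frozenset("D"))]
for lhs, rhs in project_fds(fds, {"A", "C", "D"}):
    print(sorted(lhs), "->", sorted(rhs))
```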

Dynamic query interpretation in relational databases

A new dynamic approach to the problem of determining the correct interpretation of a logically independent query to a relational database is described. The proposed disambiguating process is based on a simple user-system dialogue consisting of a sequence of decisions about the relevance (or not) of an attribute with respect to the user's interpretation.

A new basis for the weak instance model

A new definition of the weak instance model is presented, which does not consider the missing values as existent though unknown, but just assumes that no information is available about them. It is possible to associate with the new definition logical theories that do not contain universally quantified variables. The new model enjoys various desirable properties of the old weak instance model, with respect to dependency satisfaction, query answering, and associated logical theories.

Answering queries in categorical databases

A compatible categorical database can be viewed as a single (contingency) table by taking the maximum-entropy extension of the component tables. Such a view, here called the universal table model, is needed to answer a user who requests “cross-classified” categorical data, that is, categorical data resulting from combining the information contents of two or more base tables. In order to implement a universal table interface we make use of a query-optimization procedure, which is able to generate an appropriate answer both in the case that the requested data are present in the database and in the case that they are not and, therefore, have to be estimated.

Nested transactions and read-write locking

Transaction commitment at minimal communication cost

We consider the communication protocol for transaction commitment in a distributed database. Specifically, we investigate the connection between the structure of communication among the participating sites and the topology of the communication network. To do so, the cost of transaction commitment is defined as the number of network hops that messages of the protocol must traverse. We establish the necessary cost for transaction commitment and show that it is also sufficient. A simple distributed algorithm is presented to prove sufficiency. Our algorithm is also time-efficient; to prove this, we show that its timing is optimal within a natural class of commit protocols.

The precedence-assignment model for distributed databases concurrency control algorithms

We have developed a unified model, called the precedence-assignment model (PAM), of concurrency control algorithms in distributed databases. It is shown that two-phase locking, timestamp ordering, and other existing concurrency control algorithms may be modeled by PAM. We have also developed a new concurrency control algorithm under the PAM modeling framework, which is free from deadlocks and transaction restarts. Finally, a unified concurrency control subsystem for precedence-assignment algorithms is developed. By using this subsystem, different transactions may be executed under different concurrency control algorithms simultaneously.

A knowledge-theoretic analysis of atomic commitment protocols

Perspectives in deductive databases (Abstract only)

I will discuss my experiences, some of the work that I have done, and related work that influenced me, concerning deductive databases over the last 30 years. It will be convenient to divide this time period into roughly three equal parts: 1957 - 1968, 1969 - 1978, and 1979 - present. For the first portion I will describe how my interest in deductive databases started in 1957, at a time when not even the field of databases existed. I will describe work in the beginning years, leading to the start of deductive databases in about 1968 with the work of Cordell Green and Bertram Raphael. The second period saw a great deal of work in theorem proving as well as the introduction of logic programming. The existence and importance of deductive databases as a formal and viable discipline received its impetus at a workshop held in Toulouse, France, in 1977, which culminated in the book Logic and Data Bases. The relationship of deductive databases and logic programming was recognized at that time. During the third and most recent period we have seen formal theories of databases come about as an outgrowth of that work, and the recognition that artificial intelligence and deductive databases are closely related, at least through the so-called expert database systems. I expect that the relationships between techniques from formal logic, databases, logic programming, and artificial intelligence will continue to be explored, and that the field of deductive databases will become a more prominent area of computer science in coming years.

Maintenance of stratified databases viewed as a belief revision system

We study declarative and dynamic aspects of non-monotonic reasoning in the context of deductive databases. More precisely, we consider the maintenance of a special class of indefinite deductive databases, called stratified databases, introduced by Apt, Blair and Walker [ABW] and Van Gelder [VG], in which recursion “through” negation is disallowed. A stratified database has a natural model associated with it which is selected as its intended meaning. The maintenance problem for these databases is complicated because insertions can lead to deletions and vice versa. To solve this problem we make use of the ideas present in the works of Doyle [D] and de Kleer [dK] on belief revision systems. We offer a number of solutions which differ in the amount of static and dynamic information used and the form of support introduced. We also discuss the implementation issues and the trade-offs involved.

Specification and implementation of programs for updating incomplete information databases

The problem of updating incomplete information databases is examined as a programming problem. From this point of view, formal denotational semantics are developed for two applicative programming languages, BLU and HLU. BLU is a very simple language with only five primitives, and is designed primarily as a tool for the implementation of higher level languages. The semantics of BLU are formally developed at two levels, possible worlds and clausal, and the latter is shown to be a correct implementation of the former. HLU is a user-level update language. It is defined entirely in terms of BLU, and so immediately inherits its semantic definition from that language. This demonstrates a level of completeness for BLU as a set of primitives for update language implementation. The necessity of a particular BLU primitive, masking, suggests that there is a high degree of inherent complexity in updating logical databases.

Operation specific locking in B-trees

B-trees have been used as an access aid for both primary and secondary indexing for quite some time. This paper presents a deadlock-free locking mechanism in which different processes make use of different lock types in order to reach the leaf nodes. The compatibility relations among locks on a node do not depend exclusively on their type, but also on the node status and the number and kind of processes currently acting on the node. As a result, a number of insertion or deletion processes can operate concurrently on a node. The paper presents an appropriate recovery strategy in case of failure, and discusses the protocol modifications required so that it can be used in other similar structures such as B+-trees, compressed B-trees, and R-trees for spatial searching.

Concurrency control in database structures with relaxed balance

We consider the separation of rebalancing from updates in several database structures, such as B-trees for external and AVL-trees for internal structures. We show how this separation can be implemented such that rebalancing is performed by local background processes. Our solution implies that even simple locking schemes (without additional links and copies of certain nodes) for concurrency control are efficient in the sense that at any time only a small constant number of nodes must be locked.

Performance results on multiversion timestamp concurrency control with predeclared writesets

Decomposing an N-ary relation into a tree of binary relations

We present an efficient algorithm for decomposing an n-ary relation into a tree of binary relations, and provide an efficient test for checking whether or not the tree formed represents the relation. If a tree decomposition exists, the algorithm is guaranteed to find one; otherwise, the tree generated will fail the test, indicating that no tree decomposition exists. The unique feature of the algorithm presented in this paper is that it does not a priori assume any dependencies in the initial relation; rather, it derives such dependencies from the bare relation instance.
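
A brute-force version of the test half of this problem can be sketched as follows: project the relation onto each edge of a candidate tree and check whether joining the binary projections reproduces exactly the original relation. The relation instance and attribute names below are assumptions for illustration; this is not the paper's efficient algorithm.

```python
from itertools import product

def represents(relation, attrs, tree_edges):
    """Do the binary projections along `tree_edges` join back to exactly `relation`?"""
    idx = {a: i for i, a in enumerate(attrs)}
    proj = {(a, b): {(t[idx[a]], t[idx[b]]) for t in relation} for (a, b) in tree_edges}
    domains = [sorted({t[idx[a]] for t in relation}) for a in attrs]
    # Join of the binary projections: every assignment consistent with all edges.
    joined = set()
    for candidate in product(*domains):
        val = dict(zip(attrs, candidate))
        if all((val[a], val[b]) in proj[(a, b)] for (a, b) in tree_edges):
            joined.add(candidate)
    return joined == set(relation)

# Assumed instance R(A,B,C); it decomposes along the tree A - B - C but not A - C - B.
r = {(1, 1, 1), (1, 1, 2), (2, 2, 1)}
print(represents(r, ["A", "B", "C"], [("A", "B"), ("B", "C")]))   # True
print(represents(r, ["A", "B", "C"], [("A", "C"), ("C", "B")]))   # False
```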

Formal limits on the automatic generation and maintenance of integrity constraints

A formal approach to the automatic generation and maintenance of integrity constraints in relational databases is presented. It is assumed that some portion of the database extension is known and that constraints are to be formed on the basis of this portion. Since this portion may be updated or new relations may be added to the database, the set of hypothesised constraints may require occasional revision. The goal of this paper is to characterise those constraints that may potentially be formed on the basis of a part of the extension. Formal systems are derived by means of which the set of constraints that can be formed is precisely specified. A procedure is derived for restoring the consistency of a set of constraints after conflicting tuples are encountered. It is shown that the set of constraints to which the procedure may be applied corresponds, with minor limitations, to the sentences of relational algebra.

Relative knowledge in a distributed database

Let DB be a database and let u1, ..., um be a collection of users, each having at his or her disposal a query sublanguage Lui generated by some view predicate. Each of these users knows only as much as he or she can learn from the database using this query sublanguage. Such knowledge is called relative knowledge in this paper, and its various properties, including the model and proof theory, are investigated. Applications of relative knowledge to database security and integrity are also discussed.

The parallel complexity of simple chain queries

Bounds on the propagation of selection into logic programs

We consider the problem of propagating selections (i.e., bindings of variables) into logic programs. In particular, we study the class of binary chain programs and define selection propagation as the task of finding an equivalent program containing only unary derived predicates. We associate a context-free language L(H) with every binary chain program H. We show that, given H, propagating a selection involving some constant is possible iff L(H) is regular, and is therefore undecidable. We also show that propagating a selection of the form p(X,X) is possible iff L(H) is finite, and is therefore decidable. We demonstrate the connection of these two cases, respectively, with the weak monadic second order theory of one successor and with monadic generalized spectra. We further clarify the analogy between chain programs and languages from the point of view of program equivalence and selection propagation heuristics.
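
The correspondence the abstract exploits can be illustrated with an assumed toy chain program: each rule head(X0,Xn) :- b1(X0,X1), ..., bn(Xn-1,Xn) is read as the context-free production HEAD -> B1 ... Bn, so properties of the program become properties of the associated language.

```python
# Each binary chain rule  head(X0,Xn) :- b1(X0,X1), ..., bn(Xn-1,Xn)
# is read as the context-free production  HEAD -> B1 ... Bn.
# The two rules below form an assumed toy program, not one from the paper.
chain_rules = [
    ("anc", ["par"]),            # anc(X,Y) :- par(X,Y).
    ("anc", ["par", "anc"]),     # anc(X,Y) :- par(X,Z), anc(Z,Y).
]

def to_grammar(rules, edb_predicates):
    """Map a binary chain program to CFG productions; EDB predicates play the role of terminals."""
    productions = []
    for head, body in rules:
        rhs = [b if b in edb_predicates else b.upper() for b in body]
        productions.append((head.upper(), rhs))
    return productions

for lhs, rhs in to_grammar(chain_rules, {"par"}):
    print(lhs, "->", " ".join(rhs))
# ANC -> par
# ANC -> par ANC
# The associated language is par+, which is regular, so by the abstract's
# criterion a selection binding the first argument of anc can be propagated.
```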

A decidable class of bounded recursions

Detecting bounded recursions is a powerful optimization technique for recursive database query languages, since bounded recursions can be replaced by equivalent nonrecursive definitions. The problem is of theoretical interest because, by varying the class of recursions considered, one can generate instances that vary from linearly decidable to NP-hard to undecidable. In this paper we review and clarify the existing definitions of boundedness. We then specify a simple criterion that guarantees that the condition in Naughton [7] is necessary and sufficient for boundedness. The programs satisfying this criterion subsume and extend previously known decidable classes of bounded linear recursions.

Decidability and expressiveness aspects of logic queries

This paper addresses some basic problems regarding logic programming based queries over relational databases. We re-examine the query classes H and YE+ defined by Chandra and Harel [2]. We define H+ and YE++, which differ from H and YE+ in that the use of equality (=) and inequality (≠) is prohibited. We show that H+ is more expressive than YE++ and that any H+ program can be transformed into an equivalent H+ program containing a single recursive predicate without using the equality or inequality operators. As a corollary we obtain a fixpoint formula characterization of H+ queries. We consider the problems of determining containment, equivalence, and satisfiability of logic based queries. The containment and equivalence problems addressed here extend the work of Aho, Sagiv and Ullman on relational queries [1] and Papadimitriou on Prolog [10]. As corollaries we show that determining safety and literal redundancy are both undecidable problems.

Chickens and eggs—the interrelationship of systems and theory

This paper describes a personal perspective on the kinds of contributions that systems research and theoretical research make to one another, particularly in the database area. Examples of each kind of contribution are given, and then several case studies from the author's personal experience are presented. The case studies illustrate database systems research where theoretical work contributed to systems results and vice versa. Areas of database systems which need more contributions from the theoretical community are also presented.

Axiomatization and simplification rules for relational transactions

A translation language complete for database update and specification

On the power of magic

This paper considers the efficient evaluation of recursive queries expressed using Horn Clauses. We define sideways information passing formally and show how a query evaluation algorithm may be defined in terms of sideways information passing and control. We then consider a class of information passing strategies which suffices to describe most query evaluation algorithms in the database literature, and show that these strategies may always be implemented by rewriting a given program and evaluating the rewritten program bottom-up. We describe in detail several algorithms for rewriting a program. These algorithms generalize the Counting and Magic Sets algorithms to work with arbitrary programs. Safety and optimality of the algorithms are also considered.
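
As a rough, assumed illustration of the magic-sets idea (not the paper's general rewriting algorithm), the sketch below rewrites the classic ancestor program for a query whose first argument is bound, computes the "magic" set of relevant constants, and then evaluates the restricted rules bottom-up.

```python
# Original program:              After a (simplified) magic rewrite for
#   anc(X,Y) :- par(X,Y).        the query anc(ann, Y), binding pattern bf:
#   anc(X,Y) :- par(X,Z),          magic(ann).
#               anc(Z,Y).          magic(Z) :- magic(X), par(X,Z).
#                                  anc(X,Y) :- magic(X), par(X,Y).
#                                  anc(X,Y) :- magic(X), par(X,Z), anc(Z,Y).
par = {("ann", "bob"), ("bob", "cal"), ("eve", "fay")}   # assumed toy facts

# Magic set: the constants that are relevant to the query binding.
magic = {"ann"}
changed = True
while changed:
    new = {z for (x, z) in par if x in magic}
    changed = not new <= magic
    magic |= new

# Bottom-up evaluation of the rewritten rules; ("eve", "fay") is never touched.
anc = set()
changed = True
while changed:
    new = {(x, y) for (x, y) in par if x in magic}
    new |= {(x, y) for (x, z) in par if x in magic for (w, y) in anc if w == z}
    changed = not new <= anc
    anc |= new

print(sorted(t for t in anc if t[0] == "ann"))   # [('ann', 'bob'), ('ann', 'cal')]
```

Only tuples reachable from the query constant are ever derived, which is the efficiency gain the bottom-up rewriting is meant to capture.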

Efficient evaluation for a subset of recursive queries

Well-known results on graph traversal are used to develop a practical, efficient algorithm for evaluating regularly and linearly recursive queries in databases that contain only binary relations. Transformations are given that reduce a subset of regular and linear queries involving n-ary relations (n > 2) to queries involving only binary relations.
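
Since all relations are binary, a linear recursive query can be evaluated directly as a graph traversal. The following assumed example answers a reachability-style query over a toy subpart relation with breadth-first search rather than generic fixpoint iteration.

```python
from collections import deque

def reachable(edges, start):
    """All nodes reachable from `start` over the binary relation `edges`, via BFS."""
    adj = {}
    for x, y in edges:
        adj.setdefault(x, []).append(y)
    seen, queue = {start}, deque([start])
    while queue:
        x = queue.popleft()
        for y in adj.get(x, []):
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return seen - {start}

# Linear recursive query over an assumed binary relation:
#   part(X,Y) :- subpart(X,Y).    part(X,Y) :- subpart(X,Z), part(Z,Y).
subpart = {("car", "engine"), ("engine", "piston"), ("car", "wheel")}
print(sorted(reachable(subpart, "car")))   # ['engine', 'piston', 'wheel']
```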

Worst-case complexity analysis of methods for logic query implementation

On the expressive power of the extended relational algebra for the unnormalized relational model

Safety and correct translation of relational calculus formulas

Not all queries in relational calculus can be answered “sensibly” once disjunction, negation, and universal quantification are allowed. The class of relational calculus queries, or formulas, that have “sensible” answers is called the domain independent class, which is known to be undecidable. Subsequent research has focused on identifying large decidable subclasses of domain independent formulas. In this paper we investigate the properties of two such classes: the evaluable formulas and the allowed formulas. Although both classes have been defined before, we give simplified definitions, present short proofs of their main properties, and describe a method to incorporate equality. Although evaluable queries have sensible answers, it is not straightforward to compute them efficiently or correctly. We introduce relational algebra normal form for formulas, from which the correct translation into relational algebra is trivial. We give algorithms to transform an evaluable formula into an equivalent allowed formula, and from there into relational algebra normal form. Our algorithms avoid use of the so-called Dom relation, consisting of all constants appearing in the database or the query. Finally, we describe a restriction under which every domain independent formula is evaluable, and argue that evaluable formulas may be the largest decidable subclass of the domain independent formulas that can be efficiently recognized.
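
A tiny assumed example of why domain independence matters: the naive answer to the calculus query {x | not R(x)} changes when the evaluation domain changes, even though the database does not.

```python
# Assumed unary relation R; the calculus query is { x | not R(x) }.
R = {1, 2}

def naive_answer(domain):
    return {x for x in domain if x not in R}

print(naive_answer({1, 2, 3}))       # {3}
print(naive_answer({1, 2, 3, 4}))    # {3, 4} -- same database, different answer,
                                     # so the query is not domain independent
```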

Safety of recursive Horn clauses with infinite relations

A database query is said to be safe if its result consists of a finite set of tuples. If a query is expressed using a set of pure Horn clauses, the problem of determining whether it is safe is in general undecidable. In this paper, we show that the problem is decidable when terms involving function symbols (including arithmetic) are represented as distinct occurrences of uninterpreted infinite predicates over which certain finiteness dependencies hold. We present a sufficient condition for safety when some monotonicity constraints also hold.

One-sided recursions

The performance of systems with recursive query languages can be improved by recognizing simple, easily evaluable classes of recursions and using algorithms tailored to these classes whenever possible. In this paper we identify a useful subset of recursive definitions, the one-sided recursions. We show how to detect one-sided recursions, and give two simple evaluation algorithms that cover one-sided definitions in the sense that, for any selection on a one-sided definition, at least one of the two algorithms will apply. These algorithms have simple termination conditions, maintain minimal state, and use selections on the recursively defined relation whenever possible. We show that there are no similar algorithms for many-sided recursions. We also prove that it is undecidable whether an arbitrary definition has an equivalent one-sided definition. However, we do present a procedure that converts many potentially one-sided recursions to one-sided form, and prove it complete for a useful class of recursions.

Optimizing datalog programs

Datalog programs, i.e., Prolog programs without function symbols, are considered. It is assumed that a variable appearing in the head of a rule must also appear in the body of the rule. The input of a program is a set of ground atoms (which are given in addition to the program's rules) and, therefore, can be viewed as an assignment of relations to some of the program's predicates. Two programs are equivalent if they produce the same result for all possible assignments of relations to the extensional predicates (i.e., the predicates that do not appear as heads of rules). Two programs are uniformly equivalent if they produce the same result for all possible assignments of initial relations to all the predicates (i.e., both extensional and intensional). The equivalence problem for Datalog programs is known to be undecidable. It is shown that uniform equivalence is decidable, and an algorithm is given for minimizing a Datalog program under uniform equivalence. A technique for removing parts of a program that are redundant under equivalence (but not under uniform equivalence) is developed. A procedure for testing uniform equivalence is also developed for the case in which the database satisfies some constraints.
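
One well-known way to carry out this kind of redundancy check is the frozen-body (canonical database) test: freeze a rule's body variables into fresh constants, evaluate the remaining rules bottom-up on those facts, and see whether the frozen head is derived. The small evaluator and the three ancestor-style rules below are assumptions for illustration rather than a reproduction of the paper's procedure.

```python
# Atoms are (predicate, args) pairs; upper-case argument strings are variables.
def is_var(t):
    return t[:1].isupper()

def match(atom, fact, subst):
    """Extend `subst` so that `atom` matches the ground `fact`, or return None."""
    (pred, args), (fpred, fargs) = atom, fact
    if pred != fpred or len(args) != len(fargs):
        return None
    s = dict(subst)
    for a, f in zip(args, fargs):
        if is_var(a):
            if s.setdefault(a, f) != f:
                return None
        elif a != f:
            return None
    return s

def apply_rule(rule, facts):
    head, body = rule
    derived = set()
    def rec(i, subst):
        if i == len(body):
            derived.add((head[0], tuple(subst[a] if is_var(a) else a for a in head[1])))
            return
        for fact in facts:
            s = match(body[i], fact, subst)
            if s is not None:
                rec(i + 1, s)
    rec(0, {})
    return derived

def bottom_up(rules, facts):
    facts = set(facts)
    while True:
        new = set().union(*(apply_rule(r, facts) for r in rules)) - facts
        if not new:
            return facts
        facts |= new

# Assumed program: r1 and r2 define ancestry over e; r3 is a candidate redundant rule.
r1 = (("p", ("X", "Y")), [("e", ("X", "Y"))])
r2 = (("p", ("X", "Y")), [("e", ("X", "Z")), ("p", ("Z", "Y"))])
r3 = (("p", ("X", "Y")), [("e", ("X", "Z")), ("e", ("Z", "W")), ("p", ("W", "Y"))])

# Freeze r3's variables into fresh constants and ask whether {r1, r2} derives its head.
freeze = {"X": "x", "Y": "y", "Z": "z", "W": "w"}
frozen_body = {(pred, tuple(freeze[a] for a in args)) for pred, args in r3[1]}
frozen_head = (r3[0][0], tuple(freeze[a] for a in r3[0][1]))
print(frozen_head in bottom_up([r1, r2], frozen_body))   # True: r3 is redundant
```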