The consensus problem involves an asynchronous system of processes, some of which may be unreliable. The problem is for the reliable processes to agree on a binary value. We show that every protocol for this problem has the possibility of nontermination, even with only one faulty process. By way of contrast, solutions are known for the synchronous case, the "Byzantine Generals" problem.

We describe a reliability algorithm being considered for DDM, a distributed database system under development at Computer Corporation of America. The algorithm is designed to tolerate clean site failures in which sites simply stop running. The algorithm allows the system to reconfigure itself to run correctly as sites fail and recover. The algorithm solves the subproblems of atomic commit and replicated data handling in an integrated manner.

Commit protocols guarantee the consistency of distributed databases in the absence of failures. A commit protocol is resilient to a class of failures if it is possible to guarantee that a) the databases at all operational sites are consistent in the presence of these failures and b) other sites can be recovered consistently with these sites when the failure is repaired. Such a commit protocol is called *nonblocking* if no operational site needs to wait on a transaction which is incomplete at the time of the failure. It is known that no nonblocking commit protocol resilient to network partitioning exists. In this paper, the possible termination protocols of commit protocols are studied in the context of network partitioning. A formal model for termination protocols is introduced and a general logical interpretation of termination protocols is presented. The model makes use of all the information that is available in a component of the partition --- namely, the constituent sites and their respective states at the time of partition. Optimality measures for the termination protocols, in terms of the number of waiting components and the average number of waiting sites, are introduced, and protocols optimal under these measures are produced for all the possible centralized and decentralized commit protocols. It is proved that quorum-based termination protocols indeed perform very well in the presence of network partitioning. If the central site(s) is reliable, centralized commit protocols can be proved to perform better than all decentralized ones. Thus, the general preference for centralized commit protocols is justified.
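The quorum idea can be made concrete with a small sketch. The site states, vote weights, and thresholds below are illustrative assumptions, not the paper's exact protocol:

```python
# Minimal sketch of a quorum-based termination rule for a partitioned
# commit protocol. All names and thresholds are illustrative.

def terminate(component, weights, commit_quorum, abort_quorum, states):
    """Decide what a partition component may do with a pending transaction.

    component     -- sites reachable within this component
    weights       -- vote weight of every site in the system
    states        -- local state per site: 'precommit', 'prepared', or 'aborted'
    Returns 'commit', 'abort', or 'block' (the component must wait).
    """
    votes = sum(weights[s] for s in component)
    if any(states[s] == 'aborted' for s in component):
        return 'abort'                  # some site already aborted
    if any(states[s] == 'precommit' for s in component):
        if votes >= commit_quorum:
            return 'commit'             # enough votes to finish the commit
    elif votes >= abort_quorum:
        return 'abort'                  # safe to abort: no one pre-committed here
    return 'block'                      # not enough information or votes
```

For the two decisions never to conflict across components, the two quorums must overlap (commit_quorum + abort_quorum exceeding the total vote weight); components failing both tests are exactly the "waiting" components the optimality measures count.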

View integration is investigated with the help of three classes of interrelational dependencies, inclusion dependencies, exclusion dependencies and union functional dependencies. The process of view integration is divided into two steps, combination and optimization. View combination consists in defining new interrelational dependencies that capture similarities between different views. The optimization step tries to reduce redundancy and the size of the schema. Finally, general results about interrelational dependencies are presented that lead to an optimization procedure for a restricted class of schemas.

The power of the relational model with inclusion dependencies is compared with that of the universal instance model. A characterization of allowable sets of inclusion dependencies is made, and the relational and universal instance models are proved equivalent given this restriction.

A set Σ of functional dependencies and inclusion dependencies *implies* a single dependency σ if all databases (finite and infinite) which satisfy Σ also satisfy σ. This paper presents complete inference rules for deducing implications of inclusion and functional dependencies. The results of [5] suggest that the implication problem for functional and inclusion dependencies together has no simple axiomatization satisfying a natural set of conditions. Out of necessity, the inference rules presented here do not satisfy the conditions assumed in [5].

We develop a method of representing binary search trees in an environment in which pointers and other structural information may be "lost" or "maliciously altered". Our fault tolerant representation permits any 2 field changes to be detected and any 1 to be corrected without significantly increasing the storage requirements of the binary tree. The detection and correction procedures applied to the entire tree require O(*n*) time. Our discipline is also used to represent binary search trees with a single pointer per datum without altering the cost of searching or updating. While our scheme can be applied in conjunction with any underlying tree balancing scheme (AVL [AVL], bounded balance [Nievergelt et al.], etc.), if no balancing scheme is employed, the trees we form will have significantly shorter search paths than those formed using the straightforward algorithm.
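The flavor of field-level fault detection can be shown with a hypothetical sketch. The encoding below is an assumption for illustration only, not the paper's discipline (which additionally supports correction and detects two changes):

```python
# Illustrative sketch: guard each node's fields (key and child indices)
# with a redundant check field, so any single corrupted field is
# detectable by an O(n) scan of the tree.

def check(key, left, right):
    # Any function in which changing one field always changes the
    # result suffices for single-change detection.
    return key * 31 + left * 7 + right

def make_tree(entries):
    """entries: list of (key, left_index, right_index); -1 means no child."""
    return [[k, l, r, check(k, l, r)] for (k, l, r) in entries]

def detect(nodes):
    """O(n) scan: indices of nodes whose fields disagree with the check field."""
    return [i for i, (k, l, r, c) in enumerate(nodes) if check(k, l, r) != c]
```
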

A new interpolation-based order preserving hashing algorithm suitable for on-line maintenance of large dynamic external files under sequences of four kinds of operations (*insertion, update, deletion,* and *orthogonal range query*) is proposed. The scheme, an adaptation of linear hashing, requires no index or address directory structure and utilizes O(n) space for files containing n records; all of the benefits of linear hashing are inherited by this new scheme. File implementations yielding average successful search lengths much less than 2 and average unsuccessful search lengths much less than 4 for individual records are obtainable, and the actual storage required is controllable by the implementor.
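Since the scheme adapts linear hashing, a minimal sketch of linear hashing's directory-free address computation may help; the interpolation-based variant would replace the modular hash with an order-preserving function of the key. Names below are illustrative:

```python
# Linear hashing address computation, starting from one bucket.
# No directory is needed: a split pointer and a level determine
# which of two hash functions applies to a key.

def bucket(key, level, split_ptr):
    """Bucket address under linear hashing.

    level     -- current round: 2**level buckets existed at round start
    split_ptr -- next bucket to split; buckets below it are already split
    """
    addr = key % (2 ** level)            # coarse hash for this round
    if addr < split_ptr:                 # that bucket was already split,
        addr = key % (2 ** (level + 1))  # so use the finer hash
    return addr
```
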

The extendible hash file is a dynamic data structure that is an alternative to B-trees for use as a database index. While there have been many algorithms proposed to allow concurrent access to B-trees, similar solutions for extendible hash files have not appeared. In this paper, we present solutions to allow for concurrency that are based on locking protocols and minor modifications in the data structure. Another question that deserves consideration is whether these indexing structures can be adapted for use in a distributed database. Among the motivations for distributing data are increased availability and ease of growth; however, unless data structures in the access path are designed to support those goals, they may not be realized. We describe some first attempts at adapting extendible hash files for distributed data.
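For concreteness, here is a minimal single-process sketch of the extendible hash file itself (a directory indexed by hash bits that doubles when a full bucket's local depth equals the global depth); the locking protocols the paper adds are not shown, and all names and the tiny bucket capacity are illustrative:

```python
BUCKET_CAP = 2  # tiny illustrative capacity

class ExtendibleHash:
    def __init__(self):
        self.gd = 0                       # global depth of the directory
        self.dir = [([], 0)]              # (bucket, local depth) entries

    def _slot(self, key):
        return hash(key) & ((1 << self.gd) - 1)   # low gd bits of the hash

    def lookup(self, key):
        bucket, _ = self.dir[self._slot(key)]
        return key in bucket

    def insert(self, key):
        while True:
            bucket, _ = self.dir[self._slot(key)]
            if len(bucket) < BUCKET_CAP:
                bucket.append(key)
                return
            self._split(self._slot(key))  # overflow: split, then retry

    def _split(self, i):
        bucket, ld = self.dir[i]
        if ld == self.gd:                 # directory itself must double
            self.dir = list(self.dir) * 2
            self.gd += 1
        bit = 1 << ld                     # entries separate on this hash bit
        b0, b1 = ([], ld + 1), ([], ld + 1)
        for k in bucket:
            (b1 if hash(k) & bit else b0)[0].append(k)
        for j in range(len(self.dir)):    # repoint every slot that shared it
            if self.dir[j][0] is bucket:
                self.dir[j] = b1 if j & bit else b0
```

The structural changes that concurrent readers can observe are exactly the two in `_split`: the directory doubling and the repointing of shared slots, which is why the paper's protocols concentrate there.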

We present several algorithms to search data bases that consist of text. The algorithms apply mostly to very large data bases that are difficult to structure. We describe algorithms which search the original data base without transformation and hence could be used as general text searching algorithms. We also describe algorithms requiring pre-processing, the best of them achieving logarithmic behaviour. These efficient algorithms solve the "plagiarism" problem among *n* papers. The problem of misspellings, ambiguous spellings, simple errors, endings, positional information, etc. is nicely treated using signature functions.
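A hedged illustration of a signature function of the kind alluded to: a word is reduced to a small superimposed bit signature of its character bigrams, so a misspelling that disturbs one bigram still leaves most signature bits shared. The hash and parameters below are assumptions, not the paper's:

```python
# Superimposed-coding word signatures built from character bigrams.

def signature(word, bits=64):
    sig = 0
    for a, b in zip(word, word[1:]):                  # character bigrams
        sig |= 1 << ((ord(a) * 131 + ord(b)) % bits)  # superimpose one bit each
    return sig

def similarity(w1, w2, bits=64):
    """Fraction of the set signature bits that the two words share."""
    s1, s2 = signature(w1, bits), signature(w2, bits)
    union = bin(s1 | s2).count('1')
    return bin(s1 & s2).count('1') / union if union else 1.0
```
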

Most research on query processing has focussed on quantifier-free conjunctive queries. Existing techniques for processing queries with quantifiers either compile the query into a nested loop program or use variants of Codd's reduction from the Relational Calculus to the Relational Algebra. In this paper we propose an alternative technique that uses an algebra of graft and prune operations on trees. This technique provides a significant savings in space and time. We show how to transform a quantified conjunctive query into a sequence of graft and prune operations.

Systems which translate queries written using a high-level conceptual model of a database into sequences of commands based on another model of the database are studied here. We take the view that these translators are similar to, albeit simpler than, the compilers for programming languages. Motivated by the recent interest in describing *all* the aspects of a compiler by an attributed grammar, we specify the formal syntax and semantics of two working database translators using attributed grammars. All the precise details about the parsing, the code optimization and the rules for preserving the query semantics are captured by those grammars. It is hoped that this approach brings the understanding of query languages closer to that of programming languages, and opens the possibility of applying the emerging technology of semantics-directed compiler construction to build query language translator-generators and to prove those translators correct.

We prove a sequence of results which characterize exactly the complexity of problems related to the evaluation of relational queries consisting of projections and natural joins. We show that testing whether the result of a given query on a given relation equals some other given relation is *D*^{p}-complete (*D*^{p} is a class which includes both *NP* and co-*NP,* and was recently introduced in a totally different context [13]). We show that testing inclusion or equivalence of queries with respect to a fixed relation (or of relations with respect to a fixed query) is Π_{2}^{p}-complete. We also examine the complexity of estimating the number of tuples of the answer.

This paper shows that granularity hierarchies may be used with many types of concurrency control algorithms. Hierarchical versions of a validation algorithm, a timestamp algorithm, and a multiversion algorithm are given, and hierarchical algorithm issues relating to request escalation and distributed databases are discussed as well. It is argued that these hierarchical algorithms should offer improved performance for certain transaction mixes.

The classical approaches to enforcing serializability are the *two-phase locking* technique and the *timestamp ordering* technique. Either approach requires that a read operation from a transaction be *registered* (in the form of either a read timestamp or a read lock), so that a write operation from a concurrent transaction will not interfere improperly with the read operation. However, setting a lock or leaving a timestamp with a data element is an expensive operation. The purpose of the current research is to seek ways to reduce the overhead of synchronizing certain types of read accesses while achieving the goal of serializability. To this end, a new technique of concurrency control for database management systems has been proposed. The technique makes use of a hierarchical database decomposition, a procedure which decomposes the entire database into data segments based on the access pattern of the update transactions to be run in the system. A corresponding classification of the update transactions is derived where each transaction class is 'rooted' in one of the data segments. The technique requires that a timestamp ordering protocol be observed for accesses within an update transaction's own root segment, but enables read accesses to other data segments to proceed without ever having to wait or to leave any trace of these accesses, thereby reducing the overhead of concurrency control. An algorithm for handling ad-hoc read-only transactions in this environment is also devised, which does not require read-only transactions to wait or set any read timestamp.
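A minimal sketch of the classical timestamp-ordering check that the technique confines to a transaction's root segment; the read-timestamp update below is exactly the per-read "registration" overhead avoided for reads outside that segment. Names are illustrative:

```python
# Basic timestamp-ordering (TO) protocol on a single data element.

class Item:
    def __init__(self, value=None):
        self.rts = 0      # largest timestamp of any reader so far
        self.wts = 0      # timestamp of the last writer
        self.value = value

def to_read(item, ts):
    if ts < item.wts:                   # a younger transaction already wrote
        return 'abort'
    item.rts = max(item.rts, ts)        # register the read (the costly step)
    return item.value

def to_write(item, ts, value):
    if ts < item.rts or ts < item.wts:  # would invalidate a younger access
        return 'abort'
    item.wts, item.value = ts, value
    return 'ok'
```
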

Distributed algorithms tend to be difficult to understand and even more difficult to prove correct. Using distributed deadlock detection as a running example, this paper presents a framework for stating, understanding, and proving the correctness of distributed algorithms for decision problems. The framework consists of a series of complexity levels. To simplify the initial levels, we treat the data structure of the algorithm as a database, and use the database notions of views and transaction atomicity. For each complexity level, we state theorems that need to be proved for each algorithm. The framework is illustrated using several existing deadlock detection algorithms. Finally, it is shown that the framework suggests new algorithms using the best features of several existing algorithms.
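As a point of reference for the running example, here is a hedged centralized sketch of the core decision: a deadlock exists iff the global wait-for graph has a cycle. The paper's framework concerns the much harder distributed versions of this test, where no site sees the whole graph:

```python
# Centralized deadlock test: depth-first search for a cycle in the
# wait-for graph (transaction -> set of transactions it waits on).

def has_deadlock(waits_for):
    nodes = set(waits_for) | {u for vs in waits_for.values() for u in vs}
    edges = {t: waits_for.get(t, set()) for t in nodes}
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {t: WHITE for t in nodes}

    def visit(t):
        color[t] = GRAY
        for u in edges[t]:
            if color[u] == GRAY:        # back edge: a cycle, hence deadlock
                return True
            if color[u] == WHITE and visit(u):
                return True
        color[t] = BLACK
        return False

    return any(color[t] == WHITE and visit(t) for t in nodes)
```
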

Many different algorithms have been proposed for database concurrency control, and many more can be synthesized by combining locking and timestamping. While the correctness of these algorithms is already well understood, their performance is not. We need a model to help us understand, compare and control the behavior of locking and timestamping. We present here a model which we hope will eventually play such a role, and which we believe is simple to understand and use.

Database schemas may be partitioned into two sub-classes: tree schemas and cyclic schemas. The analysis of tree vs. cyclic schemas introduced the concepts of GYO reductions, canonical connections and tree projections. This paper investigates the intricate relationships among these concepts in the context of universal relation databases.

Manifestations of the "universal relation assumption" can be seen either as definitions of a one-relation user view of data, or as algorithms for answering queries about arbitrary sets of attributes. In this paper we explore equivalences between these two points of view. We show that if the user's view is the representative instance, then our ability to answer queries about the universal relation, by applying relational algebra to the actual database, is equivalent to a "boundedness" condition on the dependencies of the database scheme. Further, whenever this condition holds, there is a finite union of lossless tableau mappings that produces the desired relation.

We propose and investigate the notion of separability to capture the design goal of independently updatable decompositions. As evidence in favor of separability as a natural concept of independence, we show that it is equivalent to a specialization of the abstract independent mappings defined by Bancilhon and Spyratos. We then characterize separable schemes in the important case when the only constraints given are a set of functional dependencies and the join dependency for the database scheme. This characterization is also applicable to dependency preserving database schemes when a set of functional dependencies is given as constraint. Our characterization yields a polynomial-time algorithm for testing separability in these cases.

Let f be a relational expression and let q be a relation. It appears, surprisingly enough, that the concept of the table, introduced in order to represent incomplete information, can be used as a tool for representing {r | f(r) ⊇ q} for a class of relational expressions built up from projection, join and positive selection. This fact, together with some other known properties of tables, has wide applications and leads to a new technique for expressing and solving many database problems, especially those concerned with views. This technique is, we believe, more natural and systematic than those previously used, such as tableau techniques. Some applications, like view dependencies, translating states between views and the view equivalence problem, are presented.

We study the problem of translating updates of database views. View updates are disambiguated by requiring that a specified view complement (i.e. a second view which contains all the information omitted from the given view) remains constant during the translation. We study some of the computational problems related to the application of this general methodology in the context of relational databases. We restrict our attention to projective views of databases which consist of a single relation and satisfy functional dependencies. We first characterize complementary views and show that finding a minimum complement of a given view is *NP*-complete. We then study in detail the problem of translating the insertion of a tuple into a view and extend our results to the cases of deletion and replacement of a tuple. Finally we define and study a new kind of dependency, the explicit functional dependencies, which intuitively state that some part of the database information can be computed from the rest.

The notion of "sort set" is introduced here to formalize the fact that certain database relations can be sorted so that two or more columns are simultaneously listed in order. This notion is shown to be applicable in several ways to enhance the efficiency of an implemented database. A characterization of when order dependency implies the existence of sort sets in a database is presented, along with several corollaries concerning complexity, Armstrong relations and cliques of certain graphs. Sort-set dependencies are then introduced. A (finite) sound and complete set of inference rules for sort-set dependencies is presented, but there is no such set for functional and sort-set dependencies taken together. Deciding logical implication for sort-set dependencies is proved to be polynomial, but if functional dependencies are included the problem is co-NP complete. Each set of sort-set and functional dependencies is shown to have an Armstrong relation. A natural generalization of the Armstrong relation, here called a "separator," is given and then used to study the relationship between order and sort-set dependencies.
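The notion can be sketched directly: columns X form a sort set of a relation iff some single ordering of the rows lists every column of X in nondecreasing order, which holds exactly when every pair of rows is comparable componentwise on X. The dict-based representation below is an assumption for illustration:

```python
from itertools import combinations

def is_sort_set(rows, cols):
    """rows: list of dicts; cols: the candidate sort-set columns."""
    def comparable(r1, r2):
        p1 = [r1[c] for c in cols]
        p2 = [r2[c] for c in cols]
        # one row must dominate the other on every column of cols
        return all(x <= y for x, y in zip(p1, p2)) or \
               all(x >= y for x, y in zip(p1, p2))
    return all(comparable(r1, r2) for r1, r2 in combinations(rows, 2))
```
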

The desirability of acyclic database schemes is well argued in [L,BFMY]. When a scheme is described by multivalued dependencies, acyclicity means that the dependencies do not split each other's left-hand side and do not form intersection anomalies. We show that if the second condition fails to hold, the scheme can be amended so that it holds. The basic step is to add one attribute and some dependencies to resolve one intersection anomaly. This step generates an extension of the given scheme in which the anomaly does not exist. We also analyze the repetitive use of the basic step and prove that the transformation so defined removes all intersection anomalies. Finally, we characterize a class of attributes that can be removed from the final scheme, leaving it acyclic.
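For reference, a hedged sketch of the standard GYO reduction used to test acyclicity: repeatedly strip attributes unique to one hyperedge and drop edges contained in (or equal to) another; the scheme is acyclic iff the reduction deletes everything:

```python
def is_acyclic(scheme):
    """scheme: list of attribute sets, one per relation schema."""
    edges = [set(e) for e in scheme]
    changed = True
    while changed:
        changed = False
        counts = {}
        for e in edges:                   # count attribute occurrences
            for a in e:
                counts[a] = counts.get(a, 0) + 1
        for e in edges:                   # strip attributes in only one edge
            unique = {a for a in e if counts[a] == 1}
            if unique:
                e -= unique
                changed = True
        kept = []
        for i, e in enumerate(edges):     # drop empty and contained edges
            if not e or any(i != j and e <= f for j, f in enumerate(edges)):
                changed = True
            else:
                kept.append(e)
        edges = kept
    return not edges
```

On the cyclic scheme {AB, BC, AC} no attribute is unique and no edge is contained in another, so the reduction stalls immediately.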

We suggest here a methodology for updating databases with integrity constraints and rules for deriving inexplicit information. First we consider the problem of updating arbitrary theories by inserting into them, or deleting from them, arbitrary sentences. The solution involves two key ideas: when replacing an old theory by a new one, we wish to minimize the change in the theory; and when there are several theories that involve minimal changes, we look for a new theory that reflects that ambiguity. The methodology is also adapted to updating databases, where different facts can carry different priorities, and to updating user views.

This paper examines the technique of adding attributes to a relational database scheme. It is shown that doing so can result in an improvement in the conceptual simplicity and efficiency of the scheme. Some well-known non-independent schemes are made independent by adding attributes, and results are given to support the conjecture that all schemes can be made independent as well.

A simple extension of the relational model is introduced to study the effects of dynamic constraints on database evolution. Both static and dynamic constraints are used in conjunction with this "dynamic" extension of the relational model. The static constraints considered here are functional dependencies (fd's). The dynamic constraints involve global updates and are restricted to certain analogs of fd's, called "dynamic" fd's. The results concern the interaction between the static and dynamic constraints. The effect of the past history of the database on the static constraints is investigated using the notions of age and age-closure. The connection between the static constraints and the future evolution of the database is described through the notions of survivability and survivability-closure.

An algebraic framework for investigating the problem of decomposing a relational database schema into components is developed. It is argued that the views of a relational schema which are to be the components of a decomposition should form a finite atomic Boolean algebra. The unit of the algebra is the identity view, and the zero is the null view. The join operation in this algebra is to be a generalization of the usual concept of join; the resulting view should contain precisely the representation contained in the two component views as a unit. The meet operation in this algebra is to measure the interdependence of the components, and is to be zero if and only if they are independent. A decomposition of the schema is then to be a decomposition of the identity component, with the ultimate decomposition the one consisting entirely of atoms. The thrust of the results is in two directions. First, the general properties of relational schemata particular to this problem are developed, within the framework of first-order logic. The key formulations are those of abstract meet and join of schemata. Using these formulations, it is shown that a completely general decomposition theory is impossible. The second part of the work isolates a relatively large class of schemata which do admit reasonable decompositions. These schemata include all of the usual decomposition problems, including project-join decompositions and horizontal decompositions of schemata constrained by universal Horn sentences.