The acyclicity degree of a relational database scheme is an interesting topic due to several desirable properties of the corresponding database [5,16]. In this paper a simple and homogeneous way to characterize the acyclicity degree of a database scheme is given. The method is based on the existence, in all acyclic database schemes, of a nonempty set of relation schemes that satisfy a "pruning predicate", a property similar to the one satisfied by the leaves of an ordinary tree. This fact implies that such relation schemes may be eliminated by a recursive pruning algorithm in order to determine the acyclicity degree. Furthermore, if we use an incremental step-by-step design methodology, enriching the scheme one relation at a time, the pruning predicate suggests a set of properties that have to be verified by the new relation scheme in order to preserve the acyclicity degree of the database scheme.
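The recursive pruning described above is in the spirit of the classical GYO reduction for α-acyclicity. As an illustration only (not the paper's own algorithm), here is a minimal Python sketch, assuming each relation scheme is given as a set of attributes:

```python
def gyo_reduce(schemes):
    """GYO-style reduction: a scheme is alpha-acyclic iff repeatedly
    pruning ear-like schemes empties the hypergraph."""
    edges = [set(s) for s in schemes]
    changed = True
    while changed:
        changed = False
        # Prune attributes that occur in exactly one scheme.
        for e in edges:
            unique = {a for a in e if sum(a in f for f in edges) == 1}
            if unique:
                e -= unique
                changed = True
        # Prune a scheme contained in another scheme.
        for i, e in enumerate(edges):
            if any(i != j and e <= f for j, f in enumerate(edges)):
                edges.pop(i)
                changed = True
                break
    # Acyclic iff nothing (non-empty) survives the pruning.
    return all(not e for e in edges)
```

For example, `[{A,B,C}, {B,C}, {C,D}]` prunes to nothing (acyclic), while the "triangle" `[{A,B}, {B,C}, {A,C}]` is irreducible (cyclic).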

Acyclic database schemes have attracted a lot of interest because of the nice properties enjoyed by such schemes. Recently, some new acyclicity conditions that are strictly stronger than normal α-acyclicity have been introduced by Fagin. Because of the increased requirements, the database schemes in the new classes have some further useful properties that are not shared by α-acyclic schemes. Therefore the new classes have practical relevance. A database designer may work in terms of attribute sets and data dependencies, and not only in terms of database schemes. Thus it is important to have a characterization for the acyclic schemes of various degrees in terms of data dependencies. For α-acyclic schemes such a characterization exists, but for the new classes the question has been open. In this paper we provide characterizations for β-, γ- and Berge-acyclic database schemes. The characterizations can be stated in a simple form: thus they should be useful for the database designer.

We present a new graph theoretic approach to the implication problem for functional (FD) and inclusion (IND) dependencies. Using this methodology we prove decidability for the case of typed IND's and acyclic FD's. We provide new lower bounds for the implication of typed, acyclic IND's and FD's --- NP-hardness and PSPACE-hardness for bounded domains. Finally, we show that there is no k-ary complete axiomatization for implication of FD's, even when we have pairwise consistency, i.e., all possible typed IND's hold.

The purpose of a database concurrency control mechanism is to control a transaction system in such a way that only serializable executions of transactions are possible; that is, safety is enforced. Locking is an appropriate means to achieve safety. In this paper the following problem is considered: Can the set of all serializable schedules (in the sense of conflict-preserving serializability) of a transaction system be defined by locking, i.e., can maximal concurrency be achieved by locking? An efficient algorithm exists for the problem in the case of two-transaction systems, but in general the problem is NP-complete. In the case in which maximal concurrency indeed is achievable by locking, a locking policy realizing maximal concurrency can be constructed efficiently. Finally, an easy-to-test sufficient condition under which maximal concurrency is achieved by locking is discussed.
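Conflict-preserving serializability, the notion used above, can be tested by building the conflict graph of a schedule and checking it for cycles. A minimal illustrative sketch (not the paper's construction), assuming a schedule is a list of (transaction, operation, item) triples with operations 'r' and 'w':

```python
def conflict_serializable(schedule):
    """Test conflict serializability: build the conflict graph and
    check that it is acyclic. Two operations conflict if they touch
    the same item, come from different transactions, and at least
    one of them is a write."""
    edges = set()
    for i, (t1, o1, x1) in enumerate(schedule):
        for t2, o2, x2 in schedule[i + 1:]:
            if t1 != t2 and x1 == x2 and 'w' in (o1, o2):
                edges.add((t1, t2))  # t1's op precedes t2's op
    # Cycle detection by depth-first search.
    txns = {t for t, _, _ in schedule}
    adj = {t: [u for (s, u) in edges if s == t] for t in txns}
    WHITE, GREY, BLACK = 0, 1, 2
    color = dict.fromkeys(txns, WHITE)
    def has_cycle(t):
        color[t] = GREY
        for u in adj[t]:
            if color[u] == GREY or (color[u] == WHITE and has_cycle(u)):
                return True
        color[t] = BLACK
        return False
    return not any(has_cycle(t) for t in txns if color[t] == WHITE)
```

A schedule whose conflict graph is acyclic is equivalent to some serial execution; a cycle (e.g., T1 reads x before T2 writes x, while T2 reads y before T1 writes y) rules this out.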

A large number of locking protocols use precedence relations among data items to ensure the serializability of the database system. These protocols have extended the semantics of the exclusive lock from prohibiting access to a data item to prohibiting access to an entire subgraph. In this paper we argue that combining the use of exclusive locks for these different purposes is ill-conceived. We present a general theory on how these two distinct functions can be separated into the traditional locks operating on the individual data items, and a corresponding set operating on the edges of the graph. This is illustrated by a general transformation from a given graph protocol to the new edge protocol which preserves the major properties of the original protocol. We then give a characterization for a large class of edge lock protocols within a database system, which includes most previous locking protocols defined on databases organized as graphs. We show how this separation increases concurrency, and generalizes previously unrelated concepts, such as using deadlock avoidance locks in general locking protocols.

This paper is an attempt to bridge the gap that is developing between "practitioners" and "theoreticians" with respect to the use of Byzantine Agreement protocols in distributed database systems. We present an informal overview of Byzantine Agreement and study when and how this type of protocol can be used in general-purpose database management systems. We argue that the main application of this protocol is in the distribution of input transactions to a fully replicated database system. We also argue that other database uses, such as for transaction commit, message broadcast, and object location, may be limited.

Developing a commercial or specialized database system is an exceedingly costly and time consuming undertaking. A goal of this research is to demonstrate that a significant portion of a DBMS's software, specifically the physical database component, can be developed automatically from a small set of specifications. In this paper, we explain a new method of modeling physical databases and show that it provides a framework for realizing such a software development technology. Unlike any existing method, ours makes conceptual-to-internal mappings *explicit*. This enables the physical database component of operational DBMSs to be described in a systematic, precise, and simple way. Because our method also extends earlier research, it also provides a means for tying physical database theory to practice.

We propose here a data model that generalizes the relational, hierarchical, and network models. A database scheme in this model is a directed graph, where leaves represent data and internal nodes represent connections between the data. Instances are constructed from *r-values,* which constitute the data space, and *l-values,* which constitute the address space. We define a logic for the model, and describe a non-procedural query language that is based on the logic. Finally, we describe an algebraic query language and show that it is equivalent to the logical language.

Fundamental notions of relative information capacity between database structures are studied in the context of the relational model. Four progressively less restrictive formal definitions of "dominance" between pairs of relational database schemata are given. Each of these is shown to capture intuitively appealing, semantically meaningful properties which are natural for measures of relative information capacity between schemata. Relational schemata, both with and without key dependencies, are studied using these notions. A significant intuitive conclusion concerns the informal notion of relative information capacity often suggested in the conceptual database literature, which is based on accessibility of data via queries. Results here indicate that this notion is too general to accurately measure whether an underlying semantic connection exists between database schemata. Another important result of the paper shows that under any natural notion of information capacity equivalence, two relational schemata (with no dependencies) are equivalent if and only if they are identical (up to re-ordering of the attributes and relations). The approach and definitions used here can form part of the foundation for a rigorous investigation of a variety of important database problems involving data relativism, including those of schema integration and schema translation.

Logical, algebraic, programming language, grammatical and denotational formalisms are investigated with respect to their applicability to formal database specification. By applying each formalism for the purpose that originally motivated its proposal, it is shown that they all have a fundamental and well-integrated role to play in different parts of the specification process. An example is included to illustrate the methodological aspects.

A new, formally defined database model is introduced which combines fundamental principles of "semantic" database modeling in a coherent fashion. The model provides mechanisms for representing structured objects and functional and ISA relationships between them. It is anticipated that the model can serve as the foundation for a theoretical investigation into a wide variety of fundamental issues concerning the logical representation of data in databases. Preliminary applications of the model include an efficient algorithm for computing the set of object types which can occur in a given entity set, even in the presence of a complex set of ISA relationships. The model can also be applied to precisely articulate "good" design policies.

Relational database schemes can be partitioned into acyclic (tree) schemes and cyclic schemes. This partitioning also classifies the natural join queries into tree queries and cyclic queries. The problems of determining the minimum number of joins needed (i) to transform a cyclic scheme into an acyclic one, and (ii) to solve a cyclic query are shown [GS1 82, GS2 82] to be two different NP-complete problems. We consider a class of database schemes, called simple schemes. We show that the two problems above are equivalent but remain NP-complete when they are restricted to simple schemes. We give a polynomial time algorithm for both problems on simple schemes whose qual graphs are chordal. As a by-product, we also show that an edge contraction problem on undirected graphs is NP-complete in general but polynomial when the graph is chordal.

In [9] we proposed a method for decomposing join dependencies (jd's) in a relational database. Decomposing a jd can be useful for separating cyclic and acyclic parts of jd's, obtaining more insight into the structure of a jd or making integrity-checking more efficient. The decomposition methodology of [9] has many desirable properties. However, in general it cannot generate all the decompositions of a given jd. In this paper, we first recall this decomposition methodology and its most important properties. We then introduce a subclass of jd's, the unambiguous jd's. We show that this class represents exactly those jd's that have a unique decomposition (which can be obtained by our method). We also give a characterization of this decomposition in terms of the structure of the original jd. To prove our results, we make extensive use of hypergraph theory.

A multidatabase system provides a logically integrated view of existing, possibly inconsistent, databases. Logical integration is achieved primarily through the use of generalization, which can be modelled algebraically as a sequence of outerjoin and aggregation operations. Conventional distributed query processing techniques are inadequate for processing queries over views defined by outerjoins and aggregates. In a conventional distributed database system, selections and projections are inexpensive to process; hence joins have been the focus of most previous research. In a multidatabase system, however, even selections and projections can be as expensive as joins. The semiouterjoin operation can potentially reduce query processing costs. In general, there may be many different strategies based on semiouterjoins for processing a given query. The query optimization problem is to choose the most profitable of these strategies. This paper studies the query optimization problem for selection and projection queries. It develops linear-time solutions to the problem, and then extends these solutions to provide heuristics for joins and conjunctive queries.

The problem of reflecting updates from a view schema to the base schema in relational database systems is investigated. The basic strategy which is employed is that of translation relative to a constant complement, as developed by Bancilhon and Spyratos. We deal in particular with one of the apparent drawbacks of the constant complement approach, namely, the nonuniqueness of such complements. Our approach is to use only a specific set of well-behaved views, termed the *components* of the schema, as complements. The components form a Boolean algebra, and update reflection to the base schema is independent of choice of complement, as long as the complement is a component. We argue that such reflections of updates are the "natural" ones which should be allowed.

By interleaving the bits of the binary representations of the attribute values in a tuple, an integer corresponding to the tuple is created. A set of these integers represents a relation. The usual ordering of these integers corresponds to an ordering of multidimensional data that allows the use of conventional file organizations, such as B-trees, in the efficient processing of multidimensional queries (e.g. range queries). The class of data structures generated by this scheme includes a type of k-d tree whose balance can be efficiently maintained, a multidimensional B-tree which is simpler than previously proposed generalizations, and some previously reported data structures for range searching. All of the data structures in this class also support the efficient implementation of the set operations.
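The bit-interleaving scheme described above produces what is nowadays known as a Z-order (Morton) code. A minimal sketch, assuming fixed-width non-negative integer attribute values:

```python
def interleave(values, bits=16):
    """Interleave the bits of several attribute values into a single
    integer (a Z-order / Morton code). Ordinary integer ordering of
    the codes then gives a multidimensional ordering of the tuples,
    usable with conventional structures such as B-trees."""
    z = 0
    for b in range(bits):                 # bit position within each value
        for d, v in enumerate(values):    # one bit from each dimension
            z |= ((v >> b) & 1) << (b * len(values) + d)
    return z
```

For two dimensions, `interleave([x, y])` alternates the bits of `x` and `y`; e.g. `interleave([1, 1]) == 3` and `interleave([2, 0]) == 4`, so tuples that are close in the plane tend to be close in the integer order.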

We formulate some natural conditions which should be satisfied in an extension of the relational algebra from the usual relations to tables with null values, and we explain the motivation behind these conditions. Roughly speaking, our conditions say that the extended algebra makes it possible to correctly compute the "true tuples" in the result of applying a relational expression (query) to tables with nulls (database state), and that the computation can be carried out recursively, following the structure of the expression. We prove that these conditions are exactly equivalent to other conditions proposed earlier by the author and T. Imieliński. We give a simple proof of the correctness of the "naive" extension of the relational algebra to tables with marked nulls, where the nulls are treated as if they were regular values, and which supports the operations of projection, positive selection, union, join and renaming of attributes. We also show that the result of the naive evaluation of such an expression (query) is equal to the response to the query as defined --- in a proof-theoretic framework --- by Reiter.
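The "naive" treatment of marked nulls as if they were regular values can be illustrated concretely. The sketch below is illustrative only (not the paper's formal framework): a hypothetical `Null` class stands in for marked nulls, which under naive evaluation compare equal only to themselves, like ordinary values.

```python
class Null:
    """A marked null: under naive evaluation it behaves like a regular
    value, i.e., it is equal only to itself (default object identity)."""
    _count = 0
    def __init__(self):
        Null._count += 1
        self.mark = Null._count
    def __repr__(self):
        return f"@{self.mark}"

def natural_join(r, s, shared):
    """Naive natural join on tables with marked nulls: tuples join when
    they agree on the shared attributes, nulls included."""
    return [dict(t, **u) for t in r for u in s
            if all(t[a] == u[a] for a in shared)]
```

For instance, a tuple containing the marked null `@1` joins with another tuple containing the same `@1`, but not with one containing a different null `@2`.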

User views in a relational database obtained through a single projection ("projection views") are considered in a new framework. Specifically, such views where each tuple in the view represents an object ("object projection views") are studied using the dynamic relational model of [V1,V2], which captures the evolution of the database through consecutive updates. Attribute sets which yield object projection views are characterized using the static and dynamic functional dependencies satisfied by the database. Object projection views are then described using the static and dynamic fd's "inherited" from the original database, and the notion of age-closure [V1]. Finally, the impact of dynamic constraints on the view update problem is studied in a limited context. The paper demonstrates that new, useful information about views can be obtained by looking at the evolution of the database as captured by the dynamic relational model.

We suggest a new approach to database updates, in which a database is treated as a collection of theories. We investigate two issues: a) equivalence of databases under update operations, b) simultaneous multiple update operations.

A system for interfacing Prolog programs with relational algebra is presented. The system produces relational algebra queries using a deferred evaluation approach. Least fixed point (LFP) queries are automatically managed. An optimization method for removing redundant relations is also presented.
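The LFP queries mentioned above denote least fixed points of recursive rules; the system itself generates relational algebra from Prolog, but the semantics such queries compute can be sketched by naive bottom-up iteration. An illustrative example for transitive closure, assuming relations are sets of tuples:

```python
def lfp_query(edges):
    """Naive least-fixed-point evaluation of the transitive-closure
    rules  tc(X,Y) :- edge(X,Y)  and  tc(X,Z) :- edge(X,Y), tc(Y,Z):
    iterate until no new tuples appear."""
    tc = set(edges)
    while True:
        new = {(x, z) for (x, y) in edges
                      for (y2, z) in tc if y == y2}
        if new <= tc:       # fixed point reached
            return tc
        tc |= new
```

Starting from `edge = {(1,2), (2,3)}`, one iteration derives `(1,3)` and the next adds nothing, so the least fixed point is `{(1,2), (2,3), (1,3)}`.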

A database system, comprising a schema, integrity constraints, transactions, and queries, constitutes a single abstract data type. This type, which we call an *abstract database type,* has as its object the database itself. Thus, the value set of such a type is the set of all legal database states, legal in the sense of obeying all the structural specifications of the schema and the semantic prescriptions of the integrity constraints. The database transactions are the operations of the abstract database type and must be functions on the value set of the type. A transaction specification is *safe* if it defines a function which is closed on the database state set, i.e., any execution of the transaction on a legal database yields a legal database. We propose an approach to the definition of abstract database types which is both usable by typical database designers and which facilitates the mechanical verification of transaction safety. The Boyer and Moore theorem proving technique is used to prove transaction safety theorems using abstract data type axioms and recursive functions generated from the database schema and transaction programs.

Given a multirelational database scheme and a relational mapping f transforming it, an important question is whether the resulting scheme is equivalent to the original one. This question was addressed in the literature with respect to those relational schemes that satisfy the so-called universal relation assumption; however, no study was ever concerned with multirelational (database) schemes that do not necessarily satisfy this assumption. We present two general definitions of lossless transformation of the database scheme based on the so-called closed world and open world assumptions. While both definitions seem to be practically justified, the one based on the open world assumption is more "tractable". We are able to test losslessness defined in such a way for a wide class of relational expressions and dependencies. An algorithm for testing losslessness of mappings (which are arbitrary relational expressions built up from projections, cartesian products and restrictions) is presented in the paper. Moreover, given a lossless transformation, our algorithm enables us to explicitly construct an "inverted" mapping that restores the corresponding state of the original database. The application of the algorithm to schemes specified by different types of dependencies is described. In particular the application of the algorithm to schemes specified by inclusion dependencies is presented. In this case the algorithm works for families of inclusion dependencies having the finite chase property. This class of inclusion dependencies is characterized in the paper.

A database is consistent with respect to a set Σ of dependencies if it has a weak instance. A weak instance is a universal relation that satisfies Σ, and whose projections on the relation schemes are supersets of the relations in the database. In this paper we investigate the complexity of testing consistency and the logics that can axiomatize consistency, relative to a fixed set Σ of dependencies. If Σ is allowed to include embedded dependencies, then consistency can be non-recursive. If Σ consists only of total dependencies, then consistency can be tested in polynomial time. The degree of the polynomial can, however, be arbitrarily high. Consistency can be axiomatized but not finitely axiomatized by equality generating dependencies. If embedded dependencies are allowed then consistency cannot be finitely axiomatized by any effective logic. If, on the other hand, only total dependencies are allowed then consistency can be finitely axiomatized by fixpoint logic.

An earlier paper introduced a simple performance model for studying the behaviour of locking. That paper treats a highly simplified form of locking, called the *no waiting case,* in which transactions restart when they request locks that are already held by others. This analysis is now extended to the more realistic *waiting case,* in which transactions are allowed to wait for conflicting locks, and restart only if there is a deadlock. The analysis begins with a system that has uniform access and exclusive locks only. The model's predictions for this base system agree well with simulation results. Next, a system with nonuniform access and another with shareable locks are each shown to be reducible to the base system. A comparison of the waiting and no waiting cases yields a surprising result: the throughput for the no waiting case is often better than for the waiting case, and never much worse.

Several online schedulers of transactions are considered. The scheduling algorithms are presented in a modular and uniform way. Deadlock and livelock prevention (using restarts, if necessary) is integrated into the algorithms. A metric is defined for comparing the behavior of two schedulers on the same set of transactions. The measure that is used makes only a very simple assumption about the transactions. This assumption is easily motivated, and thus the measure is a useful approximation of measures arising in practice. The behavior of the schedulers is compared on the basis of user delays caused by one scheduler on logs accepted as such by another scheduler. No order of magnitude differences are found, but the examples used for proving lower bounds for the delays are illustrative of the characteristics of the schedulers.