PODS '89: Proceedings of the Eighth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems


The alternating fixpoint of logic programs with negation

We introduce and describe the alternating fixpoint of a logic program with negation. The underlying idea is to monotonically build up a set of negative conclusions until the least fixpoint is reached, using a transformation related to the one that defines stable models, developed by Gelfond and Lifschitz. From a fixed set of negative conclusions, we can derive the positive conclusions that follow (without deriving any further negative ones) by traditional Horn clause semantics. The union of positive and negative conclusions is called the alternating fixpoint partial model. The name “alternating” was chosen because the transformation runs in two passes: the first pass transforms an underestimate of the set of negative conclusions into an (intermediate) overestimate; the second pass transforms the overestimate into a new underestimate; the composition of the two passes is monotonic. Our main theorem is that the alternating fixpoint partial model is exactly the well-founded partial model. We also show that a system in fixpoint logic, which permits rule bodies to be first-order formulas but requires inductive relations to be positive within them, can be transformed straightforwardly into a normal logic program whose alternating fixpoint partial model corresponds to the least fixpoint of the fixpoint logic system. Thus alternating fixpoint logic is at least as expressive as fixpoint logic. The converse is shown to hold for finite structures.
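
To make the two-pass construction concrete, the following is a minimal sketch (for a ground, propositional normal program) of the alternating fixpoint iteration described above. It is an illustration under simplifying assumptions, not the authors' implementation; all names are ours.

```python
# Sketch of the alternating fixpoint for a ground normal logic program.
# A rule is a triple (head, positive_body, negative_body).

def least_model(horn_rules):
    """Least Herbrand model of a ground Horn program (naive iteration)."""
    model = set()
    changed = True
    while changed:
        changed = False
        for head, body in horn_rules:
            if body <= model and head not in model:
                model.add(head)
                changed = True
    return model

def positive_consequences(program, assumed_false):
    """Gelfond-Lifschitz-style reduct: keep a rule only if all of its negated
    atoms are assumed false, then take the least model of the Horn remainder."""
    reduct = [(h, set(pos)) for (h, pos, neg) in program
              if set(neg) <= assumed_false]
    return least_model(reduct)

def alternating_fixpoint(program, atoms):
    """Iterate the two-pass transformation from the empty underestimate of
    negative conclusions until it stops growing."""
    false_atoms = set()                       # underestimate of negative conclusions
    while True:
        true_atoms = positive_consequences(program, false_atoms)
        overestimate = atoms - true_atoms     # pass 1: intermediate overestimate
        new_false = atoms - positive_consequences(program, overestimate)  # pass 2
        if new_false == false_atoms:
            return true_atoms, false_atoms    # the alternating fixpoint partial model
        false_atoms = new_false

# Example: p :- not q.   q :- not p.   r :- not s.
program = [("p", [], ["q"]), ("q", [], ["p"]), ("r", [], ["s"])]
print(alternating_fixpoint(program, {"p", "q", "r", "s"}))
# r is true, s is false; p and q remain undefined, as in the well-founded model
```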

Every logic program has a natural stratification and an iterated least fixed point model

A procedural semantics for well founded negation in logic programs

We introduce global SLS-resolution, a procedural semantics for well-founded negation as defined by Van Gelder, Ross and Schlipf. Global SLS-resolution extends Przymusinski's SLS-resolution, and may be applied to all programs, whether locally stratified or not. Global SLS-resolution is defined in terms of global trees, a new data structure representing the dependence of goals on derived negative subgoals. We prove that global SLS-resolution is sound with respect to the well-founded semantics, and complete for non-floundering queries.

Logic programming as constructivism: a formalization and its application to databases

The features of logic programming that seem unconventional from the viewpoint of classical logic can be explained in terms of constructivistic logic. We motivate and propose a constructivistic proof theory of non-Horn logic programming. Then, we apply this formalization for establishing results of practical interest. First, we show that 'stratification' can be motivated in a simple and intuitive way. Relying on similar motivations, we introduce the larger classes of 'loosely stratified' and 'constructively consistent' programs. Second, we give a formal basis for introducing quantifiers into queries and logic programs by defining 'constructively domain independent' formulas. Third, we extend the Generalized Magic Sets procedure to loosely stratified and constructively consistent programs, by relying on a 'conditional fixpoint' procedure.

Complexity of query processing in databases with OR-objects

If ground disjunctive facts are admitted into a database, the data complexity of conjunctive queries grows from PTIME to coNP, with some simple examples of coNP-complete conjunctive queries. A natural question which arises in this context is whether it is possible to syntactically separate the queries which are “bad” (i.e., coNP-complete) from those that are “good” (i.e., with PTIME data complexity), given a predefined “pattern” of disjunctions in the database. In this paper, we study the data complexity of conjunctive queries. We give a complete syntactic characterization of coNP-complete conjunctive queries for a class of disjunctive databases called OR-databases. Our results can be used in complexity-tailored design, where design decisions are motivated by the complexity of query processing. We also establish that a similar complete syntactic characterization for disjunctive queries, with negation allowed only on base predicates, would answer the open problem “Does Graph Isomorphism belong to PTIME or is it NP-complete?”.

A sound and complete query evaluation algorithm for relational databases with disjunctive information

Horn tables: an efficient tool for handling incomplete information in databases

Invited talk: automata theory for database theoreticians

Declarative expression of deductive database updates

An update can be specified as a single database state transition, or as a sequence of queries and database state transitions. We give an extension of Datalog for expressing both types of update specifications on a logic database. The extension supports the simple and intuitive expression of basic update operations, hypothetical reasoning and update procedures. The extension possesses a possible-world semantics and a sound and complete proof theory. Soundness and completeness are proved by showing that an update procedure can be mapped into a semantically equivalent Pure Prolog program. This means that the semantic and proof-theoretic results of Pure Prolog carry over to similar results for the Datalog extension.

Updating databases in the weak instance model

Database updates have recently received much more attention than in the past. Following this trend, we provide a solid foundation for the problem of updating databases through interfaces based on the weak instance model. Insertions and deletions of tuples are considered. As a preliminary tool, a lattice on states is defined, based on the information content of the various states. Potential results of an insertion are states that contain at least the information in the original state and that in the new tuple. Sometimes there is no potential result, and in the other cases there may be many of them. We argue that the insertion is deterministic if the state that contains the information common to all the potential results (their greatest lower bound, in the lattice framework) is itself a potential result. Effective characterizations exist for the various cases. A symmetric approach is followed for deletions, with fewer cases, since there are always potential results; determinism is characterized accordingly.

Attribute agreement

Can constant-time-maintainability be more practical?

Practical algorithms for finding prime attributes and testing normal forms

Several decision problems for relational schemas with functional dependencies are computationally hard. Such problems include determining whether an attribute is prime and testing if a schema is in normal form. Algorithms for these problems are needed in database design tools. The problems can be solved by trivial exponential algorithms. Although the size of the instance is usually given by the number of attributes and hence is fairly small, such exponential algorithms are not usable for all design tasks. We give algorithms for these problems whose running time is polynomial in the number of maximal sets not determining an attribute or, equivalently, the number of generators of the family of closed attribute sets. There is theoretical and practical evidence that this quantity is small for the schemas occurring in practice and exponential only for pathological schemas. The algorithms are simple to implement and fast in practice. They are in use in the relational database design tool Design-By-Example.
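
For orientation, the following sketch shows the "trivial exponential algorithm" the abstract alludes to: computing attribute-set closures under functional dependencies and finding prime attributes by enumerating candidate keys. It is a hypothetical baseline for illustration only, not the polynomial algorithms of the paper or of Design-By-Example.

```python
# Baseline (exponential) primality test via candidate-key enumeration.
from itertools import combinations

def closure(attrs, fds):
    """Closure of an attribute set under FDs; each FD is (lhs, rhs) of frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def candidate_keys(schema, fds):
    """All minimal attribute sets whose closure is the whole schema."""
    keys = []
    for r in range(1, len(schema) + 1):
        for subset in combinations(sorted(schema), r):
            s = frozenset(subset)
            if closure(s, fds) == schema and not any(k < s for k in keys):
                keys.append(s)
    return keys

def prime_attributes(schema, fds):
    """An attribute is prime iff it appears in some candidate key."""
    keys = candidate_keys(schema, fds)
    return set().union(*keys) if keys else set()

# Example: R(A, B, C) with A -> B and B -> C; only A is prime.
schema = frozenset("ABC")
fds = [(frozenset("A"), frozenset("B")), (frozenset("B"), frozenset("C"))]
print(sorted(prime_attributes(schema, fds)))  # ['A']
```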

A decision procedure for conjunctive query disjointness

This paper presents an algorithm that decides whether two conjunctive query expressions always describe disjoint sets of tuples. The decision procedure solves an open problem identified by Blakeley, Coburn, and Larson: how to check whether an explicitly stored view relation must be recomputed after an update, taking into account functional dependencies. For nonconjunctive queries, the disjointness problem is NP-hard. For conjunctive queries, the time complexity of the algorithm given cannot be improved unless the reachability problem for directed graphs can be solved in sublinear time. The algorithm is novel in that it combines separate decision procedures for the theory of functional dependencies and for the theory of dense orders. Also, it uses tableaux that are capable of representing all six comparison operators <, ≤, =, ≥, >, and ≠.

Bottom-up beats top-down for datalog

We show that for any safe datalog program P1 and any query Q (a predicate of P1 with some bound arguments), there is another safe datalog program P2 that produces the answer to Q and takes no more time under semi-naive evaluation than P1 takes when evaluated top-down.
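
For readers unfamiliar with the bottom-up side of this comparison, here is a minimal sketch of semi-naive evaluation for the classic transitive-closure program; it only illustrates the evaluation strategy named in the abstract, not the rewriting of P1 into P2 that the paper constructs.

```python
# Semi-naive bottom-up evaluation of
#     path(X, Y) :- edge(X, Y).
#     path(X, Z) :- path(X, Y), edge(Y, Z).

def semi_naive_paths(edges):
    edges = set(edges)
    path = set(edges)          # the first rule seeds the relation
    delta = set(edges)         # facts derived in the previous round
    while delta:
        # join only the *new* path facts against edge, avoiding rederivation
        new = {(x, z) for (x, y) in delta for (y2, z) in edges if y == y2}
        delta = new - path
        path |= delta
    return path

print(sorted(semi_naive_paths({(1, 2), (2, 3), (3, 4)})))
# [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```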

On the power of Alexander templates

Safety of datalog queries over infinite databases

A query is safe with respect to a set of constraints if for every database that satisfies the constraints the query is guaranteed to yield a finite set of answers. We study here the safety problem for Datalog programs with respect to finiteness constraints. We show that safety can be viewed as a combination of two properties: weak safety, which guarantees the finiteness of intermediate answers, and termination, which guarantees the finiteness of the evaluation. We prove that while weak safety is decidable, termination is not. We then consider monadic programs, i.e., programs in which all intensional predicates are monadic, and show that safety is decidable in polynomial time for monadic programs. While we do not settle the safety problem, we show that a closely related problem, the decision problem for safety with respect to functional dependencies, is undecidable even for monadic programs.

Proof-tree transformation theorems and their applications

For certain sets of logical rules, one can demonstrate that for every proof tree there is another tree proving the same fact and having a special form. One technique for detecting such opportunities is to reduce the question to one of conjunctive-query containment. A more powerful technique is to test whether one conjunctive query is contained in the infinite union of conjunctive queries formed by expanding a set of recursive rules. We discuss two applications of these techniques. First, we give tests for commutativity of linear rules. When linear rules commute, we can reduce the complexity of “counting” methods for query evaluation from exponential to polynomial; commutativity also implies separability in the sense of Naughton. A second application is the discovery of linear rules that are equivalent to given nonlinear rules.

Linearising nonlinear recursions in polynomial time

The replacement of nonlinear recursions with equivalent linear recursions is a potentially useful query optimization strategy, since it permits the use of efficient algorithms for the evaluation of linear logic programs. We show that a member of a certain class of bilinear recursions is linearizable in a strong sense if and only if a specific partial proof tree derived from this recursion is contained in a bounded number of partial proof trees generated by the recursion. Further, while each such test of containment between proof trees involves an exponential number of conjunctive-query containment tests, we present syntactic conditions on the recursion that are necessary and sufficient for the containment and verifiable in polynomial time.

Inference of monotonicity constraints in datalog programs

Datalog (i.e., function-free logic) programs with monotonicity constraints on extensional predicates are considered. A monotonicity constraint states that one argument of a predicate is always less than another argument, according to some partial order. Relations of an extensional database are required to satisfy the monotonicity constraints imposed on their predicates. More specifically, a partial order is defined on the domain (i.e., set of constants) of the database, and every tuple of each relation satisfies the monotonicity constraints imposed on its predicate. An algorithm is given for inferring all monotonicity constraints that hold in relations of the intensional database from monotonicity constraints that hold in the extensional database. A complete inference algorithm is also given for disjunctions of monotonicity and equality constraints. It is shown that the inference of monotonicity constraints in programs is a complete problem for exponential time. For linear programs, this problem is complete for polynomial space.

Why a single parallelization strategy is not enough in knowledge bases

We argue that the appropriate parallelization strategy for logic-program evaluation depends on the program being evaluated. Therefore, this paper is concerned with the issues of program classification and parallelization strategies. We propose five parallelization strategies that differ with respect to the following criteria: their evaluation cost, the overhead of communication and synchronization among processors, and the programs to which they are applicable. In particular, we start our study with pure parallelization, i.e., parallelization without overhead. An interesting class structure of logic programs emerges when considering amenability to pure parallelization. The relationship to the complexity class NC is discussed. We then propose strategies that do incur an overhead but are optimal in a sense that will be precisely defined. This paper makes the initial steps towards a theory of parallel logic programming.

Invited talk: modular architectures for distributed and database systems

This paper describes the importance of modularity in systems and lists a number of reasons why systems will become increasingly modular. It describes two strawman architecture models, for systems and for distributed databases, in order to illustrate the hierarchical decomposition of complex systems. The paper also relates the systems model to the layering achieved in a few systems familiar to the author.

Clustered multiattribute hash files

Access methods for multidimensional data have attracted much research interest in recent years. In general, the data structures proposed for this problem partition the database into a set of disk pages (buckets). Access to the buckets is provided by searching a directory of some type, such as a tree directory or an inverted index, or by computation of a multiattribute hash function. Examples of the first approach are multidimensional B-trees [Sch82] and K-D-B trees [Rob81] (see also [Sam84] for a survey of these methods), whereas multiattribute hashing methods are described, for example, in [Rot74], [Aho79], [Riv76] and [Ram83]. In addition, there are hybrid methods which combine hashing with a directory of some type [Ore84], [Nie84], [Fag79]. In all the work mentioned above, performance is measured in terms of the number of disk accesses made to retrieve the answer, without distinguishing whether these are sequential or random. We argue that performance measurements must consider this factor in order to be realistic, especially in the single-user environment. Some evidence to support this claim is given in [Sal88, pg. 22], with the IBM 3380 disk drive as an example. For this type of disk, a comparison is made between accessing m blocks randomly and accessing a contiguous cluster of m blocks. The results show that for m = 10, random access is slower than clustered access by a factor of about 8, whereas for m = 100 it is slower by a factor of 25. Another motivation for this work is optical disks. In this case, there is a big advantage in clustering, since the access mechanism on many of these drives is equipped with an adjustable mirror which allows slight deflections of the laser beam. This means that it may be possible to read a complete cluster from a sequence of adjacent tracks beneath the head with a single random seek [Chri88]. Our work is inspired by an interesting recent paper [Fal86] which proposes to organize the physical layout of a multiattribute hash file by encoding record signatures using a Gray code rather than the simple binary code. In this way, neighboring buckets contain records which differ in a single bit of their signatures. It is then proved that the records which form the answer to a partial match query tend to be contained in a smaller number of clusters than under the binary arrangement. It is also shown that this idea is applicable to many other multiattribute hashing schemes with a small amount of overhead. In addition, it can improve access time to directories of grid-type files, extendible hashing, and file methods which employ z-ordering [Ore84].
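
The Gray-code arrangement mentioned above is easy to illustrate. The sketch below (hypothetical names, not the paper's code) lays out bucket signatures in binary-reflected Gray-code order, so physically adjacent buckets differ in exactly one signature bit, which is the property [Fal86] exploits for clustering partial-match answers.

```python
def gray(i: int) -> int:
    """The i-th binary-reflected Gray code."""
    return i ^ (i >> 1)

def layout(num_bits: int):
    """Physical position -> bucket signature under the Gray-code arrangement."""
    return [gray(pos) for pos in range(2 ** num_bits)]

for pos, sig in enumerate(layout(3)):
    print(pos, format(sig, "03b"))
# consecutive signatures (000, 001, 011, 010, 110, 111, 101, 100) differ in one bit
```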

Utilization of B-trees with inserts, deletes and modifies

The utilization of B-tree nodes determines the number of levels in the B-tree and hence its performance. Until now, the only analytical aid to determining a B-tree's utilization has been the analysis by Yao and related work. Yao showed that the utilization of B-tree nodes under pure inserts is 69%. We derive analytically, and verify by simulation, the utilization of B-tree nodes constructed from N inserts followed by M modifies (where M > N), where each modify is a delete followed by an insert. Assuming that nodes merge only when they are empty (the technique used in most database management systems), we show that the utilization is 39% as M becomes large. We extend this model to a parameterized mixture of inserts and modifies. Surprisingly, if the modifies are mixed with just 10% inserts, then the utilization is over 62%. We also calculate the probability of splitting and merging, and derive a simple rule of thumb that accurately estimates the probability of splitting. We present two models for computing this utilization, the more accurate of which remembers items inserted and then deleted in a node; we call such items ghosts.
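
A rough way to see the effect described above is to simulate only the leaf level of a B-tree under the same workload (N random inserts, then M modifies, merging a node only when it empties). The sketch below is a crude sanity check under our own simplifying assumptions, not the paper's analytical model, and the numeric comments are hedged expectations rather than guaranteed outputs.

```python
import random

CAPACITY = 8          # keys per leaf node (illustrative)

def insert(leaves, key):
    # locate the leaf whose key range should hold the key (leaves kept in key order)
    i = next((j for j, leaf in enumerate(leaves) if leaf and key <= leaf[-1]),
             len(leaves) - 1)
    leaf = leaves[i]
    leaf.append(key)
    leaf.sort()
    if len(leaf) > CAPACITY:                       # split an overfull node in half
        mid = len(leaf) // 2
        leaves[i:i + 1] = [leaf[:mid], leaf[mid:]]

def delete_random(leaves):
    # delete a uniformly random key; free a node only when it becomes empty
    i = random.choices(range(len(leaves)), weights=[len(l) for l in leaves])[0]
    leaves[i].pop(random.randrange(len(leaves[i])))
    if not leaves[i] and len(leaves) > 1:
        leaves.pop(i)

def utilization(n_inserts, n_modifies):
    random.seed(0)
    leaves = [[random.random()]]
    for _ in range(n_inserts):
        insert(leaves, random.random())
    for _ in range(n_modifies):                    # modify = delete then insert
        delete_random(leaves)
        insert(leaves, random.random())
    return sum(len(l) for l in leaves) / (CAPACITY * len(leaves))

print(utilization(10_000, 0))        # pure inserts: roughly the classical ~69%
print(utilization(10_000, 100_000))  # heavy modifies: drifts toward the ~39% regime
```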

Fractals for secondary key retrieval

In this paper we propose the use of fractals, and especially the Hilbert curve, to design good distance-preserving mappings. Such mappings improve the performance of secondary-key and spatial access methods, where multi-dimensional points have to be stored on a 1-dimensional medium (e.g., disk). Good clustering reduces the number of disk accesses on retrieval, improving the response time. Our experiments on range queries and nearest-neighbor queries showed that the proposed Hilbert curve achieves better clustering than older methods (“bit shuffling” or the Peano curve) in every situation we tried.
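
As a concrete illustration of the mapping discussed above, the sketch below turns a 2-d grid cell into its position along the Hilbert curve, so nearby cells tend to receive nearby 1-dimensional (disk) positions. It is adapted from the standard iterative formulation and uses illustrative names; it is not the authors' experimental code.

```python
def xy_to_hilbert(n, x, y):
    """Map cell (x, y) of an n-by-n grid (n a power of two) to its distance
    along the Hilbert curve."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        # rotate/reflect the quadrant so the recursive pattern repeats
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

# The four cells of each 2x2 block receive consecutive positions,
# which is the clustering property exploited for range queries.
for y in range(3, -1, -1):
    print([xy_to_hilbert(4, x, y) for x in range(4)])
# e.g. row y = 0 prints [0, 1, 14, 15]
```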

Declustering using error correcting codes

The problem examined is how to distribute a binary Cartesian product file over multiple disks so as to maximize parallelism for partial match queries. Cartesian product files appear as a result of some secondary-key access methods, such as multiattribute hashing [10] and the grid file [6]. For the binary case, the problem reduces to grouping the 2^n binary strings on n bits into m groups of dissimilar strings. The main idea proposed in this paper is to group the strings such that each group forms an Error Correcting Code (ECC). This construction guarantees that the strings of a given group have large Hamming distances, i.e., they differ in many bit positions. Intuitively, this should result in good declustering. We briefly review previous heuristics for declustering, describe exactly how to build a declustering scheme using an ECC, and prove a theorem that gives a necessary condition for our method to be optimal. Analytical results show that our method is superior to older heuristics and that it is very close to the theoretical (non-tight) bound.
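
The grouping idea can be illustrated with a small example. The sketch below uses the classic (7,4) Hamming code (minimum distance 3): each 7-bit bucket signature is sent to the disk named by its syndrome, so two signatures on the same disk differ by a nonzero codeword and hence in at least 3 bit positions. The specific code and names are illustrative, not the construction of the paper.

```python
# Parity-check matrix of the (7,4) Hamming code: columns are the
# nonzero 3-bit vectors 1..7.
H = [[(c >> r) & 1 for c in range(1, 8)] for r in range(3)]

def syndrome(signature):
    """Disk number for a 7-bit bucket signature (list of 7 bits)."""
    return tuple(sum(h * b for h, b in zip(row, signature)) % 2 for row in H)

def bits(i, n=7):
    return [(i >> k) & 1 for k in range(n)]

# Group the 2^7 signatures by disk and check the minimum intra-disk distance.
groups = {}
for i in range(2 ** 7):
    groups.setdefault(syndrome(bits(i)), []).append(i)

def hamming(a, b):
    return bin(a ^ b).count("1")

min_dist = min(hamming(a, b) for g in groups.values()
               for a in g for b in g if a != b)
print(len(groups), min_dist)   # 8 disks, minimum intra-disk distance 3
```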

The impact of recovery on concurrency control

It is widely recognized by practitioners that concurrency control and recovery for transaction systems interact in subtle ways. In most theoretical work, however, concurrency control and recovery are treated as separate, largely independent problems. In this paper we investigate the interactions between concurrency control and recovery. We consider two general recovery methods for abstract data types, update-in-place and deferred-update. While each requires operations to conflict if they do not “commute,” the two recovery methods require subtly different notions of commutativity. We give a precise characterization of the conflict relations that work with each recovery method, and show that each permits conflict relations that the other does not. Thus, the two recovery methods place incomparable constraints on concurrency control. Our analysis applies to arbitrary abstract data types, including those with operations that may be partial or non-deterministic.

Concurrency control of nested transactions accessing B-trees

This paper presents a concurrency control algorithm for nested transactions accessing B-trees. It combines the idea of B-link trees with that of resilient 2-phase locking [Mos85b]. The I/O automaton model is used in the specification and proofs of correctness of the system. We define “strongly-serially correct” schedules and use this property as our correctness criterion.

Hypothetical datalog negation and linear recursion

This paper examines an extension of Horn logic in which rules can add entries to a database hypothetically. Several researchers have developed logical systems along these lines, but the complexity and expressibility of such logics is only now being explored. It has been shown, for instance, that the data complexity of these logics is PSPACE-complete in the function-free predicate case. This paper extends this line of research by developing syntactic restrictions with lower complexity. These restrictions are based on two ideas from Horn-clause logic: linear recursion and stratified negation. In particular, a notion of stratification is developed in which negation-as-failure alternates with linear recursion. The complexity of such rulebases depends on the number of layers of stratification. The result is a hierarchy of syntactic classes which corresponds exactly to the polynomial-time hierarchy of complexity classes. In particular, rulebases with k strata are data-complete for Σ^P_k. Furthermore, these rulebases provide a complete characterization of the relational queries in Σ^P_k. That is, any query whose graph is in Σ^P_k can be represented as a set of hypothetical rules with k strata. Unlike other expressibility results in the literature, this result does not require the data domain to be linearly ordered.

Inductive pebble games and the expressive power of datalog

As an alternative to logic-based query languages for recursive queries, we are investigating a graphical query language called G+, which allows, among other things, easy formulation of certain queries involving simple paths in directed graphs. This led us to study whether such queries are expressible in DATALOG, the language of function-free Horn clauses. Since some G+ queries are NP-hard, and all DATALOG queries are polynomial time computable, the answer appears to be negative. However, it would be interesting to have proof techniques and tools for settling such questions with certainty. The objective of this paper is the development of one such tool, inductive pebble games, based on a normal form for DATALOG programs derived here, and its relationship to Alternating Turing Machine computations. As an application, we sketch a proof that the query “find all pairs of nodes connected by a directed simple path of even length” cannot be expressed in DATALOG.

On the first-order expressibility of recursive queries

A Datalog program is bounded iff it is equivalent to a recursion-free Datalog program. We show that, for some classes of Datalog programs, expressibility in first-order query languages coincides with boundedness. Our results imply that testing first-order expressibility is undecidable for binary programs, decidable for monadic programs, and complete for Σ^0_2.

Expressibility of bounded-arity fixed-point query hierarchies

The expressibility of bounded-arity query hierarchies resulting from the extension of first-order logic by the least fixed-point, inductive fixed-point and generalized fixed-point operators is studied. In each case, it is shown that increasing the arity of the predicate variable from k to k+1 always allows some more k-ary predicates to be expressed. Further, k-ary inductive fixed-points are shown to be more expressive than k-ary least fixed-points and k-ary generalized fixed-points are shown to be more expressive than k-ary inductive fixed-points.

Relational database behavior: utilizing relational discrete event systems and models

Behavior of relational databases is studied within the framework of Relational Discrete Event Systems (RDESes) and Models (RDEMs). Production system and recurrence equation RDEMs are introduced, and their expressive powers are compared. Non-deterministic behavior is defined for both RDEMs, and the expressive power of deterministic and non-deterministic production rule programs is also compared. This comparison shows that non-determinism increases the expressive power of production systems. A formal concept of a production system interpreter is defined, and several specific interpreters are proposed. One interpreter, called parallel deterministic, is shown to be better than others in many respects, including the conflict resolution module of OPS5.

Untyped sets, invention, and computable queries

Conventional database query languages are considered in the context of untyped sets. The algebra without while has the expressive power of the typed complex object algebra. The algebra plus while, and COL with untyped sets (under stratified semantics or inflationary semantics) have the power of the computable queries. The calculus has power beyond the computable queries; and is characterized using the typed complex object calculus with invention. The Bancilhon-Khoshafian calculus is also discussed. A technical tool, called “generic Turing machine”, is introduced and used in several of the proofs.

Modeling complex structures in object-oriented logic programming

In this paper, we present a type model for object-oriented databases. Most object-oriented databases only provide users with flat objects whose structure is a record of other objects. To achieve sufficient expressive power, an object-oriented database should provide not only objects but also complex values built recursively using the set, tuple and disjunctive constructors. Our type model comprises two notions: that of classes, whose instances are objects with identity, and that of types, whose instances are complex values. The two notions are intertwined in that an object is modeled as a pair containing an identifier and a value, and a value is a complex structure which contains objects and values. In this context we define the notion of subtyping and provide a set-inclusion semantics for it.

C-logic of complex objects

Our objective is to develop a logical framework for the natural representation and manipulation of complex objects. We start with an analysis of semantic modeling of complex objects, and attempt to understand which fundamental aspects need to be captured. A logic, called C-logic, is then presented which provides direct support for what we believe to be the basic features of complex objects, including object identity, multi-valued labels and a dynamic notion of types. C-logic has a simple first-order semantics, but it also allows natural specification of complex objects and gives us a framework for exploring efficient logic deduction over complex objects.

A logic for object-oriented logic programming

We present a logic for reasoning about complex objects, which is a revised and significantly extended version of Maier's O-logic [Mai86]. The logic naturally supports complex objects, object identity, deduction, is tolerant of inconsistent data, and has many other interesting features. It elegantly combines the object-oriented and value-oriented paradigms and, in particular, contains all of the predicate calculus as a special case. Our treatment of sets is also noteworthy: it is more general than ELPS [Kup87] and COL [AbG87], yet it avoids the semantic problems encountered in LDL [BNS87]. The proposed logic has a sound and complete resolution-based proof procedure.

Type systems for querying class hierarchies with non-strict inheritance

Type checking at query compilation time is important both for detecting programmer errors and for reducing the running time of queries. We have argued elsewhere [2] that entity-based data management systems which support class hierarchies, such as semantic data models and object-oriented DBMSs, should not be confined to “strict inheritance” — i.e., they should permit contradictions between class specifications, albeit in an explicit and controlled way. In this paper we present a type system for queries manipulating objects in such classes. We provide sound and complete axiomatizations of the predications “σ is a subtype of τ” and “expression e has type τ”. The absence of strict inheritance has normally been felt to preclude effective type checking. We show that the problem is co-NP-hard when disjoint types are admitted in the schema, but present a low-order polynomial-time algorithm that determines the absence of type errors in a query when the database has only entities.