PODS Keynote Talk: Principles of Dataspace Systems
Alon Halevy, University of Washington, Seattle and Google
Monday, 8:15 - 9:30
Location: Grand Ballroom 1-3
The most acute information management challenges today stem from organizations relying on a large number of diverse, interrelated data sources, but having no means of managing them in a convenient, integrated, or principled fashion. These challenges arise in enterprise and government data management, digital libraries, "smart" homes and personal information management. We have proposed dataspaces as a data management abstraction for these diverse applications and DataSpace Support Platforms (DSSPs) as systems that should be built to provide the required services over dataspaces. Unlike data integration systems, DSSPs do not require full semantic integration of the sources in order to provide useful services. This paper lays out specific technical challenges to realizing DSSPs and ties them to existing work in our field. We focus on query answering in DSSPs, the DSSP's ability to introspect on its content, and the use of human attention to enhance the semantic relationships in a dataspace.
Alon Halevy is currently at Google and on leave from the University of Washington in Seattle. His research interests are in data integration, semantic heterogeneity, personal information management, XML, and more generally, interactions between Artificial Intelligence and data management. In 1999, Dr. Halevy co-founded Nimble Technology, one of the first companies in the Enterprise Information Integration space. In 2004, Dr. Halevy founded Transformic Inc., a company that created search engines for the deep web, i.e., content residing in databases behind web forms. Dr. Halevy was a Sloan Fellow (1999-2000), and received the Presidential Early Career Award for Scientists and Engineers (PECASE) in 2000. He serves on the editorial board of the VLDB Journal and on the Advisory Committee of the Journal of Artificial Intelligence Research, and served as Program Chair for SIGMOD 2003. Prior to joining the University of Washington in 1998, Dr. Halevy was a principal member of technical staff at AT&T Bell Laboratories, and AT&T Laboratories. He received his Ph.D. in Computer Science from Stanford University in 1993.
PODS Invited Tutorial 1: From Statistical Knowledge Bases to Degrees of Belief
Joseph Y. Halpern, Cornell University
Monday 15:45 - 16:45
Location: Grand Ballroom 1-3
An intelligent agent will often be uncertain about various properties of its environment, and when acting in that environment it will frequently need to quantify its uncertainty. For example, if the agent wishes to employ the expected-utility paradigm of decision theory to guide its actions, it will need to assign degrees of belief (subjective probabilities) to various assertions. Of course, these degrees of belief should not be arbitrary, but rather should be based on the information available to the agent. This paper provides a brief overview of one approach for inducing degrees of belief from very rich knowledge bases that can include information about particular individuals, statistical correlations, physical laws, and default rules. The approach is called the random-worlds method. The method is based on the principle of indifference: it treats all of the worlds the agent considers possible as being equally likely. It is able to integrate qualitative default reasoning with quantitative probabilistic reasoning by providing a language in which both types of information can be easily expressed. A number of desiderata that arise in direct inference (reasoning from statistical information to conclusions about individuals) and default reasoning follow directly from the semantics of random worlds. For example, random worlds captures important patterns of reasoning such as specificity, inheritance, indifference to irrelevant information, and default assumptions of independence. Furthermore, the expressive power of the language used and the intuitive semantics of random worlds allow the method to deal with problems that are beyond the scope of many other non-deductive reasoning systems. The relevance of the random-worlds method to database systems is also discussed.
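The counting idea at the heart of random worlds can be sketched in a few lines of Python. This is an illustrative toy, not code from the tutorial: given a statistic such as "9 out of 10 birds fly," every way of choosing which birds fly is treated as an equally likely world, and the degree of belief that a particular bird flies is the fraction of those worlds in which it does.

```python
from itertools import combinations

def degree_of_belief(n_birds, n_fliers, individual=0):
    """Random-worlds degree of belief that `individual` flies, given the
    statistic 'exactly n_fliers of n_birds fly'. Each world is one choice
    of which birds fly, and all worlds are treated as equally likely."""
    worlds = list(combinations(range(n_birds), n_fliers))
    matching = [w for w in worlds if individual in w]
    return len(matching) / len(worlds)

# With the statistic '9 of 10 birds fly', direct inference gives
# a degree of belief of 0.9 that any particular bird flies.
print(degree_of_belief(10, 9))  # 0.9
```

Note how the statistical proportion reappears exactly as the subjective probability, which is the direct-inference behaviour the abstract describes.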
Joseph Y. Halpern received a B.Sc. in mathematics from the University of Toronto in 1975 and a Ph.D. in mathematics from Harvard in 1981. In between, he spent two years as the head of the Mathematics Department at Bawku Secondary School, in Ghana. He is currently a professor of computer science at Cornell University, where he moved in 1996 after spending 14 years at the IBM Almaden Research Center. His interests include reasoning about knowledge and uncertainty, decision theory and game theory, fault-tolerant distributed computing, causality, and security. Together with his former student, Yoram Moses, he pioneered the approach of applying reasoning about knowledge to analyzing distributed protocols and multi-agent systems; he won a Gödel Prize for this work. He received the Publishers' Prize for Best Paper at the International Joint Conference on Artificial Intelligence in 1985 (joint with Ronald Fagin) and in 1989. He has coauthored 6 patents, two books ("Reasoning About Knowledge" and "Reasoning About Uncertainty"), over 100 journal publications, and over 140 conference publications. He is a former editor-in-chief of the Journal of the ACM, a Fellow of the ACM, the AAAI, and the AAAS, and was the recipient of a Guggenheim and a Fulbright Fellowship.
PODS Invited Tutorial 2: Processing Queries on Tree-Structured Data Efficiently
Christoph Koch, Saarland University
Tuesday 15:30 - 16:30
Location: Grand Ballroom 6
This is a survey of algorithms, complexity results, and general solution techniques for efficiently processing queries on tree-structured data. I focus on query languages that compute nodes or tuples of nodes: conjunctive queries, first-order queries, datalog, and XPath. I also point out a number of connections among previous results that have not been observed before.
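To give a flavour of the kind of query these languages express, here is a minimal, hypothetical Python sketch (not taken from the survey) that evaluates a descendant query in the style of XPath's //a//b in a single linear-time pass, selecting every b-labelled node that has an a-labelled ancestor.

```python
def descendants_with_ancestor(tree, anc_label, desc_label):
    """Select nodes labelled desc_label that have an ancestor labelled
    anc_label, in one pass over the tree. A tree node is represented
    as a (label, children) pair; the flag seen_anc records whether an
    anc_label node occurs on the path from the root to this node."""
    result = []
    def walk(node, seen_anc):
        label, children = node
        if seen_anc and label == desc_label:
            result.append(node)
        for child in children:
            walk(child, seen_anc or label == anc_label)
    walk(tree, False)
    return result

# //a//b on a toy tree: two b-nodes lie below an a-node,
# one b-node at the root level does not.
tree = ("r", [("a", [("b", []), ("c", [("b", [])])]), ("b", [])])
print(len(descendants_with_ancestor(tree, "a", "b")))  # 2
```

Each node is visited exactly once, so the evaluation runs in time linear in the size of the tree, which is the kind of efficiency bound the tutorial's survey is concerned with.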
Christoph Koch did his doctoral research at CERN, a high energy physics research laboratory near Geneva, Switzerland, and received his PhD in 2001. He was a postdoctoral researcher first at TU Vienna and later at the University of Edinburgh. From 2003 to 2005 he was on the faculty of TU Vienna. Since April 2005 he has been professor (W2) and Chair of Information Systems at Saarland University, Saarbruecken, Germany. He has authored or co-authored about 50 publications in the areas of data management and Artificial Intelligence, two of which won best paper awards at PODS 2002 and ICALP 2005. He co-chaired DBPL 2005 and is on the editorial board of ACM Transactions on Internet Technology. His current research interests are in database systems and database theory, in particular in queries on tree-structured data, XML, data stream processing, managing incomplete information, efficient query evaluation and query optimization, visual query languages, and scientific databases.
PODS Invited Tutorial 3: The Logic of the Semantic Web
Enrico Franconi, Free University of Bozen-Bolzano
Wednesday 16:00 - 17:00
The Resource Description Framework (RDF [Hayes, 2004]) is a W3C standard language for representing information about resources in the World Wide Web; RDF provides a common framework for expressing this information so it can be exchanged between applications without loss of meaning. In this tutorial, RDF will be presented as a data model in the database sense. Its motivations will be analysed, and its current formal status reviewed. The data model can be understood both from a graph theoretical perspective and from a logical perspective. While the former has been the focus of most theoretical (see, e.g., [Gutierrez et al., 2004]) and practical approaches to RDF, the logical view of RDF has been mostly neglected by the community so far. Two provably correct (w.r.t. the normative W3C definitions of RDF [Hayes, 2004]) logical reconstructions of RDF will be presented, by reducing (a fragment of) it to a classical first-order framework suitable for knowledge representation (first developed in [de Bruijn et al., 2005]), and by encoding the full RDF data model in the HiLog logic introduced by Kifer et al. several years ago [Chen et al., 1993]. An emphasis will be given to three main characteristics of RDF: the presence of anonymous bnodes, the non-well-foundedness of the basic rdf:type relation, and the presence of the RDF vocabulary in the model itself. In the second part of the tutorial, the relation of the logical reconstructions of RDF with a database perspective will be introduced. An RDF database is seen as a model of a suitable theory in first order logic or in HiLog. While in the pure RDF sense the two approaches are equivalent, it will be shown how the difference becomes relevant whenever additional constraints (e.g., in the form of ontologies or database dependencies) are introduced in the framework.
In order to allow for additional constraints (e.g., in the standard W3C OWL-DL ontology language [Patel-Schneider et al., 2004]) while keeping the framework first order, only a fragment of RDF can be considered; this restriction is not needed if the framework is in HiLog (see, e.g., [Motik, 2005]). Various complexity and decidability results will be summarised. In the last part of the tutorial, the W3C standard query language for RDF (SPARQL [Prud'hommeaux and Seaborne, 2006]) will be presented. SPARQL is currently a candidate recommendation. The core of SPARQL is a conjunctive query language, with the added complication that the data model includes existential information in the form of bnodes, and that bnodes may be returned by the query. The formal semantics of the core query language will be given. The problem of the canonical representation of the answer set will be introduced, since bnodes introduce a behaviour similar to the null values in SQL. Complexity results for query answering will be given for different cases. Finally, the possible extensions of SPARQL with various classes of constraints will be discussed.
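The conjunctive core of SPARQL, and the way bnodes behave as existential values, can be sketched with a minimal, hypothetical Python matcher (all data and names below are illustrative). Variables start with '?', data bnodes with '_:'; a bnode in the data matches a variable like any other term, so it can be returned in an answer, which is the source of the null-like behaviour the abstract mentions.

```python
def match_patterns(patterns, triples, binding=None):
    """Naive conjunctive-query evaluation over RDF-style triples.
    Each pattern is a (s, p, o) tuple; components starting with '?'
    are variables. Data bnodes ('_:...') are matched like ordinary
    terms, so they can appear in answer bindings."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    head, rest = patterns[0], patterns[1:]
    for triple in triples:
        b = dict(binding)
        ok = True
        for pat, term in zip(head, triple):
            if pat.startswith('?'):
                if pat in b and b[pat] != term:
                    ok = False
                    break
                b[pat] = term
            elif pat != term:
                ok = False
                break
        if ok:
            yield from match_patterns(rest, triples, b)

# Illustrative data: an anonymous person (_:x) who knows alice.
data = [("_:x", "type", "Person"),
        ("_:x", "knows", "alice"),
        ("alice", "type", "Person")]

# SELECT ?p WHERE { ?p type Person } returns the bnode as an answer.
answers = list(match_patterns([("?p", "type", "Person")], data))
print(sorted(a["?p"] for a in answers))  # ['_:x', 'alice']
```

A conjunctive query such as { ?p type Person . ?p knows ?q } joins the two patterns through the shared variable ?p, binding it to the bnode "_:x"; a real SPARQL engine must additionally decide how to canonically represent such bnode-containing answers, which is the problem raised in the tutorial.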
Enrico Franconi is associate professor at the Free University of Bozen-Bolzano, Italy. He is currently principal investigator of the European network of excellence "Realizing the Semantic Web" (KnowledgeWeb) and of the European network of excellence "Interoperability Research for Networked Enterprises Applications and Software" (InterOp), and he is co-investigator of the European basic research project "Thinking Ontologies" (Tones). He is a member of the Advisory Committee of the World Wide Web Consortium (W3C), of the editorial board of the Journal of Applied Logic (Elsevier), and of the editorial board of AAAI Press (the American Association for Artificial Intelligence). His main research interest is in knowledge representation and reasoning technologies applied to databases, in particular description logics, temporal representations, conceptual modelling, intelligent access to information, information integration, and natural language semantics.