Search SIGMOD Join SIGMOD Feedback What's New Home
In cooperation with

ACM SIGMOD/PODS 2004 Conference

Maison de la Chimie, Paris, France.
June 13-18, 2004

SIGMOD Program

At a glance

Tuesday 09:00
Research Track
Opening and Keynote Break
Stream Management Lunch Web, XML, and IR Break New Styles of XML
Research Track
XML Query Efficiency Data Mining Applications Statistics
Research & Industrial Track Industrial: Query Processing
Non-Standard Query Processing
Indexing & Tuning
Tutorial & Panel

Tutorial: Security

Tutorial: Security (continued)
Group A: Web Services, Data Integration, Data Mining
Group B: Streams, Peer-to-peer and distributed databases
Group C: XML, Data Privacy, Potpourri
Wednesday 08:30
Research Track
Data Integration Break
XML PubSub & Indexing Lunch
SIGMOD Business Meeting and Awards Ceremony Security & Privacy Break
Query Optimization Conference Banquet
Research Track
Stream Query Processing P2P and Networks Moving Objects Spatial Data
 Industrial Track Database Run-Time System DB Applications Web Services Information Assurance
Research Track & Tutorials Clustering Tutorial: Time Series
Tutorial: Streams Tutorial: Streams (continued)
Group C: XML, Data Privacy, Potpourri
Group A: Web Services, Data Integration, Data Mining
Group B: Streams, Peer-to-peer and distributed databases
Thursday 08:45
14:00 - ...
Research Track Keynote Break Schema Discovery Break Query Progress Joint Workshops
Research Track Query Uncertainty Consistency & Availability
 Research & Industrial Track Text and DB Industrial: XML & RDBMS Marriage
Tutorials Tutorial: Web Services Tutorial: Web Services (continued)

Tuesday, June 15   ^

09:00 - 10:30 Opening and Keynote Chair: Gerhard Weikum

A Baker's Dozen Revolutions in Database System Architecture
Jim Gray

10:30 - 11:00

11:00 - 12:30 Research Session 1: Stream Management Chair: Edward Chang

Adaptive Stream Resource Management Using Kalman Filters
Ankur Jain, Edward Y. Chang, Yuan-Fang Wang

Online Event-driven Subsequence Matching over Financial Data Streams
Huanmei Wu, Betty Salzberg, Donghui Zhang    

Holistic UDAFs at streaming speeds
Graham Cormode, Theodore Johnson, Flip Korn, S. Muthukrishnan, Oliver Spatscheck, Divesh Srivastava

11:00 - 12:30 Research Session 2: XML Query Efficiency Chair: Sihem Amer-Yahia

BLAS: An Efficient XPath Processing System
Yi Chen, Susan B. Davidson, Yifeng Zheng

Efficient Processing of Twig Queries with OR-Predicates
Haifeng Jiang, Hongjun Lu, Wei Wang

Tree Logical Classes for Efficient Evaluation of XQuery
Stelios Paparizos, Yuqing Wu, Laks V. S. Lakshmanan, H. V. Jagadish

11:00 - 12:30 Industrial Session 1: Query Processing
Chair: Guy Lohman

Query Sampling in DB2 Universal Database
Jarek Gryz, Junjie Guo --- York University
Linqi Liu, Calisto Zuzarte --- IBM Toronto

Query Processing for SQL Updates
Cesar A. Galindo-Legaria, Stefano Stefani, Florian Wass --- Microsoft Corporation, USA

Parallel SQL Execution in Oracle 10g
Thierry Cruanes, Benoit Dageville, Bhaskar Ghosh --- Oracle Corporation, USA

11:00 - 12:30 Tutorial Session 1: Security

Security of Shared Data in Large Systems: State of the Art and Research Directions
Arnon Rosenthal, Marianne Winslett

12:00 - 14:00 Lunch

14:00 - 16:00 Research Session 3: Web, XML, and IR Chair: Catriel Beeri

FleXPath: Flexible Structure and Full-Text Querying for XML
Sihem Amer-Yahia, Laks V. S. Lakshmanan, Shashank Pandit

An Interactive Clustering-based Approach to Integrating Source Query interfaces on the Deep Web
Wensheng Wu, Clement Yu, AnHai Doan, Weiyi Meng

Understanding Web Query Interfaces: Best-Effort Parsing with Hidden Syntax
Zhen Zhang, Bin He, Kevin Chen-Chuan Chang

Using the Structure of Web Sites for Automatic Segmentation of Tables
Kristina Lerman, Lise Getoor, Steven Minton, Craig Knoblock

14:00 - 16:00 Research Session 4: Data Mining Applications Chair: Cyrus Shahabi

Identifying Similarities, Periodicities and Bursts for Online Search Queries
Michail Vlachos, Chris Meek, Zografoula Vagena, Dimitrios Gunopulos

FARMER: Finding Interesting Rule Groups in Microarray Datasets
Gao Cong, Anthony K. H. Tung, Xin Xu, Feng Pan, Jiong Yang    

Diamond in the Rough: Finding Hierarchical Heavy Hitters in Multi-Dimensional Data
Graham Cormode, Flip Korn, S. Muthukrishnan, Divesh Srivastava    

Cost-Based Labeling of Groups of Mass Spectra
Lei Chen, Zheng Huang, Raghu Ramakrishnan

14:00 - 16:00 Research Session 5: Non-standard Query Processing Chair: Paul Larson

Optimization of Query Streams Using Semantic Prefetching
Ivan J. Bowman, Kenneth Salem     

Buffering Database Operations for Enhanced Instruction Cache Performance
Jingren Zhou, Ken Ross

Rank-aware Query Optimization
Ihab F. Ilyas, Rahul Shah, Walid G. Aref, Jeffrey Scott Vitter, Ahmed K. Elmagarmid     

Fast Computation of Database Operations using Graphics Processors
Naga K. Govindaraju, Brandon Lloyd, Wei Wang, Ming Lin, Dinesh Manocha

14:00 - 16:00 Tutorial Session 2: Security (continued)

Security of Shared Data in Large Systems: State of the Art and Research Directions
Arnon Rosenthal, Marianne Winslett

16:00 - 16:30

16:30 - 18:30
Research Session 6: New Styles of XML
Chair: Mary Fernandez

Lazy Query Evaluation for Active XML
Serge Abiteboul, Omar Benjelloun, Bogdan Cautis, Ioana Manolescu, Tova Milo, Nicoleta Preda 

Data Stream Management for Historical XML Data
Sujoe Bose, Leonidas Fegaras    

Colorful XML: One Hierarchy Isn't Enough
H. V. Jagadish, Laks V. S. Lakshmanan, Monica Scannapieco, Divesh Srivastava, Nuwee Wiwatwattana    

Approximate XML Query Answers
Neoklis Polyzotis, Minos Garofalakis, Yannis Ioannidis

16:30 - 18:30
Research Session 7: Statistics
Chair: Volker Markl

A Bi-Level Bernoulli Scheme for Database Sampling
Peter J. Haas, Christian Koenig  

Effective Use of Block-Level Sampling in Statistics Estimation
Surajit Chaudhuri, Gautam Das, Utkarsh Srivastava 

Online Maintenance of Very Large Random Samples
Christopher Jermaine, Abhijit Pol, Subramanian Arumugam

Conditional Selectivity for Statistics on Query Expressions
Nicolas Bruno, Surajit Chaudhuri

16:30 - 18:30
Research Session 8: Indexing and Tuning
Chair: Patrick O'Neil

Transaction support for indexed views
Goetz Graefe, Michael Zwilling

Graph Indexing: A Frequent Structure-based Approach
Xifeng Yan, Philip S. Yu, Jiawei Han    

The Priority R-Tree: A Practically Efficient and Worst-Case Optimal R-Tree
Lars Arge, Mark de Berg, Herman J. Haverkort, Ke Yi   

Integrating Vertical and Horizontal Partitioning Into Automated Physical Database Design
Sanjay Agrawal, Vivek Narasayya, Beverly Yang

16:30 - 18:30 Panel: Rethinking the Conference Reviewing Process

Michael Franklin (SIGMOD 02 PC chair), co-moderator
Jennifer Widom (SIGMOD 05 PC chair), co-moderator
Gerhard Weikum (SIGMOD 04 PC chair)
Alon Halevy (SIGMOD 03 PC chair)
Philip A. Bernstein (SIGMOD 79 PC chair: the long-term view)
David DeWitt (SIGMOD 83 PC chair: the long-term view)
Anastassia Ailamaki (the near-term view)
Zack Ives (the near-term view)

Wednesday, June 16   ^

08:30 - 10:00 Research Session 9: Data Integration Chair: Bertram Ludaescher

Constraint-Based XML Query Rewriting For Data Integration
Cong Yu, Lucian Popa    

iMAP: Discovering Complex Mappings between Database Schemas
Robin Dhamankar, Yoonkyong Lee, AnHai Doan, Alon Halevy, Pedro Domingos 

Adapting to Source Properties in Processing Data Integration Queries
Zachary G. Ives, Alon Halevy, Daniel S. Weld

08:30 - 10:00 Research Session 10: Stream Query Processing Chair: Ioana Manolescu

Adaptive Ordering of Pipelined Stream Filters
Shivnath Babu, Rajeev Motwani, Kamesh Munagala, Itaru Nishizawa, Jennifer Widom  

Static Optimization of Conjunctive Queries with Sliding Windows Over Infinite Streams
Ahmed M. Ayad, Jeffrey F. Naughton   

Dynamic Plan Migration for Continuous Queries Over Data Streams
Yali Zhu, Elke A. Rundensteiner, George T. Heineman

08:30 - 10:00 Industrial Session 2: Database Run-Time System Chair: Anil Nori

Data Densification in a Relational Database System
Abhinav Gupta, Sankar Subramanian, Srikanth Bellamkonda, Tolga Bozkaya, Nathan Folkert, Lei Sheng, Andrew Witkowski --- Oracle Corporation, USA

Hosting the .NET Runtime in Microsoft SQL Server
Alazel Acheson, Mason Bendixen, José A. Blakeley, Peter Carlin, Ebru Ersan, Jun Fang, Xiaowei Jiang, Christian Kleinerman, Balaji Rathakrishnan, Gideon Schaller, Beysim Sezgin, Ramachandran Venkatesh, Honggang Zhang --- Microsoft Corp, USA

Vertical and Horizontal Percentage Aggregations
Carlos Ordonez --- Teradata, NCR, USA

08:30 - 10:00 Research Session 11: Clustering Chair: Marie-Christine Rousset

Clustering Objects on a Spatial Network
Man Lung Yiu, Nikos Mamoulis  

Computing Clusters of Correlation Connected Objects
Christian Boehm, Karin Kailing, Peer Kroeger, Arthur Zimek

Incremental and Effective Data Summarization for Dynamic Hierarchical Clustering
Samer Nassar, Joerg Sander, Corrine Cheng

10:00 - 10:30

10:30 - 12:00
Research Session 12: XML PubSub and Indexing Chair: Peter Apers

Implementing a Scalable XML Publish/Subscribe System using Relational Database Systems
Feng Tian, Berthold Reinwald, Hamid Pirahesh, Tobias Mayr, Jussi Myllymaki 

Incremental Maintenance of XML Structural Indexes
Ke Yi, Hao He, Ioana Stanoi, Jun Yang 

Incremental Evaluation of Schema-Directed XML Publishing
Philip Bohannon, Peter Buneman, Byron Choi, Wenfei Fan

10:30 - 12:00
Research Session 13: Peer-to-Peer and Sensor Networks Chair: Laurent Amsaleg

The Price of Validity in Dynamic Networks
Mayank Bawa, Aristides Gionis, Hector Garcia-Molina, Rajeev Motwani

Compressing Historical Information in Sensor Networks
Antonios Deligiannakis, Yannis Kotidis, Nick Roussopoulos    

Efficient Query Reformulation in Peer-Data Management Systems
Igor Tatarinov, Alon Halevy

10:30 - 12:00
Industrial Session 3: Web Services Chair: Roger Barga

Models for Web Services Transactions
Mark Little --- Chief Architect Arjun Technologies, UK

Enabling Sovereign Information Sharing Using Web Services
Rakesh Agrawal, Dmitri Asonov, Ramakrishnan Srikant --- IBM Almaden

Building Dynamic Application Networks with Web Services
Matt Mihic --- BEA

Secure, Reliable, Transacted: Innovation in Web Services
Martin Gudgin --- Microsoft Corporation

10:30 - 12:00
Tutorial Session 3: Time Series

Fast Algorithms for Time Series: algorithms and applications to Finance, Physics, Music, Biology and other Suspects
Alberto Lerner, Dennis Shasha, Zhihua Wang, Xiaojian Zhao, Yunyue Zhu

12:00 - 13:30 Lunch

13:30 - 14:30 SIGMOD Business Meeting and Awards Ceremony

14:30 - 16:30
Research Session 14: Security and privacy Chair: Arnon Rosenthal

Extending Query Rewriting Techniques for Fine-Grained Access Control
Shariq Rizvi, Alberto Mendelzon, S. Sudarshan, Prasan Roy    

Order-Preserving Encryption for Numeric Data
Rakesh Agrawal, Jerry Kiernan, Ramakrishnan Srikant, Yirong Xu    

A Formal Analysis of Information Disclosure in Data Exchange
Gerome Miklau, Dan Suciu    

Secure XML Querying with Security Views
Wenfei Fan, Chee-Yong Chan, Minos Garofalakis

14:30 - 16:30
Research Session 15: Moving Objects Chair: Jan Chomicki

Indexing Spatio-Temporal Trajectories with Chebyshev Polynomials
Raymond Ng, Yuhan Cai   

Prediction and Indexing of Moving Objects with Unknown Motion Patterns
Yufei Tao, Christos Faloutsos, Dimitris Papadias, Bin Liu    

SINA: Scalable Incremental Processing of Continuous Queries in Spatio-temporal Databases
Mohamed F. Mokbel, Xiaopeng Xiong, Walid Aref

STRIPES: An Efficient Index for Predicted Trajectories
Jignesh M. Patel, Yun Chen, V. Prasad Chakka

14:30 - 16:30 Industrial Session 4: Database Applications Chair: Gustavo Alonso

SoundCompass: A Practical Query-by-Humming System
Naoko Kosugi, Yasushi Sakurai, Masashi Morimoto --- NTT Cyber Space Labs, Yokosuka, Kanagawa, Japan

Model driven Business UI based on Maps
Per Bendsen --- Microsoft Corp, Copenhagen

dbSwitch - Towards a Database Utility
Shaul Dar --- Savantis Systems Ltd, Herzelia, Israel
Gil Hecht, Eden Shochat --- Savantis Systems Inc, Lexington, MA USA

14:30 - 16:30 Tutorial Session 4: Streams

Indexing and Mining  Streams
Christos Faloutsos

16:30 - 17:00

17:00 - 19:00
Research Session 16: Query Optimization Chair: Zachary Ives

CORDS: Automatic Discovery of Correlations and Soft Functional Dependencies
Ihab F. Ilyas, Volker Markl, Peter Haas, Paul Brown, Ashraf Aboulnaga  

Robust Query Processing through Progressive Optimization
Volker Markl, Vijayshankar Raman, David Simmen, Guy Lohman, Hamid Pirahesh, Miso Cilimdzic

Canonical Abstraction for Outerjoin Optimization
Jun Rao, Hamid Pirahesh, Calisto Zuzarte

17:00 - 19:00
Research Session 17: Spatial Data Chair:Jan Paredaens

Joining Interval Data in Relational Databases
Jost Enderle, Matthias Hampel, Thomas Seidl   

Approximation Techniques for Spatial Data
Abhinandan Das, Johannes Gehrke, Mirek Riedewald   

Spatially-decaying Aggregation over a Network: Model and Algorithms
Edith Cohen, Haim Kaplan

17:00 - 19:00 Industrial Session 5: Information Assurance Challenges Chair: Pam Drew

Requirements and Policy Challenges in Highly Secure Environments
Dean E. Hall --- Information Assurance Section Chief, Federal Bureau of Investigation

Information Assurance Technology Challenges
Nicholas J. Multari --- Senior Manager, Information Assurance, Boeing Phantom Works

Service-Oriented BI: Towards tight integration of business intelligence into operational applications
Marcus Dill, Achim Kraiss, Stefan Sigg, Thomas Zurek --- SAP

17:00 - 19:00 Tutorial Session 5: Streams (continued)

Indexing and Mining  Streams
Christos Faloutsos

20:30 Banquet

Thursday, June 17   ^

08:45 - 10:00 Keynote
Chair: Gerhard Weikum

The roles of cryptography in database security
Ueli Maurer

16:30 - 17:00

10:30 - 11:30
Research Session 18: Schema Discovery Chair: Sonia Bergamaschi

TOSS: An Extension of TAX with Ontologies and Similarity Queries
Edward Hung, Yu Deng, V.S. Subrahmanian  

Information-Theoretic Tools for Mining Database Structure from Large Data Sets
Periklis Andritsos, Renee Miller, Panayiotis Tsaparas

10:30 - 11:30 Research Session 19: Query Uncertainty Chair: Mirek Riedewald

Efficient set joins on similarity predicates
Sunita Sarawagi , Alok Kirpal

Automatic Categorization of Query Results
Kaushik Chakrabarti, Surajit Chaudhuri, Seung-won Hwang

10:30 - 11:30 Research Session 20: Text and DB Chair: Laks Lakshmanan

When one Sample is not Enough: Improving Text Database Selection Using Shrinkage
Panagiotis G. Ipeirotis, Luis Gravano    

On the Integration of Structure Indexes and Inverted Lists
Raghav Kaushik, Rajasekar Krishnamurthy, Jeffrey F. Naughton, Raghu Ramakrishnan

10:30 - 11:30 Tutorial Session 6: Web Services

Tools for Design of Composite Web Services
Richard Hull, Jianwen Su

11:30 - 12:00 Break

12:00 - 13:00 Research Session 21: Query Progress Chair: Peter Boncz

Toward a Progress Indicator for Database Queries
Gang Luo, Jeffrey F. Naughton, Curt J. Ellmann, Michael W. Watzke   

Estimating Progress of Long Running SQL Queries
Surajit Chaudhuri, Vivek Narasayya, Ravishankar Ramamurthy

12:00 - 13:00 Research Session 22: Consistency and Availability Chair: Heiko Schuldt

Relaxed Currency and Consistency: How to Say "Good Enough" in SQL
Hongfei Guo, Per-AkeLarson, Raghu Ramakrishnan, Jonathan Goldstein    

Highly-Available, Fault-Tolerant, Parallel Dataflows
Mehul A. Shah, Joseph M. Hellerstein, Eric Brewer

12:00 - 13:00 Industrial Session 6: The Marriage of XML and Relational Databases Chair: Dan Florescu

XML in the Middle: XQuery in the WebLogic Platform
Michael J. Carey --- BEA

ORDPATHs: Insert-Friendly XML Node Labels
Patrick O'Neil, Elizabeth O'Neil --- University of Massachusetts Boston
Shankar Pal, Istvan Cseri, Gideon Schaller, Nigel Westbury --- Microsoft

12:00 - 13:00 Tutorial Session 7: Web Services (continued)

Tools for Design of Composite Web Services
Richard Hull, Jianwen Su

Keynote, Tuesday, June 15   ^

A Baker's Dozen Revolutions in Database System Architecture

Jim Gray (Microsoft Research, USA)


Database system architectures are undergoing revolutionary changes. 

  1. At the core, we are finally unifying algorithms and data by integrating programming languages (a common language runtime or virtual machine) with the database system. This gives a full-blown object-relational system where relational operators manipulate sets of objects in non-procedural ways.
  2. Coupled with this, each DBMS is now a web service: it is listening to port 80, publishing WSDL, and servicing  SOAP calls.  This has huge implications for how we structure applications. DBMSs are now object containers.
  3. The first service one exports is a queue mechanism that allows asynchronous requests and is the basis for transaction processing and workflow applications. Future workflow systems are likely to be built on this core.
  4. Data cubes and online analytic processing have been baked into most systems so that they seamlessly combine ROLAP and OLAP.
  5. Beyond that, the systems have built a framework for data mining and machine learning algorithms. Decision trees, Bayes nets, and  clustering are built in, and new algorithms can be added to this framework by third parties.
  6. Traditionally database systems were row stores.  But many applications have very sparse tables. So there is a rebirth of column stores that offer huge performance advantages for some applications.  
  7. To complete the puzzle, text, temporal, and spatial data access methods, along with their probabilistic reasoning have been added to the DBMSs.  This makes them much more useful forbroad application classes.
  8. External data is increasingly arriving as streams that need to be compared to a historical data - and stream-processing operators are being added to the xecution engines.
  9. Publish-subscribe systems invert the data-query ratios, in those systems the data is compared against millions of queries rather than queries searching millions of records.
  10. Meanwhile, disks and main memory are growing much faster than the bandwidth and latency between them, so the database systems are increasingly using multi-terabyte main memories and bulk sequential access to disk.
  11. All these changes mandate that we rethink our query optimization strategy to be much more dynamic -- adapting to current conditions and selectivities rather than picking a static plan in advance.
  12. Intelligence is moving to the periphery of the network.  Each disk and each sensor will have a fairly competent database machine. There is considerable evidence that relational algebra is the most convenient way to program these systems.
  13. And, database systems are now expected to be self-managing, self-healing, and always-up. As you can see, we have our work cut out for us in delivering all these features, but database systems are central to new applications.


Jim Gray is part of Microsoft's research group. His work focuses on databases and transaction processing. Jim is active in the research community, is an ACM, NAE, NAS, and AAAS Fellow, and received the ACM Turing Award for his work on transaction processing. He edits of a series of books on data management, and has been active in building online databases like http://terraService.Net and

Tutorial 1 & 2, Tuesday, June 15   ^

Security of Shared Data in Large Systems: State of the Art and Research Directions

Arnon Rosenthal (MITRE), Marianne Winslett (University of Illinois)


The target audience for this tutorial is the entire SIGMOD research community. The goals of the tutorial are to enlighten the SIGMOD research community about the state of the art in data security, especially for enterprise or larger systems, and to engage the community’s interest in improving the state of the art.

Detailed introduction: RosenthalWinslett-tutorial-abstract.pdf.


Arnon Rosenthal is a Principal Scientist at MITRE.  He has broad interests in problems that arise when data is shared between communities, including a long-standing interest in the security issues that arise in data warehouses, federated databases, and enterprise information systems.  He has also had a first-hand look at many security problems that arise in large government
and military organizations. 

Marianne Winslett has been a professor at the University of Illinois since 1987.  She started working on database security issues in the early 1990s, focusing on semantic issues in MLS databases.  Her interests soon shifted to issues of trust management for data on the web. Trust negotiation is her main current research focus.

Panel, Tuesday, June 15   ^

Rethinking the Conference Reviewing Process

Moderators: Michael J. Franklin (UC Berkeley) and Jennifer Widom (Stanford)

Panelists: Anastassia Ailamaki (Carnegie Mellon), Philip A. Bernstein (Microsoft), David DeWitt (Wisconsin), Alon Halevy (Washington), Zachary Ives (U Penn), and Gerhard Weikum (Max Planck Institute)

Statement is provided as panelcamera.pdf.

Tutorial 3, Wednesday, June 16   ^

Fast Algorithms for Time Series with applications to Finance, Physics, Music, Biology, and other Suspects
(Work supported in part by U.S. NSF grants IIS-9988636 and N2010-0115586)

Alberto Lerner (IBM TJ Watson Research Center), Dennis Shasha, Zhihua Wang, Xiaojian Zhao, Yunyue Zhu (New York University)


Financial time series streams are watched closely by millions of traders. What exactly do they look for and how can we help them do it faster? Physicists study the time series emerging from their sensors. The same question holds for them. Musicians produce time series. Consumers may want to compare them. This tutorial presents techniques and case studies for four problems:
  1. Finding sliding window correlations in financial, physical, and other applications.
  2. Discovering bursts in large sensor data of gamma rays.
  3. Matching hums to recorded music, even when people don't hum well.
  4. Maintaining and manipulating time-ordered data in a database setting.
This tutorial draws mostly from the book High Performance Discovery in Time Series: techniques and case studies, Springer-Verlag 2004. You can find the power point slides for this tutorial at

The tutorial is aimed at researchers in streams, data mining, and scientific computing. Its applications should interest anyone who works with scientists or financial "quants." The emphasis will be on recent results and open problems. This is a ripe area for further advance.

This tutorial is complementary to the tutorial of Professor Faloutsos. Each is self-contained. Here we cover tools for moving correlations, burst detection, query by humming, and databases for order. Professor Faloutsos's tutorial will emphasize indexing for similarity search, forecasting, and fractals.

An outline is available as Shasha_Tutorial04.pdf.


Dennis Shasha is a professor of computer science at the Courant Institute of NYU where he works with biologists on pattern discovery for microarrays, combinatorial design, and network inference;
and with physicists and financial people on algorithms for time series. Other areas of interest include database tuning, tree and graph matching, and cryptographic file systems. In his spare time, he has written three books of puzzles, a biography of great computer scientists, and technical books about database tuning, biological pattern recognition and an upcoming book time series. He also writes the puzzle column for Scientific American.

Tutorials 4 & 5, Thursday, June 17   ^

Indexing and Mining Streams

Christos Faloutsos (CMU)


How can we find patterns in a sequence of sensor measurements (eg., a sequence of temperatures, or water-pollutant measurements)? How can we compress it? What are the major tools for forecasting and outlier detection? The objective of this tutorial is to provide a concise and intuitive overview of the most important tools, that can help us find patterns in sensor sequences. Sensor data analysis becomes of increasingly high importance, thanks to the decreasing cost of hardware  and the increasing on-sensor processing abilities. We review the state of the art in three related fields: (a) fast similarity search for time sequences, (b) linear forecasting with the traditional AR (autoregressive) and ARIMA methodologies and (c) non-linear forecasting, for chaotic / self-similar time sequences, using lag-plots and fractals. The emphasis of the tutorial is to give the intuition behind these powerful tools, which is usually lost in the technical literature, as well as to give case studies that illustrate their practical use.

  • Similarity Search
    • why we need similarity search
    • distance functions (Euclidean, LP norms, time-warping)
    • fast searching (R-trees, M-trees)
    • feature extraction (DFT, Wavelets, SVD, FastMap)
  • Linear Forecasting
    • main idea behind linear forecasting
    • AR methodology
    • multivariate regression
    • Recursive Least Squares
    • de-trending; periodicities
  • Non-linear/chaotic forecasting
    • main idea: lag-plots
    • 'fractals' and 'fractal dimensions'
      • definition and intuition
      • algorithms for fast computation
    • case studies
Researchers that want to get up to speed with the major tools in time sequence analysis. Also, practitioners who want a concise, intuitive overview of the state of the art.

References can be found in faloutsos.pdf, and the tutorial on the Web at


Christos Faloutsos is a Professor at Carnegie Mellon University. He has received the Presidential Young Investigator Award by the National Science Foundation (1989), three "best paper" awards (SIGMOD 94, VLDB 97, KDD01-runner-up), and four teaching awards. He is a member of the executive committee of SIGKDD; he has published over 120 refereed articles, one monograph, and holds four patents. His research interests include data mining in streams and graphs, fractals, indexing methods for multimedia and text data bases, and data base performance.

Keynote, Thursday, June 17   ^

The roles of cryptography in database security

Ueli Maurer (ETH Zurich, Switzerland)


Database security relies on many different mechanisms, including access control, information flow control, operating system and network security, prevention of statistical inference, data and user authentication, encryption, time-stamping, digital signatures, and other cryptographic mechanisms. In this talk we take a systematic look at the current and future roles of cryptography in database security. In particular, we demonstrate some remarkable and apparently paradoxical new applications made possible by modern cryptographic protocols (although they are still quite far from efficient practical implementations.) One of them is private information retrieval, allowing a user to access data (e.g. an entry in a patent database) without the database learning which data was accessed. Another one is the secure combination of several databases owned by mutually mistrusting organizations, for example by competing companies.


Ueli Maurer ( is professor of computer science and head of the Information Security and Cryptography Research Group at the Swiss Federal Institute of Technology (ETH), Zurich. His research interests include information security, the theory and applications of cryptography, algorithms, discrete mathematics, and information theory.

He has served extensively as an editor and a member of program committees. Currently he is Editor-in-Chief of the Journal of Cryptology, Editor-in-Chief (with Ronald Rivest) of Springer Verlag's book series in Information Security and Cryptography, and serves on the Board of Directors of the International Association for Cryptologic Research (IACR). He is a Fellow of the IEEE, was the 2000 Rademacher Lecturer of the Department of Mathematics at the University of Pennsylvania, and is a frequent invited speaker at scientific conferences.

Maurer holds several patents for cryptographic systems and has served as a consultant for many companies and government organizations. He serves on a few management and scientific advisory boards and is a co-founder of the Zurich-based security software company Seclutions.

Tutorials 6 & 7, Thursday, June 17   ^

Tools for Design of Composite Web Services

Richard Hull (Bell Laboratories), Jianwen Su (University of California at Santa Barbara)


The web services paradigm promises to enable rich, flexible, and dynamic interoperation of highly distributed and heterogeneous web-hosted services. Substantial progress has already been made towards this goal (e.g., emerging standards such as SOAP, WSDL, BPEL) and industrial technology (e.g., IBM’s WebSphere Toolkit, Sun’s Open Net Environment and Jini™ Network technology, Microsoft’s .Net and Novell’s One Net initiatives, HP’s e-speak). Several research efforts are already underway that build on or take advantage of the paradigm, including the DAML-S/OWL-S program [8, 25, 19], the Semantic eWallets project [18], ActiveXML [3], and automata-based models for web services [6, 21, 4]. But there is still a long way to go, especially given the ostensible long-term goal of enabling the automated discovery, composition, enactment, and monitoring of collections of web services working to achieve a specified objective. A fundamental question right now concerns the design and analysis of composite web services. Specifically, are existing tools for design and analysis of software systems sufficient for web services, or are new techniques needed to handle the novel aspects of the web services paradigm? This raises a variety of questions, several of which are relevant for the database research community. These include: What is the right way to model web services and their compositions? What is the right way to query them in order to support automated composition and analysis algorithms? And how can the data management aspects of composite web services be incorporated into current web services standards? This tutorial will provide the groundwork needed to address these questions, by describing emerging frameworks for studying composite services, and identifying emerging tools and techniques for both automated design and analysis of composite web services. The tutorial will begin with an overview of the underlying goals and assumptions of the web services paradigm, from the perspectives of both emerging standards and the semantic web services community (Section 2). It then reviews key standards in the area, as these provide some of the basic building blocks to be used by the web services paradigm, and will influence the form that this paradigm eventually takes (Section 3).

The tutorial will identify fundamental aspects for modeling of web services and their compositions. Research and standards are exploring a variety of approaches, which can be broken along several dimensions. A key dimension concerns whether the focus is on message passing (as found in WSDL and BPEL) or actions performed (as found in OWL-S and elsewhere) (Section 4). Another key dimension concerns how behaviors should be modeled, including both the behavior of individual web services and also the desired or actual behaviors of compositions of web services. Possibilities here include flowchart-based approaches (e.g., from workflow, BPEL), automata-based models, or goal-driven approaches (e.g., OWL-S). The tutorial discusses several variations that arise from the underlying operational model (Section 5). This includes issues such as whether messages passed are synchronous or asynchronous, whether unbounded queues are permitted, and the topology for compositions, e.g., peer-to-peer or hub-and-spoke. The tutorial then examines approaches and technologies that are relevant to the problem of (manual or automated) composition of web services (Section 6). This includes examination of technologies that emerged before the web services paradigm came into being, such as automated synthesis as found in the verification community and automated workflow design from inter-task dependencies. It also includes in-depth discussion of recent research on composition for web services. Analysis of composite web services is then considered (Section 7). This includes again more classical results, e.g., from the workflow and verification communities, and several recent results based on various models of composite web services.

The final topic of the tutorial concerns research questions arising from the web services paradigm that are very interesting to the database community (Section 8). These focus on issues such as the development of abstractions to enable querying and manipulating web services and behavior models. The work on web services described above is based on a variety of established fields including, e.g., automata theory [20], process algebras [26], temporal logics [12], and situation calculi [30]; review of selected results from these fields will be sprinkled through the tutorial.

The tutorial described here is focused primarily on issues of design and analysis of composite web services, primarily from the perspectives of models, languages, and algorithms. Many topics will be addressed only in passing or not at all; these include transactional properties and ontology-based reasoners.

Short paper is available as here.


Richard Hull is Director of Network Data and Services Research at Bell Laboratories, the research arm of Lucent Technologies.  Hull has broad research interests in the areas of data and information management. He has published over 75 journal and conference articles, and is co-author of the book "Foundations of Databases" (Addison-Wesley), a graduate-level textbook on the theory of databases.  Hull's research has spanned the areas of database theory, query languages, database models, database programming languages, and workflow.  His current work is focused on e-services, personalization, ubiquitous computing and data sharing.

Hull received a Ph.D. degree in Mathematics from the University of California, Berkeley, in 1979.  He then joined the faculty of the Computer Science department at the University of Southern California, where he worked until 1993.  He also served on the faculties of Computer Science at the University of California, Los Angeles, and the University of Colorado, Boulder, and has been a frequent visitor to the VERSO group at INRIA in France.  During his academic career his research has been supported in part by grants from NSF, DARPA, and AT&T.  In 1996 Hull joined Bell Laboratories, where his work has focused on a combination of research and technology transfer into Lucent's product line.  As part of that work he has developed 6 US patents and developed core ideas that have lead to two Lucent products and international media attention, in addition to several research publications.  Hull has also chaired and served on numerous conference program committees, and is now Associate Editor of the ACM Transactions on Database Systems journal.

Jianwen Su is a Professor in the Department of Computer Science, University of California at Santa Barbara.  He received his Ph.D. degree in Computer Science from the University of Southern California in 1991.  He held a visiting position at Bell Laboratories in 1998 and is currently an adjunct professor at the Peking University.  He has published over 70 journal and conference papers in the areas of query languages, data models, scientific databases, workflow, spatial databases, formal verification, and recently web services. His current work on web services includes modeling and analysis of web service behaviors and compositions.  His research has been supported by grants from NSF and NASA.  He has served on program and organization committees of many database conferences and was the General Chair of the SIGMOD 2001 conference.

Welcome | SIGMOD CFP | PODS CFP | SIGMOD Program | PODS Program | Registration | Organization | Venue | Links

Web site by Gaëtan Gaumer - Original design by Michalis Petropoulos - Feedback