SIGMOD/PODS 2004 Conference


In cooperation with

ACM SIGMOD/PODS 2004 Conference

Maison de la Chimie, Paris, France.
June 13-18, 2004

SIGMOD Program

At a glance

Tuesday	09:00 10:30	10:30 11:00	11:00 12:30	12:30 14:00	14:00 16:00		16:00 16:30	16:30 18:30	20:00
Research Track	Opening and Keynote	Break	Stream Management	Lunch	Web, XML, and IR		Break	New Styles of XML
Research Track			XML Query Efficiency		Data Mining Applications			Statistics
Research & Industrial Track			Industrial: Query Processing		Non-Standard Query Processing			Indexing & Tuning
Tutorial & Panel			Tutorial: Security		Tutorial: Security (continued)			Panel
Demo			Group A: Web Services, Data Integration, Data Mining		Group B: Streams, Peer-to-peer and distributed databases			Group C: XML, Data Privacy, Potpourri
Wednesday	08:30 10:30	10:00 10:30	10:30 12:00	12:00 13:30	13:30 14:30	14:30 16:30	16:30 17:00	17:00 19:00	20:30
Research Track	Data Integration	Break	XML PubSub & Indexing	Lunch	SIGMOD Business Meeting and Awards Ceremony	Security & Privacy	Break	Query Optimization	Conference Banquet
Research Track	Stream Query Processing		P2P and Networks			Moving Objects		Spatial Data
Industrial Track	Database Run-Time System		DB Applications			Web Services		Information Assurance
Research Track & Tutorials	Clustering		Tutorial: Time Series			Tutorial: Streams		Tutorial: Streams (continued)
Demo	Group C: XML, Data Privacy, Potpourri		Group A: Web Services, Data Integration, Data Mining			Group B: Streams, Peer-to-peer and distributed databases
Thursday	08:45 10:00	10:00 10:30	10:30 11:30	11:30 12:00	12:00 13:00	14:00 - ...
Research Track	Keynote	Break	Schema Discovery	Break	Query Progress	Joint Workshops
Research Track			Query Uncertainty		Consistency & Availability
Research & Industrial Track			Text and DB		Industrial: XML & RDBMS Marriage
Tutorials			Tutorial: Web Services		Tutorial: Web Services (continued)

Tuesday, June 15 ^

09:00 - 10:30	Opening and Keynote	Chair: Gerhard Weikum
	A Baker's Dozen Revolutions in Database System Architecture Jim Gray

10:30 - 11:00	Break

11:00 - 12:30	Research Session 1: Stream Management	Chair: Edward Chang
	Adaptive Stream Resource Management Using Kalman Filters Ankur Jain, Edward Y. Chang, Yuan-Fang Wang Online Event-driven Subsequence Matching over Financial Data Streams Huanmei Wu, Betty Salzberg, Donghui Zhang Holistic UDAFs at streaming speeds Graham Cormode, Theodore Johnson, Flip Korn, S. Muthukrishnan, Oliver Spatscheck, Divesh Srivastava
11:00 - 12:30	Research Session 2: XML Query Efficiency	Chair: Sihem Amer-Yahia
	BLAS: An Efficient XPath Processing System Yi Chen, Susan B. Davidson, Yifeng Zheng Efficient Processing of Twig Queries with OR-Predicates Haifeng Jiang, Hongjun Lu, Wei Wang Tree Logical Classes for Efficient Evaluation of XQuery Stelios Paparizos, Yuqing Wu, Laks V. S. Lakshmanan, H. V. Jagadish
11:00 - 12:30	Industrial Session 1: Query Processing	Chair: Guy Lohman
	Query Sampling in DB2 Universal Database Jarek Gryz, Junjie Guo --- York University Linqi Liu, Calisto Zuzarte --- IBM Toronto Query Processing for SQL Updates Cesar A. Galindo-Legaria, Stefano Stefani, Florian Wass --- Microsoft Corporation, USA Parallel SQL Execution in Oracle 10g Thierry Cruanes, Benoit Dageville, Bhaskar Ghosh --- Oracle Corporation, USA
11:00 - 12:30	Tutorial Session 1: Security
	Security of Shared Data in Large Systems: State of the Art and Research Directions Arnon Rosenthal, Marianne Winslett

12:00 - 14:00	Lunch

14:00 - 16:00	Research Session 3: Web, XML, and IR	Chair: Catriel Beeri
	FleXPath: Flexible Structure and Full-Text Querying for XML Sihem Amer-Yahia, Laks V. S. Lakshmanan, Shashank Pandit An Interactive Clustering-based Approach to Integrating Source Query interfaces on the Deep Web Wensheng Wu, Clement Yu, AnHai Doan, Weiyi Meng Understanding Web Query Interfaces: Best-Effort Parsing with Hidden Syntax Zhen Zhang, Bin He, Kevin Chen-Chuan Chang Using the Structure of Web Sites for Automatic Segmentation of Tables Kristina Lerman, Lise Getoor, Steven Minton, Craig Knoblock
14:00 - 16:00	Research Session 4: Data Mining Applications	Chair: Cyrus Shahabi
	Identifying Similarities, Periodicities and Bursts for Online Search Queries Michail Vlachos, Chris Meek, Zografoula Vagena, Dimitrios Gunopulos FARMER: Finding Interesting Rule Groups in Microarray Datasets Gao Cong, Anthony K. H. Tung, Xin Xu, Feng Pan, Jiong Yang Diamond in the Rough: Finding Hierarchical Heavy Hitters in Multi-Dimensional Data Graham Cormode, Flip Korn, S. Muthukrishnan, Divesh Srivastava Cost-Based Labeling of Groups of Mass Spectra Lei Chen, Zheng Huang, Raghu Ramakrishnan
14:00 - 16:00	Research Session 5: Non-standard Query Processing	Chair: Paul Larson
	Optimization of Query Streams Using Semantic Prefetching Ivan J. Bowman, Kenneth Salem Buffering Database Operations for Enhanced Instruction Cache Performance Jingren Zhou, Ken Ross Rank-aware Query Optimization Ihab F. Ilyas, Rahul Shah, Walid G. Aref, Jeffrey Scott Vitter, Ahmed K. Elmagarmid Fast Computation of Database Operations using Graphics Processors Naga K. Govindaraju, Brandon Lloyd, Wei Wang, Ming Lin, Dinesh Manocha
14:00 - 16:00	Tutorial Session 2: Security (continued)
	Security of Shared Data in Large Systems: State of the Art and Research Directions Arnon Rosenthal, Marianne Winslett

16:00 - 16:30	Break

16:30 - 18:30	Research Session 6: New Styles of XML	Chair: Mary Fernandez
	Lazy Query Evaluation for Active XML Serge Abiteboul, Omar Benjelloun, Bogdan Cautis, Ioana Manolescu, Tova Milo, Nicoleta Preda Data Stream Management for Historical XML Data Sujoe Bose, Leonidas Fegaras Colorful XML: One Hierarchy Isn't Enough H. V. Jagadish, Laks V. S. Lakshmanan, Monica Scannapieco, Divesh Srivastava, Nuwee Wiwatwattana Approximate XML Query Answers Neoklis Polyzotis, Minos Garofalakis, Yannis Ioannidis
16:30 - 18:30	Research Session 7: Statistics	Chair: Volker Markl
	A Bi-Level Bernoulli Scheme for Database Sampling Peter J. Haas, Christian Koenig Effective Use of Block-Level Sampling in Statistics Estimation Surajit Chaudhuri, Gautam Das, Utkarsh Srivastava Online Maintenance of Very Large Random Samples Christopher Jermaine, Abhijit Pol, Subramanian Arumugam Conditional Selectivity for Statistics on Query Expressions Nicolas Bruno, Surajit Chaudhuri
16:30 - 18:30	Research Session 8: Indexing and Tuning	Chair: Patrick O'Neil
	Transaction support for indexed views Goetz Graefe, Michael Zwilling Graph Indexing: A Frequent Structure-based Approach Xifeng Yan, Philip S. Yu, Jiawei Han The Priority R-Tree: A Practically Efficient and Worst-Case Optimal R-Tree Lars Arge, Mark de Berg, Herman J. Haverkort, Ke Yi Integrating Vertical and Horizontal Partitioning Into Automated Physical Database Design Sanjay Agrawal, Vivek Narasayya, Beverly Yang
16:30 - 18:30	Panel: Rethinking the Conference Reviewing Process
	Michael Franklin (SIGMOD 02 PC chair), co-moderator Jennifer Widom (SIGMOD 05 PC chair), co-moderator Gerhard Weikum (SIGMOD 04 PC chair) Alon Halevy (SIGMOD 03 PC chair) Philip A. Bernstein (SIGMOD 79 PC chair: the long-term view) David DeWitt (SIGMOD 83 PC chair: the long-term view) Anastassia Ailamaki (the near-term view) Zack Ives (the near-term view)

Wednesday, June 16 ^

08:30 - 10:00	Research Session 9: Data Integration	Chair: Bertram Ludaescher
	Constraint-Based XML Query Rewriting For Data Integration Cong Yu, Lucian Popa iMAP: Discovering Complex Mappings between Database Schemas Robin Dhamankar, Yoonkyong Lee, AnHai Doan, Alon Halevy, Pedro Domingos Adapting to Source Properties in Processing Data Integration Queries Zachary G. Ives, Alon Halevy, Daniel S. Weld
08:30 - 10:00	Research Session 10: Stream Query Processing	Chair: Ioana Manolescu
	Adaptive Ordering of Pipelined Stream Filters Shivnath Babu, Rajeev Motwani, Kamesh Munagala, Itaru Nishizawa, Jennifer Widom Static Optimization of Conjunctive Queries with Sliding Windows Over Infinite Streams Ahmed M. Ayad, Jeffrey F. Naughton Dynamic Plan Migration for Continuous Queries Over Data Streams Yali Zhu, Elke A. Rundensteiner, George T. Heineman
08:30 - 10:00	Industrial Session 2: Database Run-Time System	Chair: Anil Nori
	Data Densification in a Relational Database System Abhinav Gupta, Sankar Subramanian, Srikanth Bellamkonda, Tolga Bozkaya, Nathan Folkert, Lei Sheng, Andrew Witkowski --- Oracle Corporation, USA Hosting the .NET Runtime in Microsoft SQL Server Alazel Acheson, Mason Bendixen, José A. Blakeley, Peter Carlin, Ebru Ersan, Jun Fang, Xiaowei Jiang, Christian Kleinerman, Balaji Rathakrishnan, Gideon Schaller, Beysim Sezgin, Ramachandran Venkatesh, Honggang Zhang --- Microsoft Corp, USA Vertical and Horizontal Percentage Aggregations Carlos Ordonez --- Teradata, NCR, USA
08:30 - 10:00	Research Session 11: Clustering	Chair: Marie-Christine Rousset
	Clustering Objects on a Spatial Network Man Lung Yiu, Nikos Mamoulis Computing Clusters of Correlation Connected Objects Christian Boehm, Karin Kailing, Peer Kroeger, Arthur Zimek Incremental and Effective Data Summarization for Dynamic Hierarchical Clustering Samer Nassar, Joerg Sander, Corrine Cheng

10:00 - 10:30	Break

10:30 - 12:00	Research Session 12: XML PubSub and Indexing	Chair: Peter Apers
	Implementing a Scalable XML Publish/Subscribe System using Relational Database Systems Feng Tian, Berthold Reinwald, Hamid Pirahesh, Tobias Mayr, Jussi Myllymaki Incremental Maintenance of XML Structural Indexes Ke Yi, Hao He, Ioana Stanoi, Jun Yang Incremental Evaluation of Schema-Directed XML Publishing Philip Bohannon, Peter Buneman, Byron Choi, Wenfei Fan
10:30 - 12:00	Research Session 13: Peer-to-Peer and Sensor Networks	Chair: Laurent Amsaleg
	The Price of Validity in Dynamic Networks Mayank Bawa, Aristides Gionis, Hector Garcia-Molina, Rajeev Motwani Compressing Historical Information in Sensor Networks Antonios Deligiannakis, Yannis Kotidis, Nick Roussopoulos Efficient Query Reformulation in Peer-Data Management Systems Igor Tatarinov, Alon Halevy
10:30 - 12:00	Industrial Session 3: Web Services	Chair: Roger Barga
	Models for Web Services Transactions Mark Little --- Chief Architect Arjun Technologies, UK Enabling Sovereign Information Sharing Using Web Services Rakesh Agrawal, Dmitri Asonov, Ramakrishnan Srikant --- IBM Almaden Building Dynamic Application Networks with Web Services Matt Mihic --- BEA Secure, Reliable, Transacted: Innovation in Web Services Martin Gudgin --- Microsoft Corporation
10:30 - 12:00	Tutorial Session 3: Time Series
	Fast Algorithms for Time Series: algorithms and applications to Finance, Physics, Music, Biology and other Suspects Alberto Lerner, Dennis Shasha, Zhihua Wang, Xiaojian Zhao, Yunyue Zhu

12:00 - 13:30	Lunch

13:30 - 14:30	SIGMOD Business Meeting and Awards Ceremony

14:30 - 16:30	Research Session 14: Security and privacy	Chair: Arnon Rosenthal
	Extending Query Rewriting Techniques for Fine-Grained Access Control Shariq Rizvi, Alberto Mendelzon, S. Sudarshan, Prasan Roy Order-Preserving Encryption for Numeric Data Rakesh Agrawal, Jerry Kiernan, Ramakrishnan Srikant, Yirong Xu A Formal Analysis of Information Disclosure in Data Exchange Gerome Miklau, Dan Suciu Secure XML Querying with Security Views Wenfei Fan, Chee-Yong Chan, Minos Garofalakis
14:30 - 16:30	Research Session 15: Moving Objects	Chair: Jan Chomicki
	Indexing Spatio-Temporal Trajectories with Chebyshev Polynomials Raymond Ng, Yuhan Cai Prediction and Indexing of Moving Objects with Unknown Motion Patterns Yufei Tao, Christos Faloutsos, Dimitris Papadias, Bin Liu SINA: Scalable Incremental Processing of Continuous Queries in Spatio-temporal Databases Mohamed F. Mokbel, Xiaopeng Xiong, Walid Aref STRIPES: An Efficient Index for Predicted Trajectories Jignesh M. Patel, Yun Chen, V. Prasad Chakka
14:30 - 16:30	Industrial Session 4: Database Applications	Chair: Gustavo Alonso
	SoundCompass: A Practical Query-by-Humming System Naoko Kosugi, Yasushi Sakurai, Masashi Morimoto --- NTT Cyber Space Labs, Yokosuka, Kanagawa, Japan Model driven Business UI based on Maps Per Bendsen --- Microsoft Corp, Copenhagen dbSwitch - Towards a Database Utility Shaul Dar --- Savantis Systems Ltd, Herzelia, Israel Gil Hecht, Eden Shochat --- Savantis Systems Inc, Lexington, MA USA
14:30 - 16:30	Tutorial Session 4: Streams
	Indexing and Mining Streams Christos Faloutsos

16:30 - 17:00	Break

17:00 - 19:00	Research Session 16: Query Optimization	Chair: Zachary Ives
	CORDS: Automatic Discovery of Correlations and Soft Functional Dependencies Ihab F. Ilyas, Volker Markl, Peter Haas, Paul Brown, Ashraf Aboulnaga Robust Query Processing through Progressive Optimization Volker Markl, Vijayshankar Raman, David Simmen, Guy Lohman, Hamid Pirahesh, Miso Cilimdzic Canonical Abstraction for Outerjoin Optimization Jun Rao, Hamid Pirahesh, Calisto Zuzarte
17:00 - 19:00	Research Session 17: Spatial Data	Chair:Jan Paredaens
	Joining Interval Data in Relational Databases Jost Enderle, Matthias Hampel, Thomas Seidl Approximation Techniques for Spatial Data Abhinandan Das, Johannes Gehrke, Mirek Riedewald Spatially-decaying Aggregation over a Network: Model and Algorithms Edith Cohen, Haim Kaplan
17:00 - 19:00	Industrial Session 5: Information Assurance Challenges	Chair: Pam Drew
	Requirements and Policy Challenges in Highly Secure Environments Dean E. Hall --- Information Assurance Section Chief, Federal Bureau of Investigation Information Assurance Technology Challenges Nicholas J. Multari --- Senior Manager, Information Assurance, Boeing Phantom Works Service-Oriented BI: Towards tight integration of business intelligence into operational applications Marcus Dill, Achim Kraiss, Stefan Sigg, Thomas Zurek --- SAP
17:00 - 19:00	Tutorial Session 5: Streams (continued)
	Indexing and Mining Streams Christos Faloutsos

20:30	Banquet

Thursday, June 17 ^

08:45 - 10:00	Keynote	Chair: Gerhard Weikum
	The roles of cryptography in database security Ueli Maurer

16:30 - 17:00	Break

10:30 - 11:30	Research Session 18: Schema Discovery	Chair: Sonia Bergamaschi
	TOSS: An Extension of TAX with Ontologies and Similarity Queries Edward Hung, Yu Deng, V.S. Subrahmanian Information-Theoretic Tools for Mining Database Structure from Large Data Sets Periklis Andritsos, Renee Miller, Panayiotis Tsaparas
10:30 - 11:30	Research Session 19: Query Uncertainty	Chair: Mirek Riedewald
	Efficient set joins on similarity predicates Sunita Sarawagi , Alok Kirpal Automatic Categorization of Query Results Kaushik Chakrabarti, Surajit Chaudhuri, Seung-won Hwang
10:30 - 11:30	Research Session 20: Text and DB	Chair: Laks Lakshmanan
	When one Sample is not Enough: Improving Text Database Selection Using Shrinkage Panagiotis G. Ipeirotis, Luis Gravano On the Integration of Structure Indexes and Inverted Lists Raghav Kaushik, Rajasekar Krishnamurthy, Jeffrey F. Naughton, Raghu Ramakrishnan
10:30 - 11:30	Tutorial Session 6: Web Services
	Tools for Design of Composite Web Services Richard Hull, Jianwen Su

11:30 - 12:00	Break

12:00 - 13:00	Research Session 21: Query Progress	Chair: Peter Boncz
	Toward a Progress Indicator for Database Queries Gang Luo, Jeffrey F. Naughton, Curt J. Ellmann, Michael W. Watzke Estimating Progress of Long Running SQL Queries Surajit Chaudhuri, Vivek Narasayya, Ravishankar Ramamurthy
12:00 - 13:00	Research Session 22: Consistency and Availability	Chair: Heiko Schuldt
	Relaxed Currency and Consistency: How to Say "Good Enough" in SQL Hongfei Guo, Per-AkeLarson, Raghu Ramakrishnan, Jonathan Goldstein Highly-Available, Fault-Tolerant, Parallel Dataflows Mehul A. Shah, Joseph M. Hellerstein, Eric Brewer
12:00 - 13:00	Industrial Session 6: The Marriage of XML and Relational Databases	Chair: Dan Florescu
	XML in the Middle: XQuery in the WebLogic Platform Michael J. Carey --- BEA ORDPATHs: Insert-Friendly XML Node Labels Patrick O'Neil, Elizabeth O'Neil --- University of Massachusetts Boston Shankar Pal, Istvan Cseri, Gideon Schaller, Nigel Westbury --- Microsoft
12:00 - 13:00	Tutorial Session 7: Web Services (continued)
	Tools for Design of Composite Web Services Richard Hull, Jianwen Su

Keynote, Tuesday, June 15 ^

A Baker's Dozen Revolutions in Database System Architecture

Jim Gray (Microsoft Research, USA)

Abstract:

Database system architectures are undergoing revolutionary changes.

At the core, we are finally unifying algorithms and data by integrating programming languages (a common language runtime or virtual machine) with the database system. This gives a full-blown object-relational system where relational operators manipulate sets of objects in non-procedural ways.
Coupled with this, each DBMS is now a web service: it is listening to port 80, publishing WSDL, and servicing SOAP calls. This has huge implications for how we structure applications. DBMSs are now object containers.
The first service one exports is a queue mechanism that allows asynchronous requests and is the basis for transaction processing and workflow applications. Future workflow systems are likely to be built on this core.
Data cubes and online analytic processing have been baked into most systems so that they seamlessly combine ROLAP and OLAP.
Beyond that, the systems have built a framework for data mining and machine learning algorithms. Decision trees, Bayes nets, and clustering are built in, and new algorithms can be added to this framework by third parties.
Traditionally database systems were row stores. But many applications have very sparse tables. So there is a rebirth of column stores that offer huge performance advantages for some applications.
To complete the puzzle, text, temporal, and spatial data access methods, along with their probabilistic reasoning have been added to the DBMSs. This makes them much more useful forbroad application classes.
External data is increasingly arriving as streams that need to be compared to a historical data - and stream-processing operators are being added to the xecution engines.
Publish-subscribe systems invert the data-query ratios, in those systems the data is compared against millions of queries rather than queries searching millions of records.
Meanwhile, disks and main memory are growing much faster than the bandwidth and latency between them, so the database systems are increasingly using multi-terabyte main memories and bulk sequential access to disk.
All these changes mandate that we rethink our query optimization strategy to be much more dynamic -- adapting to current conditions and selectivities rather than picking a static plan in advance.
Intelligence is moving to the periphery of the network. Each disk and each sensor will have a fairly competent database machine. There is considerable evidence that relational algebra is the most convenient way to program these systems.
And, database systems are now expected to be self-managing, self-healing, and always-up. As you can see, we have our work cut out for us in delivering all these features, but database systems are central to new applications.

Biography:

Jim Gray is part of Microsoft's research group. His work focuses on databases and transaction processing. Jim is active in the research community, is an ACM, NAE, NAS, and AAAS Fellow, and received the ACM Turing Award for his work on transaction processing. He edits of a series of books on data management, and has been active in building online databases like http://terraService.Net and http://skyserver.sdss.org.

Tutorial 1 & 2, Tuesday, June 15 ^

Security of Shared Data in Large Systems: State of the Art and Research Directions

Arnon Rosenthal (MITRE), Marianne Winslett (University of Illinois)

Abstract:

The target audience for this tutorial is the entire SIGMOD research community. The goals of the tutorial are to enlighten the SIGMOD research community about the state of the art in data security, especially for enterprise or larger systems, and to engage the community’s interest in improving the state of the art.

Detailed introduction: RosenthalWinslett-tutorial-abstract.pdf.

Biographies:

Arnon Rosenthal is a Principal Scientist at MITRE. He has broad interests in problems that arise when data is shared between communities, including a long-standing interest in the security issues that arise in data warehouses, federated databases, and enterprise information systems. He has also had a first-hand look at many security problems that arise in large government
and military organizations.

Marianne Winslett has been a professor at the University of Illinois since 1987. She started working on database security issues in the early 1990s, focusing on semantic issues in MLS databases. Her interests soon shifted to issues of trust management for data on the web. Trust negotiation is her main current research focus.

Panel, Tuesday, June 15 ^

Rethinking the Conference Reviewing Process

Moderators: Michael J. Franklin (UC Berkeley) and Jennifer Widom (Stanford)

Panelists: Anastassia Ailamaki (Carnegie Mellon), Philip A. Bernstein (Microsoft), David DeWitt (Wisconsin), Alon Halevy (Washington), Zachary Ives (U Penn), and Gerhard Weikum (Max Planck Institute)

Statement is provided as panelcamera.pdf.

Tutorial 3, Wednesday, June 16 ^

Fast Algorithms for Time Series with applications to Finance, Physics, Music, Biology, and other Suspects
(Work supported in part by U.S. NSF grants IIS-9988636 and N2010-0115586)

Alberto Lerner (IBM TJ Watson Research Center), Dennis Shasha, Zhihua Wang, Xiaojian Zhao, Yunyue Zhu (New York University)

Abstract:

Financial time series streams are watched closely by millions of traders. What exactly do they look for and how can we help them do it faster? Physicists study the time series emerging from their sensors. The same question holds for them. Musicians produce time series. Consumers may want to compare them. This tutorial presents techniques and case studies for four problems:

Finding sliding window correlations in financial, physical, and other applications.
Discovering bursts in large sensor data of gamma rays.
Matching hums to recorded music, even when people don't hum well.
Maintaining and manipulating time-ordered data in a database setting.

This tutorial draws mostly from the book High Performance Discovery in Time Series: techniques and case studies, Springer-Verlag 2004. You can find the power point slides for this tutorial at http://cs.nyu.edu/cs/faculty/shasha/papers/sigmod04.ppt.

The tutorial is aimed at researchers in streams, data mining, and scientific computing. Its applications should interest anyone who works with scientists or financial "quants." The emphasis will be on recent results and open problems. This is a ripe area for further advance.

This tutorial is complementary to the tutorial of Professor Faloutsos. Each is self-contained. Here we cover tools for moving correlations, burst detection, query by humming, and databases for order. Professor Faloutsos's tutorial will emphasize indexing for similarity search, forecasting, and fractals.

An outline is available as Shasha_Tutorial04.pdf.

Biography:

Dennis Shasha is a professor of computer science at the Courant Institute of NYU where he works with biologists on pattern discovery for microarrays, combinatorial design, and network inference;
and with physicists and financial people on algorithms for time series. Other areas of interest include database tuning, tree and graph matching, and cryptographic file systems. In his spare time, he has written three books of puzzles, a biography of great computer scientists, and technical books about database tuning, biological pattern recognition and an upcoming book time series. He also writes the puzzle column for Scientific American.

Tutorials 4 & 5, Thursday, June 17 ^

Indexing and Mining Streams

Christos Faloutsos (CMU)

Abstract:

DESCRIPTION - OBJECTIVES
How can we find patterns in a sequence of sensor measurements (eg., a sequence of temperatures, or water-pollutant measurements)? How can we compress it? What are the major tools for forecasting and outlier detection? The objective of this tutorial is to provide a concise and intuitive overview of the most important tools, that can help us find patterns in sensor sequences. Sensor data analysis becomes of increasingly high importance, thanks to the decreasing cost of hardware and the increasing on-sensor processing abilities. We review the state of the art in three related fields: (a) fast similarity search for time sequences, (b) linear forecasting with the traditional AR (autoregressive) and ARIMA methodologies and (c) non-linear forecasting, for chaotic / self-similar time sequences, using lag-plots and fractals. The emphasis of the tutorial is to give the intuition behind these powerful tools, which is usually lost in the technical literature, as well as to give case studies that illustrate their practical use.

CONTENT AND OUTLINE

Similarity Search

why we need similarity search
distance functions (Euclidean, LP norms, time-warping)
fast searching (R-trees, M-trees)
feature extraction (DFT, Wavelets, SVD, FastMap)

Linear Forecasting

main idea behind linear forecasting
AR methodology
multivariate regression
Recursive Least Squares
de-trending; periodicities

Non-linear/chaotic forecasting

main idea: lag-plots
'fractals' and 'fractal dimensions'

definition and intuition
algorithms for fast computation

case studies

WHO SHOULD ATTEND
Researchers that want to get up to speed with the major tools in time sequence analysis. Also, practitioners who want a concise, intuitive overview of the state of the art.

References can be found in faloutsos.pdf, and the tutorial on the Web at http://www-2.cs.cmu.edu/~christos/TALKS/SIGMOD04-tut/.

Biography:

Christos Faloutsos is a Professor at Carnegie Mellon University. He has received the Presidential Young Investigator Award by the National Science Foundation (1989), three "best paper" awards (SIGMOD 94, VLDB 97, KDD01-runner-up), and four teaching awards. He is a member of the executive committee of SIGKDD; he has published over 120 refereed articles, one monograph, and holds four patents. His research interests include data mining in streams and graphs, fractals, indexing methods for multimedia and text data bases, and data base performance.

Keynote, Thursday, June 17 ^

The roles of cryptography in database security

Ueli Maurer (ETH Zurich, Switzerland)

Abstract:

Database security relies on many different mechanisms, including access control, information flow control, operating system and network security, prevention of statistical inference, data and user authentication, encryption, time-stamping, digital signatures, and other cryptographic mechanisms. In this talk we take a systematic look at the current and future roles of cryptography in database security. In particular, we demonstrate some remarkable and apparently paradoxical new applications made possible by modern cryptographic protocols (although they are still quite far from efficient practical implementations.) One of them is private information retrieval, allowing a user to access data (e.g. an entry in a patent database) without the database learning which data was accessed. Another one is the secure combination of several databases owned by mutually mistrusting organizations, for example by competing companies.

Biography:

Ueli Maurer (http://www.crypto.ethz.ch/~maurer/) is professor of computer science and head of the Information Security and Cryptography Research Group at the Swiss Federal Institute of Technology (ETH), Zurich. His research interests include information security, the theory and applications of cryptography, algorithms, discrete mathematics, and information theory.

He has served extensively as an editor and a member of program committees. Currently he is Editor-in-Chief of the Journal of Cryptology, Editor-in-Chief (with Ronald Rivest) of Springer Verlag's book series in Information Security and Cryptography, and serves on the Board of Directors of the International Association for Cryptologic Research (IACR). He is a Fellow of the IEEE, was the 2000 Rademacher Lecturer of the Department of Mathematics at the University of Pennsylvania, and is a frequent invited speaker at scientific conferences.

Maurer holds several patents for cryptographic systems and has served as a consultant for many companies and government organizations. He serves on a few management and scientific advisory boards and is a co-founder of the Zurich-based security software company Seclutions.

Tutorials 6 & 7, Thursday, June 17 ^

Tools for Design of Composite Web Services

Richard Hull (Bell Laboratories), Jianwen Su (University of California at Santa Barbara)

Abstract:

The web services paradigm promises to enable rich, flexible, and dynamic interoperation of highly distributed and heterogeneous web-hosted services. Substantial progress has already been made towards this goal (e.g., emerging standards such as SOAP, WSDL, BPEL) and industrial technology (e.g., IBM’s WebSphere Toolkit, Sun’s Open Net Environment and Jini™ Network technology, Microsoft’s .Net and Novell’s One Net initiatives, HP’s e-speak). Several research efforts are already underway that build on or take advantage of the paradigm, including the DAML-S/OWL-S program [8, 25, 19], the Semantic eWallets project [18], ActiveXML [3], and automata-based models for web services [6, 21, 4]. But there is still a long way to go, especially given the ostensible long-term goal of enabling the automated discovery, composition, enactment, and monitoring of collections of web services working to achieve a specified objective. A fundamental question right now concerns the design and analysis of composite web services. Specifically, are existing tools for design and analysis of software systems sufficient for web services, or are new techniques needed to handle the novel aspects of the web services paradigm? This raises a variety of questions, several of which are relevant for the database research community. These include: What is the right way to model web services and their compositions? What is the right way to query them in order to support automated composition and analysis algorithms? And how can the data management aspects of composite web services be incorporated into current web services standards? This tutorial will provide the groundwork needed to address these questions, by describing emerging frameworks for studying composite services, and identifying emerging tools and techniques for both automated design and analysis of composite web services. The tutorial will begin with an overview of the underlying goals and assumptions of the web services paradigm, from the perspectives of both emerging standards and the semantic web services community (Section 2). It then reviews key standards in the area, as these provide some of the basic building blocks to be used by the web services paradigm, and will influence the form that this paradigm eventually takes (Section 3).

The tutorial will identify fundamental aspects for modeling of web services and their compositions. Research and standards are exploring a variety of approaches, which can be broken along several dimensions. A key dimension concerns whether the focus is on message passing (as found in WSDL and BPEL) or actions performed (as found in OWL-S and elsewhere) (Section 4). Another key dimension concerns how behaviors should be modeled, including both the behavior of individual web services and also the desired or actual behaviors of compositions of web services. Possibilities here include flowchart-based approaches (e.g., from workflow, BPEL), automata-based models, or goal-driven approaches (e.g., OWL-S). The tutorial discusses several variations that arise from the underlying operational model (Section 5). This includes issues such as whether messages passed are synchronous or asynchronous, whether unbounded queues are permitted, and the topology for compositions, e.g., peer-to-peer or hub-and-spoke. The tutorial then examines approaches and technologies that are relevant to the problem of (manual or automated) composition of web services (Section 6). This includes examination of technologies that emerged before the web services paradigm came into being, such as automated synthesis as found in the verification community and automated workflow design from inter-task dependencies. It also includes in-depth discussion of recent research on composition for web services. Analysis of composite web services is then considered (Section 7). This includes again more classical results, e.g., from the workflow and verification communities, and several recent results based on various models of composite web services.

The final topic of the tutorial concerns research questions arising from the web services paradigm that are very interesting to the database community (Section 8). These focus on issues such as the development of abstractions to enable querying and manipulating web services and behavior models. The work on web services described above is based on a variety of established fields including, e.g., automata theory [20], process algebras [26], temporal logics [12], and situation calculi [30]; review of selected results from these fields will be sprinkled through the tutorial.

The tutorial described here is focused primarily on issues of design and analysis of composite web services, primarily from the perspectives of models, languages, and algorithms. Many topics will be addressed only in passing or not at all; these include transactional properties and ontology-based reasoners.

Short paper is available as here.

Biographies:

Richard Hull is Director of Network Data and Services Research at Bell Laboratories, the research arm of Lucent Technologies. Hull has broad research interests in the areas of data and information management. He has published over 75 journal and conference articles, and is co-author of the book "Foundations of Databases" (Addison-Wesley), a graduate-level textbook on the theory of databases. Hull's research has spanned the areas of database theory, query languages, database models, database programming languages, and workflow. His current work is focused on e-services, personalization, ubiquitous computing and data sharing.

Hull received a Ph.D. degree in Mathematics from the University of California, Berkeley, in 1979. He then joined the faculty of the Computer Science department at the University of Southern California, where he worked until 1993. He also served on the faculties of Computer Science at the University of California, Los Angeles, and the University of Colorado, Boulder, and has been a frequent visitor to the VERSO group at INRIA in France. During his academic career his research has been supported in part by grants from NSF, DARPA, and AT&T. In 1996 Hull joined Bell Laboratories, where his work has focused on a combination of research and technology transfer into Lucent's product line. As part of that work he has developed 6 US patents and developed core ideas that have lead to two Lucent products and international media attention, in addition to several research publications. Hull has also chaired and served on numerous conference program committees, and is now Associate Editor of the ACM Transactions on Database Systems journal.

Jianwen Su is a Professor in the Department of Computer Science, University of California at Santa Barbara. He received his Ph.D. degree in Computer Science from the University of Southern California in 1991. He held a visiting position at Bell Laboratories in 1998 and is currently an adjunct professor at the Peking University. He has published over 70 journal and conference papers in the areas of query languages, data models, scientific databases, workflow, spatial databases, formal verification, and recently web services. His current work on web services includes modeling and analysis of web service behaviors and compositions. His research has been supported by grants from NSF and NASA. He has served on program and organization committees of many database conferences and was the General Chair of the SIGMOD 2001 conference.

Web site by Gaëtan Gaumer - Original design by Michalis Petropoulos - Feedback

ACM SIGMOD/PODS 2004 Conference

Maison de la Chimie, Paris, France. June 13-18, 2004

SIGMOD Program

Maison de la Chimie, Paris, France.
June 13-18, 2004