NoDB: efficient query execution on raw data files

Ioannis Alagiannis, Renata Borovica-Gajic, Miguel Branco, Stratos Idreos, Anastasia Ailamaki


NoDB introduces a radical departure from the standard paradigm of ingest-then-process data analytics: a new paradigm that does not require data loading while still maintaining the whole feature set of a modern database system. This is accomplished by making raw data files a first-class citizen, fully integrated with the query engine. NoDB introduces the concept of positional maps to efficiently execute queries on raw, never-before-seen data collections along with ingested and processed datasets. The result is a structure later known as a “virtual data lake”: a cache that records frequently used data and operations and accelerates execution by learning from previous queries. NoDB’s data virtualization and just-in-time query processing algorithms are implemented at the heart of several data lake platforms.
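To make the positional-map idea concrete, here is a minimal Python sketch. It is not the authors' implementation: the class name `RawCsv`, its methods, and the `scanned_bytes` counter are hypothetical and exist only for illustration. It assumes an in-memory buffer with a single-byte delimiter and no quoted fields. The first query on a column tokenizes each row and records the byte offsets of every field it passes; later queries reuse those offsets instead of scanning again.

```python
class RawCsv:
    """Toy positional map over a raw CSV buffer (illustrative only)."""

    def __init__(self, raw: bytes, delimiter: bytes = b","):
        self.raw = raw
        self.delim = delimiter      # assumed to be a single byte
        self.row_bounds = None      # (start, end) byte offsets of each row
        self.positions = {}         # (row, col) -> (start, end) byte offsets
        self.scanned_bytes = 0      # bytes covered by tokenizing so far

    def _index_rows(self):
        # Record where each row starts and ends, excluding the line terminator.
        bounds, start = [], 0
        for line in self.raw.splitlines(keepends=True):
            bounds.append((start, start + len(line.rstrip(b"\r\n"))))
            start += len(line)
        self.row_bounds = bounds

    def _tokenize(self, row, row_start, row_end, col):
        # Resume from the nearest field already known to the left, if any.
        c, pos = 0, row_start
        for k in range(col - 1, -1, -1):
            if (row, k) in self.positions:
                c, pos = k + 1, self.positions[(row, k)][1] + 1
                break
        while True:
            nxt = self.raw.find(self.delim, pos, row_end)
            end = row_end if nxt == -1 else nxt
            self.positions[(row, c)] = (pos, end)   # remember this field
            self.scanned_bytes += end - pos
            if c == col:
                return pos, end
            if nxt == -1:
                raise IndexError(f"row {row} has no column {col}")
            c, pos = c + 1, end + 1

    def column(self, col):
        """Return one column as strings, reading straight from the raw bytes."""
        if self.row_bounds is None:
            self._index_rows()
        out = []
        for row, (row_start, row_end) in enumerate(self.row_bounds):
            span = self.positions.get((row, col))
            if span is None:                        # not seen yet: scan the row
                span = self._tokenize(row, row_start, row_end, col)
            out.append(self.raw[span[0]:span[1]].decode())
        return out


if __name__ == "__main__":
    table = RawCsv(b"1,alice,30\n2,bob,25\n")
    print(table.column(2))        # first access tokenizes each row
    print(table.scanned_bytes)
    print(table.column(1))        # offsets already recorded, no new scanning
    print(table.scanned_bytes)
```

In this sketch the second query touches only the bytes of the requested fields, which is the sense in which earlier queries accelerate later ones. A real system would also bound the size of the map and decide which offsets are worth keeping.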

Ioannis Alagiannis is a Software Engineer at Oracle Labs. He received his Ph.D. in Computer Science from the École Polytechnique Fédérale de Lausanne (EPFL) in Switzerland and his MSc in Computer Systems Technology from the University of Athens in Greece. His interests include big data analytics, large-scale distributed systems, adaptive query processing, and performance analysis of computer systems. Prior to Oracle Labs, he worked at Microsoft on Azure Synapse Analytics, a high-performance and scalable distributed database system, and at Swisscom on Mobility Insights, a mobile network data analysis platform.

Renata Borovica-Gajic is a Senior Lecturer in Data Analytics in the School of Computing and Information Systems at The University of Melbourne. Dr Borovica-Gajic received her Ph.D. degree in Computer Science from the Swiss Federal Institute of Technology in Lausanne (EPFL), Switzerland in 2016. Renata’s research focuses on solving data management problems when storing, accessing and processing massive data sets, enabling faster, more predictable, and cheaper data analysis as a result. She envisions database systems as dynamic entities able to adjust query processing strategies to fit the characteristics of data and usage patterns. She is also interested in the topics of scientific data management, data exploration, query optimization, physical database design, and hardware-software co-design. Her work has repeatedly appeared in the premier data management conferences, including SIGMOD, VLDB, and ICDE, and she has held numerous organizational roles in those flagship conferences.

Miguel Branco is the CEO and co-founder of RAW Labs, where he is helping to build the next-generation query engine for modern-day datasets. His work focuses on large-scale data management for complex business and scientific datasets. Prior to founding RAW Labs, Miguel was a database researcher at EPFL working on scientific data management. Before that, Miguel was a staff member at CERN, responsible for building the distributed data management system of the ATLAS Experiment of the Large Hadron Collider.

Stratos Idreos is an associate professor of Computer Science at the Harvard John A. Paulson School of Engineering and Applied Sciences. He leads DASlab, the Data Systems Laboratory at Harvard SEAS. His research focuses on building a grammar for data systems with the goal of making it dramatically easier, or even automating in some cases, the design of workload- and hardware-conscious data systems for diverse applications including relational, NoSQL, machine learning, and blockchain. For his doctoral work on Database Cracking, Stratos was awarded the 2011 ACM SIGMOD Jim Gray Doctoral Dissertation award. In 2015 he was awarded the IEEE TCDE Rising Star Award and in 2020 he received the ACM SIGMOD Contributions award. Stratos was PC Chair of ACM SIGMOD 2021 and IEEE ICDE 2022.

Anastasia Ailamaki is a Professor of Computer and Communication Sciences at the École Polytechnique Fédérale de Lausanne (EPFL) in Switzerland, as well as the co-founder and Chair of the Board of Directors of RAW Labs SA, a Swiss company developing systems to analyze heterogeneous big data from multiple sources efficiently. She earned a Ph.D. in Computer Science from the University of Wisconsin-Madison in 2000. She has received the 2019 ACM SIGMOD Edgar F. Codd Innovations Award and the 2020 VLDB Women in Database Research Award. She is also the recipient of an ERC Consolidator Award (2013), the Finmeccanica endowed chair from the Computer Science Department at Carnegie Mellon (2007), a European Young Investigator Award from the European Science Foundation (2007), an Alfred P. Sloan Research Fellowship (2005), an NSF CAREER award (2002), and twelve best-paper awards in international scientific conferences. She has received the 2018 Nemitsas Prize in Computer Science from the President of Cyprus and the 2021 ARGO Innovation Award from the President of the Hellenic Republic. She is an ACM fellow, an IEEE fellow, a member of the Academia Europaea, and an elected member of the Swiss, the Belgian, the Greek, and the Cypriot National Research Councils.