Fault-tolerance in the Borealis distributed stream processing system

Magdalena Balazinska, Hari Balakrishnan, Samuel Madden, Michael Stonebraker

Original Abstract: We present a replication-based approach to fault-tolerant distributed stream processing in the face of node failures, network failures, and network partitions. Our approach aims to reduce the degree of inconsistency in the system while guaranteeing that available inputs capable of being processed are processed within a specified time threshold. This threshold allows a user to trade availability for consistency: a larger time threshold decreases availability but limits inconsistency, while a smaller threshold increases availability but produces more inconsistent results based on partial data. In addition, when failures heal, our scheme corrects previously produced results, ensuring eventual consistency.Our scheme uses a data-serializing operator to ensure that all replicas process data in the same order, and thus remain consistent in the absence of failures. To regain consistency after a failure heals, we experimentally compare approaches based on checkpoint/redo and undo/redo techniques and illustrate the performance trade-offs between these schemes.

Magdalena Balazinska is the Jean Loup Baer Associate Professor of Computer Science and Engineering at the University of Washington. She’s the director of the IGERT PhD Program in Big Data and Data Science and the director of the associated Advanced Data Science PhD Option. She’s also a Senior Data Science Fellow of the University of Washington eScience Institute. Magdalena’s research interests are in the field of database management systems. Her current research focuses on data management for data science, big data systems, and cloud computing. Magdalena holds a Ph.D. from the Massachusetts Institute of Technology (2006). She is a Microsoft Research New Faculty Fellow (2007), received the inaugural VLDB Women in Database Research Award (2016), an NSF CAREER Award (2009), a 10-year most influential paper award (2010), a Google Research Award (2011), an HP Labs Research Innovation Award (2009 and 2010), a Rogel Faculty Support Award (2006), a Microsoft Research Graduate Fellowship (2003-2005), and multiple best-paper awards.
Hari Balakrishnan is the Fujitsu Professor of Computer Science at MIT. His research is in networked computer systems, with current interests in networking, data management, and sensing for a world of mobile and embedded devices connected to cloud services running in large datacenters. He has made many contributions to mobile and sensor computing, overlay and peer-to-peer networks, wireless networks, Internet architecture, and data management systems. He is a member of the National Academy of Engineering (2015) and the American Academy of Arts and Science (2017), as well as an ACM Fellow (2008), a Sloan Fellow (2002), and an ACM dissertation award winner (1998). He has received several best-paper awards including the IEEE Bennett prize (2004), and test-of-time / hall-of-fame awards from SIGCOMM (2011), SIGMOBILE (2017), SIGOPS (2015), and SIGMOD (2017). In 2010, Balakrishnan co-founded Cambridge Mobile Telematics, a company that develops mobile sensing, inferencing, and data analytics to change driver behavior and make roads safer around the world. He was an advisor to Meraki from its inception in 2006 to its acquisition by Cisco in 2012. In 2003, Balakrishnan co-founded StreamBase Systems (acquired by TIBCO), the first high-performance commercial stream processing (aka complex event processing) engine.

Balakrishnan received his PhD in 1998 from UC Berkeley and a BTech in 1993 from IIT Madras, which named him a distinguished alumnus in 2013.

Samuel Madden is a Professor of Electrical Engineering and Computer Science in MIT’s Computer Science and Artificial Intelligence Laboratory. His research interests include databases, distributed computing, and networking. Research projects include the C-Store column-oriented database system, the CarTel mobile sensor network system, and the Relational Cloud “database-as-a-service”. Madden is a leader in the emerging field of “Big Data”, heading the Intel Science and Technology Center (ISTC) for Big Data, a multi-university collaboration on developing new tools for processing massive quantities of data. He also leads BigData@CSAIL, an industry-backed initiative to unite researchers at MIT and leaders from industry to investigate the issues related to systems and algorithms for data that is high rate, massive, or very complex.Madden received his Ph.D. from the University of California at Berkeley in 2003 where he worked on the TinyDB system for data collection from sensor networks. Madden was named one of Technology Review’s Top 35 Under 35 in 2005, and is the recipient of several awards, including an NSF CAREER Award in 2004, a Sloan Foundation Fellowship in 2007, best paper awards in VLDB 2004 and 2007, and a best paper award in MobiCom 2006. He also received a a “test of time” award in SIGMOD 2013 (for his work on Acquisitional Query Processing in SIGMOD 2003), and a ten year best paper award in VLDB 2015 (for his work on the C-Store system).
Michael Stonebraker has been a pioneer of data base research and technology for more than a quarter of a century. He was the main architect of the INGRES relational DBMS, and the object-relational DBMS, POSTGRES. These prototypes were developed at the University of California at Berkeley where Stonebraker was a Professor of Computer Science for twenty five years. More recently at M.I.T. he was a co-architect of the Aurora/Borealis stream processing engine, the C-Store column-oriented DBMS, the H-Store transaction processing engine, the SciDB array DBMS, and the Data Tamer data curation system. Presently he serves as Chief Technology Officer of Paradigm4 and Tamr, Inc.
Professor Stonebraker was awarded the ACM Turing Award in 2015 for his fundamental contributions to the concepts and practices underlying modern database systems.  He was also awarded the ACM System Software Award in 1992 for his work on INGRES. Additionally, he was awarded the first annual SIGMOD Innovation award in 1994, and was elected to the National Academy of Engineering in 1997. Finally, he was awarded the IEEE John Von Neumann award in 2005. He is presently an Adjunct Professor of Computer Science at M.I.T, where he is co-director of the Intel Science and Technology Center focused on big data.