A Tribute to José Alfredo Blakeley

Multiple Contributors

March 10, 2018


José Alfredo Blakeley, Partner Architect at Microsoft, passed away on January 7th, 2018. With this “tribute”, we would like to honor his many contributions to data management. José will be sorely missed as a great scientist, mentor, colleague, and friend.

José completed his undergraduate studies at Tecnológico de Monterrey, Mexico, in 1978. He continued his studies in Computer Science at the University of Waterloo, Canada, receiving an MMath degree in 1983 and a PhD in 1987. After graduation he joined Indiana University, Bloomington, as a faculty member. He moved to Texas Instruments in 1989, where he worked on the development of an object-oriented database (OODB). José joined Microsoft in 1994 and spent the remainder of his career in Redmond. His first project was OLE DB (a database access interface). He led the integration of the .NET Common Language Runtime (CLR) into SQL Server during 2003-2005 and was the lead architect of the Data Programmability Group during 2005-2007. His main focus over his last ten years was building and shipping the SQL Server Parallel Data Warehouse (PDW) appliance (2007-2013) and the Azure Data Lake Analytics cloud service (starting in 2013). José was granted 19 patents. Over the years, he served on many database research program committees, including as VLDB 2004 Industry PC Chair, ICDE 2008 PC Chair, and VLDB 2011 General PC Co-Chair. He became an ACM Fellow in 2009.

PhD at Waterloo: Materialized Views

José arrived at Waterloo to begin work on his MMath degree in September 1981 under the direction of Frank Tompa, completing his coursework requirements and an essay entitled “The Design of an Electronic Telephone Directory.” He then embarked on his PhD program, choosing to work with both Paul Larson and Frank Tompa on database query processing and, more specifically, on how to maintain materialized views efficiently. His doctoral dissertation answers the following question: given the declaration of a select-project-join (SPJ) view, a database instance D, the corresponding view instance V, and an update U (either a set of tuples I to be inserted into the database, a selection condition S specifying the set of tuples to be deleted from the database, or a selection condition M specifying the set of tuples to be modified and how to modify them), how can V be updated most efficiently to reflect the new state of the database?

José’s contributions centered on several aspects of this problem:

– Under what conditions can U be safely ignored because the specified insertion, deletion, or modification cannot cause any changes to V?

– If U cannot be ignored, under what conditions can the appropriate updates to V be determined (a) without accessing either the base tables or the data in V, or (b) by accessing data in V but not the base tables?

– In all cases, what is an efficient algorithm to compute the required updates to V and apply them?

At the time that José embarked on this research, others had begun to investigate how to use materialized views to make query processing more efficient, but little work addressed the question of how to update views incrementally when the underlying data change. His SIGMOD 1986 paper [3] contains the fundamental algorithms for incremental view maintenance and is one of the classics in the area. His contributions in this area are documented in references [1 – 6]. The techniques used to maintain materialized views in database systems (including data warehouses) have built upon this work. Today all major commercial systems support materialized views, but the first commercial deployment emerged more than ten years after José’s pioneering work.
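To make the maintenance problem a little more concrete, here is a minimal Python sketch of its simplest case: an insertion into one base table R of a two-table SPJ view V = π(σ(R ⋈ T)) under set semantics, with T unchanged. It is an illustration, not the algorithm from [3]; the function names, the dict-based row representation, and the assumption that the selection condition has a conjunct mentioning only R's columns are all hypothetical simplifications. It shows two of the ideas above in miniature: an inserted tuple that fails that R-only conjunct is irrelevant and can be ignored, and because join distributes over union, the relevant new tuples only need to be joined with T instead of recomputing V from scratch.

```python
# Hypothetical sketch: rows are dicts (column name -> value); V uses set semantics.

def evaluate_view(r_rows, t_rows, key, cond, columns):
    """Compute V = project_columns(select_cond(R join T on key)) from scratch."""
    view = set()
    for r in r_rows:
        for t in t_rows:
            if r[key] == t[key]:
                joined = {**r, **t}
                if cond(joined):
                    view.add(tuple(joined[c] for c in columns))
    return view


def refresh_after_insert(view, delta_r, t_rows, key, cond, r_cond, columns):
    """Incrementally update V after inserting delta_r into R (T unchanged).

    r_cond is assumed to be a conjunct of cond that mentions only R's columns,
    so an inserted tuple failing it can never contribute a row to V.
    """
    relevant = [row for row in delta_r if r_cond(row)]
    if not relevant:
        return set(view)  # every insertion is irrelevant: T is never read
    # (R ∪ ΔR) ⋈ T = (R ⋈ T) ∪ (ΔR ⋈ T), and select/project distribute over
    # union, so only the relevant new tuples are joined with T.
    return set(view) | evaluate_view(relevant, t_rows, key, cond, columns)


def well_paid(row):
    # The whole condition mentions only R's columns, so it also serves as r_cond.
    return row["salary"] > 50


# Usage: employees (R) joined with departments (T); V lists well-paid employees.
EMP = [{"emp": "ann", "dept": 1, "salary": 90}, {"emp": "bob", "dept": 2, "salary": 40}]
DEPT = [{"dept": 1, "dname": "db"}, {"dept": 2, "dname": "os"}]
COLS = ("emp", "dname")

V = evaluate_view(EMP, DEPT, "dept", well_paid, COLS)
DELTA = [{"emp": "cy", "dept": 2, "salary": 70}, {"emp": "dee", "dept": 1, "salary": 10}]
V_NEW = refresh_after_insert(V, DELTA, DEPT, "dept", well_paid, well_paid, COLS)

# The incremental result matches recomputing the view over R ∪ ΔR.
assert V_NEW == evaluate_view(EMP + DELTA, DEPT, "dept", well_paid, COLS)
print(sorted(V_NEW))  # [('ann', 'db'), ('cy', 'os')]
```

Insertions are the easy case. Under set semantics a deletion may not remove a view row that is still derivable from other base tuples, so a sketch handling deletions would need extra bookkeeping, such as a per-row count of derivations.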

Professor at Indiana University

José spent two years in academia at Indiana University in Bloomington, where he co-advised one PhD student, Anand Deshpande. At that time, José worked on two ideas. The first was to apply database abstractions (e.g., SQL and ACID transactions) to the design of operating systems. The second was a new way to implement nested relational database systems, which became an important part of Anand’s PhD thesis. That idea also inspired work on path indexes and especially on XML document databases more than a decade later.

Texas Instruments: Object Databases

Object-oriented databases (OODBs, also known as ODBs) were conceived in the mid-1980s to reduce the impedance mismatch between database and programming language data structures. The initial idea for OODBs was to seamlessly persist and share programming language data types. At Texas Instruments, a team of researchers explored OODBs to store CAD databases of circuits persistently. José joined this project in 1989 and, with his strong relational background, observed that persistence and query capabilities were orthogonal. That led him to propose supporting queries over programming language data structures, whether persistent or transient, for CAD or any other application. In 1990, José prototyped, documented, and patented OQL[X], the query service of TI’s OODB, where X refers to the data structures of a host programming language, e.g., C++. José was responsible for the language interface, query optimizer, and set execution runtime [7 – 9].

Subsequently, the work of the TI team also led to the DARPA-funded Open OODB project (1990-1995), which developed the first object-oriented DBMS built as a componentized system. Through two key contributions, the DARPA Open OODB architecture influenced the then-nascent middleware industry, which blossomed into service-oriented architectures (SOA), today a mainstay of enterprise computing. First, the team authored an influential technical report in 1989; it was presented at early Object Management Group (OMG) meetings and influenced the “Reference Model” of the OMG Object Management Architecture Guide, a bus architecture with services available on the bus [10 – 12]. Second, the DARPA Open OODB modules were each separately documented using a common template, effectively a design pattern (four years before that term came into general use via the Gang of Four’s famous text). At OMG, José and his colleagues contributed to OMG’s Object Services Architecture, a compendium of middleware design patterns that drove OMG standards related to CORBA Services throughout the 1990s. These service interfaces populated the industry’s first widely useful service-oriented architecture [13].

José’s OODB work is synthesized in a book chapter that describes query processing in OODBs [14].

Microsoft: OLE DB / ADO

José joined Microsoft in 1994 as an architect on the OLE DB team, where he continued his passion for reducing the impedance mismatch between traditional databases and modern programming languages. José authored one of the early papers that shaped the team’s strategy and vision, a must-read for everyone who joined the team: “Data Access for the Masses through OLE DB” [15]. The overall premise was to allow applications (the masses) to query and reason about data wherever it resided, as opposed to requiring all data to be moved into a traditional database. This fundamental shift to Universal Data Access (UDA) complemented the traditional view of the “database at the center of the universe” and embraced multiple heterogeneous data sources: flat files, sequential files, desktop databases, email, directory services, traditional databases, and even the web, through integration with ADO (ActiveX Data Objects).

Most of the OLE DB and ADO teams knew José as the face of Universal Data Access (UDA), a key founding father of OLE DB, and someone who influenced the team’s strategic direction, its investments, and its customers. Others knew José as a key architect with whom they butted heads, as he had strong convictions on the technology front, from the specification to the programming surface (API), all the way to the codebase itself. Still others knew José as a magnet for new hires, spending a lot of time recruiting for the team, convincing candidates to come to Microsoft, and mentoring them once on board.

Many team members also got to know José as a friend who would routinely share fatherly advice. José always found the balance between work, making the world a better place, and family life.

.NET CLR Integration, ADO.NET Entity Framework

After OLE DB, José joined the SQL Server engine team as architect for the SQL CLR effort, which hosts the .NET Common Language Runtime (CLR) in Microsoft SQL Server. With the CLR hosted in SQL Server, database developers can author stored procedures, triggers, user-defined functions, user-defined types, and user-defined aggregates in .NET managed code, gaining the safety of managed execution and the performance of just-in-time (JIT) compilation [16].

In 2004, José came back to extend ADO to the next generation. He joined the ADO.NET group to raise the programming abstraction from relations to entity types by incorporating Microsoft’s Entity Data Model (EDM) into ADO.NET. The resulting stack comprises EDM, an extended relational model that treats entities and relationships as first-class concepts; a query language for EDM; a comprehensive mapping engine that translates from the conceptual (entity) level to the logical (relational) level; and model-driven tools that help define and maintain mappings. Collectively, these services are called the ADO.NET Entity Framework [17]. The Entity Framework provided the conceptual-model foundation for subsequent data programming efforts such as OData and the Microsoft Graph API. It is widely used and still under active development as an open source project [20, 21].

Parallel Data Warehouse

In 2007 the data warehousing industry was shifting towards appliance solutions in which hardware and MPP software were engineered to work together. Microsoft was considering an acquisition to embrace the trend, though there was risk and uncertainty. It was José’s deep understanding of the technology and his always optimistic outlook that gave us the encouragement to proceed with the deal. José dedicated many of the following years of his career to ensuring the success of the project.

There was no end to the technical challenges the project had to overcome, and José left his positive mark all over the team and the technology. He helped attract talent, and personally spent time with each engineer showing them what great engineering looked like. He influenced technical choices in the core database system but also weighed in on the hardware platform including compute, networking and storage decisions.

Beyond technical contributions, José displayed leadership that will be remembered for many years to come. In times of pressure José provided a calm perspective, and when complacency threatened the quality of deliverables José held everyone to the highest standards. José sacrificed a significant amount of family time by flying down to Southern California every week for many years to ensure the success of the team and the project. In the end he succeeded: the technology shipped as an appliance known as Parallel Data Warehouse (PDW) and is also the foundation of Microsoft’s Azure Data Warehouse service.

The passion, perseverance, and dedication José displayed during these years will have a lasting impact.

Data Analytics in the Cloud: Azure Data Lake

Three disruptions impacted the database and data warehouse industry in the last decade: (a) the cloud, (b) new complex workloads, and (c) open-source systems such as Hadoop. Microsoft had an early response to these disruptions: a system called Cosmos was developed for internal use within Microsoft to build prediction models for Bing, improve service quality for Skype, and analyze the availability of Office 365, among many other scenarios. Cosmos has all the features needed for complex analytics and machine learning on semi-structured data in the file system (e.g., logs). It supports relational operators and user-defined functions. Furthermore, Cosmos scales to tens of thousands of machines and exabytes of data per cluster. However, Cosmos was not designed from the ground up to host workloads and data from Microsoft customers.

During his last four years, José was an architect on the Azure Data Lake Analytics team. Azure Data Lake Analytics is the service that brings the core Cosmos technology to Microsoft customers: it is an elastic, fully managed service that lets users pay as they go. It features a powerful new query language with user-defined functions, called U-SQL. It is fully compliant and secure and, like Cosmos, scales to large clusters and huge data sizes. José was a leader in exploring extensions to U-SQL. He was also passionate about the robustness and performance of the system. Until literally his last moments, he worked on a new scheduling algorithm [18] and a new benchmarking framework to study the performance, cost, response-time, and availability trade-offs of managed services like Azure Data Lake Analytics [19]. He was constantly fighting fires and addressing the scalability and capacity issues that arise in large-scale systems like Azure Data Lake. Last but not least, he was a great recruiter and mentor to many people on the team.

Contributors

Philip Bernstein, Pedro Celis, Surajit Chaudhuri, Anand Deshpande, David DeWitt, Kent Foster, Cesar Galindo-Legaria, Christian Kleinerman, Donald Kossmann, Per-Åke Larson, David Lomet, Anil Nori, Frank Tompa, Tamer Özsu, Raghu Ramakrishnan, Michael Rys, Clemens Szyperski, Craig Thompson, Ed Triou, Dirk van Gucht

Footnotes

[1] José A. Blakeley: “Updating Materialized Database Views,” PhD Thesis, Department of Computer Science, University of Waterloo, 1987 (joint supervision: Per-Åke Larson and Frank Tompa).

[2] José A. Blakeley, Neil Coburn, and Per-Åke Larson: Updating Derived Relations: Detecting Irrelevant and Autonomously Computable Updates. VLDB Conference 1986: 457-466.

[3] José A. Blakeley, Per-Åke Larson, and Frank Wm. Tompa: Efficiently Updating Materialized Views. SIGMOD Conference 1986: 61-71.

[4] Frank Wm. Tompa and José A. Blakeley: Maintaining materialized views without accessing base data. Inf. Syst. 13(4): 393-406 (1988).

[5] José A. Blakeley, Neil Coburn, and Per-Åke Larson: Updating Derived Relations: Detecting Irrelevant and Autonomously Computable Updates. ACM Trans. Database Syst. 14(3): 369-400 (1989).

[6] José A. Blakeley and Nancy L. Martin: Join Index, Materialized View, and Hybrid-Hash Join: A Performance Analysis. ICDE Conference 1990: 256-263.

[7] José Blakeley, Craig Thompson: “Apparatus and Method for Adding an Associative Query Capability to a Programming Language,” Patent application filed April 1990, issued as U.S. Patent 5,761,493, June 1998, and U.S. Patent 5,826,077, October 1998.

[8] José Blakeley, Craig Thompson, Abdulah Alashqur: “Strawman Reference Model for Object Query Languages,” International Journal of Computer Standards and Interfaces, 1991.

[9] Craig Thompson, José Blakeley, David Wells: “Object Query Service,” OMG Documents 09-44, September 1994.

[10] Craig Thompson, José Blakeley, Tom Bannon, John Chen, Tom Ekberg, Steve Ford, Anil Gupta, J. Joseph, Edward Perez, Diana Sparacin, Robert Peterson, Mark Shadowens, Satish Thatte, Chung Wang, David Wells: “Open Architecture for Object-oriented Database Systems,” Texas Instruments Technical Report ITL 89-12-01, December 1989. OMG Document 1990/90-01-06.

[11] William Andreas, Geoff Lewis, Matthew Mathews, Lee Scheffler, R. Soley, Craig Thompson: “Reference Model,” Object Management Architecture Guide, OMG Document 1990/90-09-01.

[12] David Wells, José Blakeley, Craig Thompson: “Architecture of an Open Object-oriented Database Management System,” IEEE Computer, Special Issue on Object-Oriented Applications: 74-82 (1992).

[13] “OMG Object Services Architecture V8.0,” OMG, December 1994.

[14] M. Tamer Özsu and José Blakeley: “Query Optimization and Processing in Object-Oriented Database Systems,” in Modern Database Management – Object-Oriented and Multidatabase Technologies, W. Kim (ed.), Addison-Wesley/ACM Press, 1994, pages 146-174.

[15] José Blakeley: “Data Access for the Masses through OLE DB,” SIGMOD Conference 1996: 161-172.

[16] Alazel Acheson, Mason Bendixen, José Blakeley, Peter Carlin, Ebru Ersan, Jun Fang, Xiaowei Jiang, Christian Kleinerman, Balaji Rathakrishnan, Gideon Schaller, Beysim Sezgin, Ramachandran Venkatesh, Honggang Zhang: “Hosting the .NET Runtime in Microsoft SQL Server,” SIGMOD Conference 2004: 860-865.

[17] José Blakeley, S. Muralidhar, Anil Nori: “The ADO.NET Entity Framework: Making the Conceptual Level Real,” ER Conference 2006: 552-565.

[18] Zhicheng Yin, Jin Sun, Ming Li, Jaliya Ekanayake, Haibo Lin, Marc Friedman, José Blakeley, Clemens Szyperski, Nikhil Devanur: “Bubble Execution: Resource-aware Reliable Analytics at Cloud Scale,” to appear in Proceedings of the VLDB 2018.

[19] Umar Farooq Minhas, José Blakeley, Donald Kossmann, Raghu Ramakrishnan, Clemens Szyperski: “Benchmarking Cloud-based Big Data Analytics Services,” in preparation, 2018.

[20] https://github.com/aspnet/EntityFrameworkCore
[21] https://github.com/aspnet/EntityFramework6



2 Comments

Peter Chen on March 12, 2018

It is very sad our community has lost a great scientist and colleague. Jose was very innovative and had been very successful in implementing new ideas into commercial products. He will be missed by many of us.

Gio Wiederhold on March 12, 2018

Sad to hear, too young.
I was the program manager at DARPA supporting TI’s OODB work.
One of the many efforts influencing today’s focus in data.

