Mahsa Baktash and Zi (Helen) Huang

A Leap from Model-Centric to Data-Centric AI

Data, a major component of any deep learning solution, is often undervalued in ML projects. The result is lower-than-expected accuracy and hours upon hours of model tuning. According to Andrew Ng, 99% of recent publications are model-centric and only 1% are data-centric. He argues that there should be a balance between […]

Arun Kumar

Automation of Data Prep, ML, and Data Science: New Cure or Snake Oil?

For almost 30 years, the DB / data management community has intensively studied the vexing pains of data integration, cleaning, and transformation. This research has largely been in the contexts of RDBMSs, SQL-oriented business intelligence (BI), and knowledge base construction. But as the emerging interdisciplinary field of Data Science gains prominence, the massive pain of […]

Yunyao Li and Shivakumar Vaithyanathan

Role of AI in Enterprise Applications

The recent return of AI summer and the enthusiastic uptake of AI in the commercial world can be loosely attributed to three innovations: Apple’s Siri, Google’s self-driving cars, and IBM Watson Jeopardy. This enthusiasm stems from the belief that AI will influence a wide range of applications across multiple industry segments. While such enthusiasm is, […]

Arun Kumar

ML/AI Systems and Applications: Is the SIGMOD/VLDB Community Losing Relevance?

Overview of DEEM 2018: The ACM SIGMOD Second Workshop on Data Management for End-to-End Machine Learning (DEEM) was successfully held last June in Houston, TX. The goal of DEEM is to bring together researchers and practitioners at the intersection of applied machine learning (ML) and data management/systems research to discuss data management/systems issues in ML […]

Ihab Ilyas

Data cleaning is a machine learning problem that needs data systems help!

When dealing with real-world data, dirty data is the norm rather than the exception. We continuously need to predict correct values, impute missing ones, and find links between various data artefacts such as schemas and records. We need to stop treating data cleaning as a piecemeal exercise (resolving different types of errors in isolation), and […]

