November 14, 2018
The recent return of AI summer and the enthusiastic uptake of AI in the commercial world can be loosely attributed to three innovations: Apple’s Siri, Google’s self-driving cars, and IBM Watson’s Jeopardy! win. This enthusiasm stems from the belief that AI will influence a wide range of applications across multiple industry segments. While such enthusiasm is, […]
Read more
October 9, 2018
After being largely neglected in the rush to capitalize on the promise and potential of Big Data, data privacy and data stewardship issues have resurfaced in industry with a vengeance over the last year. This has been driven, in part, by increased scrutiny from regulatory bodies all over the world and subsequent legislation, […]
Read more
August 21, 2018
Overview of DEEM 2018
The ACM SIGMOD Second Workshop on Data Management for End-to-End Machine Learning (DEEM) was successfully held last June in Houston, TX. The goal of DEEM is to bring together researchers and practitioners at the intersection of applied machine learning (ML) and data management/systems research to discuss data management/systems issues in ML […]
Read more
June 25, 2018
Information visualization is an essential tool in the arsenal of a data scientist: visualizations help identify trends and patterns, spot outliers and anomalies, and verify hypotheses. Moreover, visualizations are visceral and intuitive: they tell us stories about our data; they educate, delight, inform, enthrall, amaze, and clarify. This has led to the overwhelming popularity of […]
Read more
April 18, 2018
When dealing with real-world data, dirty data is the norm rather than the exception. We continuously need to predict correct values, impute missing ones, and find links between various data artefacts such as schemas and records. We need to stop treating data cleaning as a piecemeal exercise (resolving different types of errors in isolation), and […]
Read more