Ihab Ilyas

Data cleaning is a machine learning problem that needs data systems help!

Big Data, Machine Learning, Systems

When dealing with real-world data, dirty data is the norm rather than the exception. We continuously need to predict correct values, impute missing ones, and find links between various data artefacts such as schemas and records. We need to stop treating data cleaning as a piecemeal exercise (resolving different types of errors in isolation), and […]

Mohamed Mokbel

Thinking Spatial

Databases, Recommendations, Spatial, Systems

Self-driving cars, ride-sharing services (e.g., Uber and Lyft), and Pokémon Go are just three examples of recent disruptive applications that gained huge market share and publicity. It is expected that each self-driving car will generate 2 PB of data per year, with 10 million such cars by 2020. Uber has 2+ billion rides so […]

Stratos Idreos

Data systems that are easy to design*


We keep designing new data systems over and over again. We ask two questions in this post: 1) Is this sustainable in the long term? 2) Can we make this process faster?

The need for new data system designs

“Big data” may be mostly a marketing term as opposed to a research one, but it […]

