Archive for April, 2018

Ihab Ilyas

Data cleaning is a machine learning problem that needs data systems help!

Big Data, Machine Learning, Systems

When dealing with real-world data, dirty data is the norm rather than the exception. We continuously need to predict correct values, impute missing ones, and find links between various data artefacts such as schemas and records. We need to stop treating data cleaning as a piecemeal exercise (resolving different types of errors in isolation), and […]
