Archive for the Big Data category

Ihab Ilyas

Data cleaning is a machine learning problem that needs data systems help!

Big Data, Machine Learning, Systems

When dealing with real-world data, dirty data is the norm rather than the exception. We continuously need to predict correct values, impute missing ones, and find links between various data artefacts such as schemas and records. We need to stop treating data cleaning as a piecemeal exercise (resolving different types of errors in isolation), and […]

Melanie Herschel and Yannis Velegrakis

On Data Exploration in the era of Big Data

Big Data, data exploration, Interview

We are witnessing data of unprecedented volume, variety and velocity. Such data is collected from almost every aspect of human activity and stored in large repositories in order to be later analyzed and turned into useful insights. The storage model is no longer the one in which data is placed in predefined structures with […]

Azza Abouzied and Paolo Papotti

Courting ML: Witnessing the Marriage of Relational & Web Data Systems to Machine Learning

Big Data, Databases, Interview, Machine Learning

The web is an ever-evolving source of information, with data and knowledge derived from it powering a great range of modern applications. Accompanying the huge wealth of information, web data also introduces numerous challenges due to its size, diversity, volatility, inaccuracy, and contradictions. This year’s WebDB 2018 theme emphasizes the challenges and opportunities that arise […]

Surajit Chaudhuri

Approximate Query Processing – Where do we go from here?

Big Data, Query Processing

I think we need to take a hard look at approximate query processing. Don’t get me wrong. The vision of approximate query processing is indeed compelling. In the age of Big Data, the ability to answer analytic queries “approximately”, but at a fraction of the cost of executing the query in the traditional way, is […]

Vijay Srinivas Agneeswaran

Google Spanner: Beginning of the End of the NoSQL World?

Big Data, Databases

Google has recently announced that its flagship wide-area database named Spanner has been made available on the Google Cloud. Google Spanner is the next generation globally-distributed database built inside Google and announced to the world through the paper published in OSDI 2012 [1]. This article explores the implication of Google Spanner, in particular to the […]
