Themis Palpanas

A Brief History of Data Series Indexing: from Time Series to High-Dimensional Vectors and Deep Neural Network Embeddings 

Data Series

In this post, we motivate the need for efficient and effective solutions for data series similarity search, and we briefly present the work that has been done in this direction by the data series community. We also discuss the relationship to high-dimensional (high-d) vectors and deep neural network embeddings, point to the relevant efforts in […]

Azza Abouzied

From Curbing Epidemics to Investing: We can Help!

Decision Making, Epidemics

We built a tool, EpiPolicy, to help policy-makers better plan interventions to combat epidemics [13]. It was an eye-opening experience: through collaborations and interviews with teams of epidemiologists, public health officials, and economists, we came to understand some of the complexities of decision-making on a momentous scale. Decisions and policies made by these teams can seriously […]

Kian-Lee Tan

Data Management For The Metaverse

Databases

In 2009, we wrote an article highlighting some database challenges in a co-space environment [1]. In such an environment, the physical space and the digital space co-exist in a “universe”, and applications can manipulate the data flow within and across the two spaces. Thirteen years have since passed and progress on co-space research has been […]

Sebastian Link

Data-quality Driven Design of Databases

Big Data, Databases

Financially, poor data quality costs organizations ludicrous amounts of money. Worse, poor data quality is a strong inhibitor to the success of data science: no analytical method can create value from poor-quality data. As a consequence, data science projects invest a majority of their resources in cleansing data. However, cleansing resists automation as […]

Mahsa Baktash and Zi (Helen) Huang

A Leap from Model-Centric to Data Centric AI

Data Science, Machine Learning

Data, a major component of any deep learning solution, is often undervalued in ML projects, which results in lower-than-expected accuracy and requires hours and hours of model tuning. According to Andrew Ng, 99% of the recent publications are model-centric with only 1% being data-centric. He argues that there should be a balance between […]

