December 29, 2022
Similarity search is a fundamental building block for a myriad of critical data science applications involving large collections of high-dimensional objects, including data discovery, data cleaning, information retrieval, classification, outlier detection and clustering. Similarity search finds objects in a collection close to a given query according to some definition of sameness. This challenging problem has […]
April 17, 2022
Data, a major component of any deep learning solution, is often undervalued in ML projects, which results in lower-than-expected accuracy and requires hours and hours of model tuning. According to Andrew Ng, 99% of recent publications are model-centric, with only 1% being data-centric. He argues that there should be a balance between […]
January 26, 2022
Data Science for Social Good, or DSSG, broadly refers to the use of data engineering and analysis solutions in the social work domain. I am interested in this field because it gives me a chance to understand how database technologies can be used in a domain whose data-driven approaches are still in their infancy. Moreover, […]