Eugene Wu

Panel Summary: Where Does Academic Database Research Go From Here?

Date: Thursday, June 26, 2025 

Organizers: Eugene Wu, Raul Castro Fernandez 

Panelists: 

  • Dan Suciu: Microsoft Endowed Professor in CSE at the University of Washington.
  • Sihem Amer-Yahia: Research Director, CNRS, LIG; VP of VLDB Endowment.
  • Yannis Ioannidis: Professor of Informatics & Telecom at the University of Athens, President of the ACM.
  • Ippokratis Pandis: Distinguished Engineer at Databricks.
  • Jens Dittrich: Professor of CS at Saarland University; 3x CIDR gong show winner.

Overview

As organizers, we aimed to foster a bottom-up discussion on the evolving role and future direction of academic database research. Motivated by seismic technological and budgetary shifts, particularly the rise of AI, we explored the comparative advantage of the academic database community. This advantage is critical because it dictates our ability to recruit top students, ensure their success after graduation, and maintain our relevance in a rapidly changing technology and economic landscape. We specifically emphasized that “we” refers to academic database research, contextualizing the topic around recent shifts in government funding ($4B NSF expected budget for 2026) compared to the rapid growth of private investment in AI (e.g., >$100B VC funding in 2024, >$50B in Q1 2025), as funding directly influences the talent landscape.

The feedback on the panel and its structure was as varied as the recommendations themselves. Many approached us afterward to share their excitement about the panel’s discourse and their appreciation that it surfaced their worries and challenges. Others felt the existence of such a panel was yet another in a “sorry tradition of needless negativity” for a vibrant and thriving research community. With a community as big as ours, we are grateful to receive these diverse opinions.

Matthew Butrovich and Lampros Flokas kindly took notes. The quoted text below is pulled from these notes, so we do not attribute it to specific individuals. The post ends with reflections from the organizers.

If this topic is of interest to you, we will be running a companion panel at VLDB 2025 with a different set of panelists, including Shreya Shankar, Gustavo Alonso, Natacha Crooks, Jiannan Wang, and Divesh Srivastava! We have updated the arXiv paper to reflect this summary and the panelists’ takes. Comments and hot takes that you would like to include in the paper are welcome at ewu@cs.columbia.edu and raulcf@uchicago.edu.

Panel Summary

The discussion revolved around two primary topics: the competitive advantage of the academic database research community and actionable strategies for its future.

Our Competitive Advantage

The first topic focused on the academic database community’s competitive advantage as compared to other disciplines, industry, and startups.

Data Principles and AI: Our community’s core database principles, such as “independence between physical and logical”, “declarativeness”, and “automatic scalability”, hold lasting value. The CTO of Alibaba Cloud demonstrated this by applying query optimization principles to rebuild their pipeline for training QWEN 3, significantly reducing costs. We believe “all AI problems boil [down] to data,” and our deep systems knowledge offers unique insights into challenges like data provenance, security, and novel data abstractions for AI workloads. A panelist noted the “bug” of companies needing hundreds of engineers for each of their separate “AI stack” and “data stack” teams, highlighting a significant opportunity for unified solutions rooted in database thinking.

While some external communities might prioritize immediate problem-solving and “accuracy” over these foundational principles, we must effectively communicate their long-term benefits. “The way we interact with other communities is problematic” and “we don’t communicate what we have in the right way.”

Navigating the Modern Data Landscape Beyond the Traditional DBMS: “DBMSes are fast enough” for most applications, and our community’s expertise is vital for new data problems beyond the classic RDBMS. For instance, many performance problems are due to the ORM and never arise at the DBMS, while societal-level issues like fake news and the ongoing wave of AI-generated content (“AI slop”) are information integrity problems. A panelist pointed out that a MacBook can comfortably run TPC-H scale factor 1000; “small data” is enough for most applications, and the challenges lie in programmability, interoperability, and usability. Conversely, an audience member argued that improving the performance of RDBMSes can also be valuable, as throughput increases may unlock new types of applications.

In the AI world, “solutions are crappy when you” combine diverse workloads like vectors, keywords, and relational queries in commercial systems. The platform for building data management systems is “radically different” at “cloud scale,” involving “smart NICs, disaggregated memory that comes and goes, [and] GPU spending.” These new hardware environments present unique opportunities for our deep systems knowledge.

Academic Research Identity: Balancing Curiosity, Impact, and Resources: Academia holds a unique position to pursue “curiosity driven research,” which contrasts with industry’s often shorter-term, product-focused cycles. This allows for “academic play” that breaks free from the need to make numbers go up. Panelists observed that “industry thinks 3/4 quarters ahead” while “Academia thinks 5 years ahead.” Despite industry’s “muscle” (e.g., 600 people on projects compared to “2 students for 2 years” in academia), academia retains the freedom to explore complex, foundational questions without immediate commercial constraints. This implies that academia should not directly compete with industry; as Surajit mentioned in the Tuesday panel, researchers should look for high upside when choosing a problem.

What the Academic Community Should Do

The second half of the panel shifted towards actionable take-aways for the community.

Bring the Data (Researcher) To the Problem: “The way we interact with other communities is problematic”: we expect other communities to come to us. Yet one third of research sessions in other conferences have ‘data’ in their titles, and “these communities are not going to come to SIGMOD”. A key recommendation was to proactively engage with other academic disciplines. The panel stressed the need to “go out” to these communities, and that “inter-disciplinary research post tenure” could be “tremendously fulfilling”. One suggestion was to create “Data + X” workshops in the X communities that are run in a “true discussion mode not as a paper presentation”.

Win the Heart of Data Stakeholders: Although our core database principles, such as “independence between physical and logical” and “declarativeness,” hold lasting value, we should not waste our efforts convincing other CS disciplines that these matter. We should instead direct our efforts toward winning the hearts of data stakeholders. How do we do that? By showing how good we are at treating everything as data and building the data backbone for their needs.

Improve Dissemination and Communication: There was a strong call to make academic research more accessible and understandable, as “nobody reads our papers, but they want to talk to someone who reads our papers”. It was suggested that papers are “extremely hard to read,” and the community “let[s] others pull the knowledge”. Recommendations included writing “better tutorials”, producing “YouTube videos”, and reducing SIGMOD papers to “8 pages” accompanied by “a clean and fun youtube video”.

Foster Industry-Academia Collaboration: Encouraging students to spend time in industry was highlighted as a way to expose academic research to real-world challenges and bring practical problems back. It was noted that “OpenAI doesn’t come here, but they bought a database” company, Rockset. Their “head of engineering is from Almaden. Send students into industry for a bit after teaching them systems.”

Rethink Conferences and Publication Norms: The panel discussed the need to innovate scholarly communication and conference formats. SIGMOD is a leader in pushing scholarly communication forward (e.g., we introduced reproducibility efforts), and there are ongoing discussions at the SIGMOD and ACM levels about reorganizing conference formats to better support “conferring” over listening to “canned presentations”.   Continuing the idea of academic play, a panelist remarked that our reviewing process may over-index on work that must be faster than some baseline, and inadvertently prune the early seeds of ideas that may not lead to impressive results now but open up impactful research directions in the long run.

Strategically Focus on Foundational Principles in New Contexts: Panelists argued that we need not “become experts in LLMs” or other domains. Instead, our strength lies in forming “multi-disciplinary teams” where each contributes their core expertise. Specifically, for AI, “LLMs are a race to 0. It’s a tool.” Instead, “the application and the data is interesting and where the money is”. The community’s advantage lies in knowing “complexities of modern technology like GPUs” and continually learning.

Closing Comments

The panelists were asked to imagine they were “president of database research” and could unilaterally force one change.

  • “Go out and create workshops that are ‘data + X’ or ‘data 4 X’, have them run in a true discussion mode not as a paper presentation with two questions. Interactive panels and a few keynotes by experts. Run interdisciplinary workshops and everyone is forced to go to at least one of them”
  • “Force everybody to send students to industry for a bit and bring problems back. security, provenance, data integration. Force a change of mind that academia is the holy land and let students be more open to industry.”
  • “Impose that SIGMOD papers be reduced to 8 pages. accompanied by a clean and fun youtube video.”
  • “Help people build real applications and make it open source.”

Reflections From the Organizers

Overall, we came away from the panel excited that our community has so much impact, and possesses the data-centric tools that the world needs!    However, our future impact hinges on proactively engaging with a rapidly evolving landscape of applications, evangelizing our expertise to other communities, and strategically collaborating across domains.   All great motivations for academic research!

The positive and insightful discussion ultimately led to recommendations that exhibit inherent trade-offs, rather than a single, clear set of recommendations:

  • Is Database Research Art? We should leverage academia’s unique capacity to pursue curiosity-driven research (academic play), but a real and present budget crisis exists. How do we balance these two imperatives? How can academic research secure sustainable funding for foundational, long-term problems that industry cannot or will not pursue, without becoming irrelevant to the capitalist marketplace that attracts top talent and resources? It reminds us of artists who strive for purity of expression but face economic realities.
  • Impactful Research vs. Market Dominance: We should do impactful research that is neither too short-term nor too long-term, yet offers high upside. However, the relational database market completely dwarfs any alternative data applications and domains. How do we balance the dominance of existing database technologies with the need to explore new paradigms that may not have immediate commercial application but might offer long-term, high-upside impact?
  • Go Big and Stay Real: We should talk to other domains and experts to be “trans-disciplinary.” But how do we initiate such projects effectively? What specific models for “multi-disciplinary teams” can integrate deep database knowledge into diverse fields (e.g., AI, quantum computing, social sciences) to solve “data problems” without sacrificing the rigor and identity of database research?
  • Engineering vs. Science: We encourage students to engage with industry to gain real-world experience, but we also aim to conduct long-term, ambitious scientific research. How do we ensure that industry exposure enriches, rather than detracts from, fundamental scientific inquiry? On the other hand, data management is an applied field, so does it matter?

In any case, we do feel that problems should be paired with solutions. We plan to re-structure the upcoming panel at VLDB 2025 in late August to delve deeper into concrete trade-offs in the above recommendations. If you have suggestions on how to organize the VLDB panel, or would like to share your take on the topic or panel discussion, please share them with ewu@cs.columbia.edu and raulcf@uchicago.edu.

We again thank the panelists, the audience members, people that shared feedback, and Matt Butrovich and Lampros Flokas for taking notes.
