Date: Thursday, June 26, 2025
Organizers: Eugene Wu, Raul Castro Fernandez
Panelists:
As organizers, we aimed to foster a bottom-up discussion on the evolving role and future direction of academic database research. Motivated by seismic technological and budgetary shifts, particularly the rise of AI, we explored the comparative advantage of the academic database community. This advantage is critical because it dictates our ability to recruit top students, ensure their success after graduation, and maintain our relevance in a rapidly changing technological and economic landscape. We specifically emphasized that “we” refers to academic database research, and we framed the topic around recent shifts in government funding (an expected $4B NSF budget for 2026) compared to the rapid growth of private investment in AI (e.g., >$100B in VC funding in 2024 and >$50B in Q1 2025), since funding directly influences the talent landscape.
The feedback on the panel and its structure was as varied as the recommendations themselves. Many approached us afterward to share their excitement about the panel’s discussion and their appreciation for surfacing their worries and challenges. Others felt that the existence of such a panel was yet another entry in a “sorry tradition of needless negativity” for a vibrant and thriving research community. With a community as big as ours, we are grateful to receive these diverse opinions.
Matthew Butrovich and Lampros Flokas kindly took notes. The quoted text below is pulled from these notes, so we do not attribute them to specific individuals. The post ends with reflections from the organizers.
If this topic is of interest to you, we will be running a companion panel at VLDB 2025 with a different set of panelists, including Shreya Shankar, Gustavo Alonso, Natacha Crooks, Jiannan Wang, and Divesh Srivastava! We have updated the arXiv paper to reflect this summary and the panelists’ takes. Comments and hot takes that you would like us to include in the paper are welcome at ewu@cs.columbia.edu and raulcf@uchicago.edu.
The discussion revolved around two primary topics: the competitive advantage of the academic database research community and actionable strategies for its future.
The first topic focused on the academic database community’s competitive advantage as compared to other disciplines, industry, and startups.
Data Principles and AI: Our community’s core database principles, such as “independence between physical and logical,” “declarativeness,” and “automatic scalability,” hold lasting value. The CTO of Alibaba Cloud demonstrated this by applying query optimization principles to rebuild their pipeline for training Qwen3, significantly reducing costs. We believe “all AI problems boil to data,” and our deep systems knowledge offers unique insights into challenges like data provenance, security, and novel data abstractions for AI workloads. A panelist noted the “bug” of companies needing hundreds of engineers for each of their separate “AI stack” and “data stack” teams, highlighting a significant opportunity for unified solutions rooted in database thinking.
While some external communities may prioritize immediate problem-solving and “accuracy” over these foundational principles, we must effectively communicate their long-term benefits. “The way we interact with other communities is problematic” and “we don’t communicate what we have in the right way.”
Navigating the Modern Data Landscape Beyond the Traditional DBMS: “DBMSes are fast enough” for most applications, and our community’s expertise is vital for new data problems beyond the classic RDBMS. For instance, many performance problems stem from the ORM layer rather than the DBMS itself, while societal-level issues such as fake news, amplified by an ongoing wave of AI-generated content (“AI slop”), are information integrity problems. A panelist pointed out that a MacBook can comfortably run TPC-H at scale factor 1000: “small data” is enough for most applications, and the challenges lie in programmability, interoperability, and usability. Conversely, an audience member argued that improving the performance of RDBMSes can also be valuable, as throughput increases may unlock new types of applications.
In the AI world, “solutions are crappy when you” combine diverse workloads like vectors, keywords, and relational queries in commercial systems. The platform for building data management systems is “radically different” at “cloud scale,” involving “smart NICs, disaggregated memory that comes and goes, [and] GPU spending.” These new hardware environments present unique opportunities for our deep systems knowledge.
Academic Research Identity: Balancing Curiosity, Impact, and Resources: Academia holds a unique position to pursue “curiosity driven research,” which contrasts with industry’s often shorter-term, product-focused cycles. This allows for “academic play” that breaks free from the need to make numbers go up. Panelists observed that “industry thinks 3/4 quarters ahead” while “Academia thinks 5 years ahead.” Despite industry’s “muscle” (e.g., 600 people on a project compared to “2 students for 2 years” in academia), academia retains the freedom to explore complex, foundational questions without immediate commercial constraints. This implies that academia should not compete with industry head-on; as Surajit mentioned in the Tuesday panel, researchers should look for high upside when choosing a problem.
The second half of the panel shifted towards actionable takeaways for the community.
Bring the Data (Researcher) To the Problem: “The way we interact with other communities is problematic,” and we expect other communities to come to us. Yet a third of the research sessions at other conferences have ‘data’ in their titles, and “these communities are not going to come to SIGMOD”. A key recommendation was to proactively engage with other academic disciplines. The panel stressed the need to “go out” to these communities, noting that “inter-disciplinary research post tenure” could be “tremendously fulfilling”. One suggestion was to create “Data + X” workshops in the X communities that are run in a “true discussion mode not as a paper presentation”.
Win the Heart of Data Stakeholders: Although our core database principles, such as “independence between physical and logical” and “declarativeness,” hold lasting value, we should not waste our efforts convincing other CS disciplines that they matter. Instead, we should direct our efforts toward winning the hearts of data stakeholders. How do we do that? By showing how good we are at treating everything as data and at building the data backbone for their needs.
Improve Dissemination and Communication: There was a strong call to make academic research more accessible and understandable, as “nobody reads our papers, but they want to talk to someone who reads our papers”. It was suggested that papers are “extremely hard to read,” and that the community “let[s] others pull the knowledge”. Recommendations included writing “better tutorials,” producing “YouTube videos,” and reducing SIGMOD papers to “8 pages” accompanied by “a clean and fun YouTube video”.
Foster Industry-Academia Collaboration: Encouraging students to spend time in industry was highlighted as a way to expose academic research to real-world challenges and bring practical problems back. It was noted that “OpenAI doesn’t come here, but they bought a database” company, Rockset. Their “head of engineering is from Almaden. Send students into industry for a bit after teaching them systems.”
Rethink Conferences and Publication Norms: The panel discussed the need to innovate scholarly communication and conference formats. SIGMOD is a leader in pushing scholarly communication forward (e.g., we introduced reproducibility efforts), and there are ongoing discussions at the SIGMOD and ACM levels about reorganizing conference formats to better support “conferring” over listening to “canned presentations”. Continuing the idea of academic play, a panelist remarked that our reviewing process may over-index on work that must beat some baseline, and may inadvertently prune early seeds of ideas that do not yield impressive results now but could open up impactful research directions in the long run.
Strategically Focus on Foundational Principles in New Contexts: Panelists argued that we need not “become experts in LLMs” or other domains. Instead, our strength lies in forming “multi-disciplinary teams” where each member contributes their core expertise. Specifically, for AI, “LLMs are a race to 0. It’s a tool.” Rather, “the application and the data is interesting and where the money is”. The community’s advantage lies in knowing the “complexities of modern technology like GPUs” and in continually learning.
The panelists were asked to imagine they were “president of database research” and could unilaterally force one change.
Overall, we came away from the panel excited that our community has so much impact, and possesses the data-centric tools that the world needs! However, our future impact hinges on proactively engaging with a rapidly evolving landscape of applications, evangelizing our expertise to other communities, and strategically collaborating across domains. All great motivations for academic research!
The positive and insightful discussion ultimately led to recommendations with inherent trade-offs, rather than a single, clear path forward:
In either case, we do feel that problems should be paired with solutions. We plan to restructure the upcoming panel at VLDB 2025 in late August to delve deeper into the concrete trade-offs in the above recommendations. If you have suggestions on how to organize the VLDB panel, or would like to share your take on the topic or the panel discussion, please send them to ewu@cs.columbia.edu and raulcf@uchicago.edu.
We again thank the panelists, the audience members, everyone who shared feedback, and Matt Butrovich and Lampros Flokas for taking notes.