Philip A. Bernstein

Systems & Databases: Let’s Break Down the Walls

Databases
After hanging out exclusively with the database community for 35+ years, I’ve recently become more involved with the systems research community. I have a few observations and recommendations to share.Much of the work published in systems conferences covers topics that would have a natural home in database conferences. For example, transactions and data streams are currently in vogue in the systems field. Here are some recent Best-Paper-award topics: data streams at SOSP 2013, data-parallel query processing at OSDI 2008, and transactions OSDI 2012 and SOSP 2007. Although this topic overlap is increasing lately, it is not just a recent phenomenon. In the last 10+ years, the systems community has taken an interest in data replication, fault tolerance, key-value stores, data-parallel processing (i.e., map-reduce), query processing in sensor networks, and record caching. Some of this work has had a major impact on the database field, e.g., map-reduce and multi-master replication.Yet despite this overlap of topics, there has been remarkably little attendance of database researchers at systems conferences or of systems researchers at database conferences. Not a good thing. And until 2013, I’ve been part of the problem.

To help break down this barrier, SIGMOD and SIGOPS have sponsored the annual Symposium on Cloud Computing (SoCC) since 2010. Personally, I’ve found attending SoCC’s to be time well spent, and I had a good experience attending ICDCS 2013 in July. So I decided to attend my first SOSP in October, 2013. It was great. The fraction of that program of direct interest to me was as large as any database conference, including work on transactions, data streams, replication, caching, fault tolerance, and scalability. I knew there wouldn’t be a huge number of database attendees, but I wasn’t prepared to be such an odd duck. At most a dozen attendees out of 600 were card-carrying members of the database community. With few database friends to hang out with, I got to meet a lot more new people than I would at a database event and hear about projects in my areas that I’d wouldn’t otherwise have known about. Plus, I could advertise my own work to another group of folks.

If you do systems-oriented database research, then how about picking one systems conference to attend this year? NSDI in Seattle in April, ICDCS in Madrid in late June, or OSDI in Colorado in October. And if you see a systems researcher looking lonely at a database conference, then strike up a conversation. If they feel welcome, perhaps they’ll tell others and we’ll see more systems attendees at database events.

Given the small overlap of attendees in systems and database conferences, we shouldn’t be surprised that the communities have developed different styles of research papers. Since systems papers typically describe mechanisms lower on the stack, they often focus on broader usage scenarios. They value the “ities” more than database papers, e.g., scalability, availability, manageability, and security. They favor simple ideas that work robustly over complex techniques that report improvements for some inputs. They expect a paper to work through the system details in a credible prototype, and to report on lessons learned that are more broadly applicable. They expect more micro-benchmarks that explain the source of performance behavior that’s observed, rather than just benchmarks that model usage scenarios, such as TPC.

From attending systems conferences, serving on SoCC program committees (PC’s), and submitting an ill-fated paper to a systems conference, I learned a few things about systems conferences that we database folks could learn from:

1. More of their conferences are single-track, and they leave more time for Q&A. This leads to more-polished presentations. When you’re presenting a paper to 600 attendees, doing it badly is a career-limiter.

I’ve always liked CIDR, not only because of its system-building orientation, but also because it’s single-track. I’m forced to attend sessions on topics I normally would ignore, and hence I learn more. I think we should try making one of our big conferences single-track: SIGMOD, VLDB or ICDE. One might argue that too many excellent papers are submitted, so it’s impractical. I disagree. Here’s one way to do it: Have the same acceptance rate as in the past, and all papers are published as usual. However, the PC selects only one-track’s worth of papers for presentation slots. The other papers are presented in poster sessions that have no competing parallel sessions. The single-track enables us to learn about more topics, the density of great presentations is higher, and the poster sessions enable us to dig deep on papers of interest to us (as some of our DB conferences already enable us to do). Unless our field stops growing, we really have to do something like this. Our conferences have already gotten out of hand with as many as seven parallel sessions.

2. When describing related work, systems papers cast a wider net. Their goal is often not simply to demonstrate that the paper’s contributions are novel, but also to educate the reader about lines of work that are loosely related to the paper’s topic. To help enable this breadth, they’ve recently settled on a formula where references don’t count toward the paper’s maximum page count. We should adopt this view in the database community. One self-serving benefit is that papers would cite more references, which would increase all of our citation counts.

3. There’s a widespread belief that systems PC’s write longer, more constructive reviews. I think DB PC’s have been improving in this respect, but on average we’re not up to the systems’ standard.

4. They usually shepherd all papers, to ensure that authors incorporate recommended changes. This also gives the authors someone to ask about ambiguous recommendations. We should do this too.

5. They typically require 10pt font. As a courtesy to those of us over a certain age, whose eyes aren’t what they used to be, we should do this, and increase the page count to maintain the same paper length.

6. At SOSP and OSDI, they post all slide decks and videos of all presentations. Obviously worthwhile. We sometimes post slide decks and videos only for plenaries. Let’s do better.

There are a few aspects of systems conferences that I’m less enthusiastic about.

1. Systems conferences favor live PC meetings. They do have benefits. It’s educational for PC members to hear discussions of papers they didn’t review and for young researchers to see how decisions are reached. Since all PC members hear every paper discussion, non-reviewers can bring up points that neutralize inappropriate criticisms or raise issues that escaped reviewers’ attention, which reduces the randomness of decisions. However, there are also negatives. In borderline cases, the most articulate, quick-thinking, and extroverted PC members have an inappropriate advantage in getting their way. And inevitably, some PC members can’t attend the meeting, due to a schedule conflict or the travel expense, so the papers they reviewed get short shrift.

On balance, I don’t think the decisions produced by live PC meetings are enough better than on-line discussions to be worth what they cost. Perhaps we can get some of the benefits of a PC meeting by allowing PC members to see the reviews and discussions of all papers for which they don’t have a conflict, late in the discussion period (which we did some years ago). With large PC’s, this might constitute public disclosure, in which case patents would have to be filed before submission, which will delay the publication of some work. But if we think a broader vetting of papers is beneficial, this might be worth trying.

2. During the reviewing process of a systems conference, if a paper seems borderline, then the PC chairs solicit more reviews. These reviews do offer a different perspective on the interest-value of the work. But they are usually not as detailed as the initial ones and may be by less-expert reviewers. It’s common to receive 5 or 6 reviews of a paper submitted to a systems conference. As an author it’s nice to get this feedback. And it might reduce the randomness of decisions. But it significantly increases the reviewing load on PC’s. In database PC’s, we rarely, if ever, do this. In my opinion, it’s usually better to hold reviewers’ feet to the fire and make them decide. Still, we’d benefit from getting 4 or 5 reviews more often than never, but not as often as systems conferences.

3. Systems conferences publish very few industry papers and rarely have an industry track. In my experience, an industry paper is judged just like a research paper, but with a somewhat lower quality bar. This is unfortunate. Researchers and practitioners benefit from reading about the functionality and internals of state-of-the-art products, even if they are only modestly innovative, especially if they are widely used. The systems community would serve their audience better by publishing more such papers.

4. The reviewing load for systems PC’s is nearly double that of DB PC’s, so systems PC’s are proportionally smaller. E.g., SOSP 2013 had 160 submissions and 28 PC members. If each paper had three reviews, that’s 17 reviews per PC member. With five reviews/paper, that’s 29 reviews per PC member. If DB conferences cranked up the reviewing load, then we’d probably participate in fewer PC’s, so the workload on each of us might be unchanged. A PC member would see more of the submissions in each area of expertise, which might help reduce the randomness of decisions … maybe. Overall, I don’t see a compelling argument to change, but it’s debatable.

5. Like some DB conferences, many systems conferences have adopted double-blind reviewing. I am not a fan. For papers describing a system project, I find it hard to review and, in some cases, impossible to write a paper for a double-blind conference. I have chosen not to submit some papers to double-blind conferences. I know I am not alone in this.

In summary, the systems and DB areas have become much closer in recent years. Each community has something to learn from the other area’s technical contributions and point-of-view and from its approach to conferences and publications. We really would benefit from talking to each other a lot more than we do.

Acknowledgments: My thanks to Gustavo Alonso, Surajit Chaudhuri, Sudipto Das, Sameh Elnikety, Mike Franklin, and Sergey Melnik for contributing some of the ideas in this blog post.

Blogger’s Profile:
Philip A. Bernstein is a Distinguished Scientist at Microsoft Research. Over the past 35 years, he has been a product architect at Microsoft and Digital Equipment Corp., a professor at Harvard University and Wang Institute of Graduate Studies, and a VP Software at Sequoia Systems. During that time, he has published papers and two books on the theory and implementation of database systems, especially on transaction processing and data integration, which are still the major focus of his research. He is an ACM Fellow, a winner of the ACM SIGMOD Innovations Award, a member of the Washington State Academy of Sciences and a member of the National Academy of Engineering. He received a B.S. degree from Cornell and M.Sc. and Ph.D. from University of Toronto.

Copyright @ 2014, Philip A. Bernstein, All rights reserved.

Facebooktwitterlinkedin
686 views

4 Comments

  • Cris Pedregal Martin on February 25, 2014

    Thanks for the most valuable observations and recommendations about research and generally advanced work on infrastructure-grade software in many many years – not surprising coming from Phil Bernstein, and nonetheless refreshing and welcome (and reminding us of the best of Jim Gray’s). Spread the word, please!

  • Jens Teubner on February 25, 2014

    Phil,

    thanks for this excellent blog post. There’s definitely a lot that the systems and database communities can learn from each other.

    You are concerned about the potentially high load for reviewers if papers get 5-6 reviews each. As far as I know, systems conferences review their papers in multiple rounds. In an initial round, only one or two reviews are done per paper. Based on these early reviews, about half of the papers are rejected early in the process; only the remaining papers get these many reviews that you talk about.

    Such a procedure makes sense to me. Papers without a chance get their notification very early; others, in particular borderline papers, get a lot of feedback; and the entire reviewing effort becomes more focussed on a subset of all papers (and in my opinion that’s that subset where the effort pays off best).

    Overall, I agree that reviewing load is typically higher for a PC member at a systems conference. But it might also be less frustrating, because it’s more focussed on the more interesting papers.

    Jens

  • S. Alspaugh on March 12, 2014

    Some researchers have good reasons to prefer double-blind submissions (see: http://www.pnas.org/content/early/2012/09/14/1211286109 among others) and would appreciate your support in this matter.

    Sincerely,
    S.

  • Margo Seltzer on March 12, 2014

    Phil,

    Welcome to my world! Having lived in both communities for many years, I wanted to comment a couple of points

    2. During the reviewing process of a systems conference, if a paper seems borderline, then the PC chairs solicit more reviews. These reviews do offer a different perspective on the interest-value of the work. But they are usually not as detailed as the initial ones and may be by less-expert reviewers.

    This is not my experience at all — neither as a chair, reviewer, nor author. As mentioned in an earlier comment, reviews are often done in waves with better papers getting more reviews. When the PC needs expertise not present on the PC, we often seek input from the most knowledgable person in the field.

    3. Systems conferences publish very few industry papers and rarely have an industry track. In my experience, an industry paper is judged just like a research paper, but with a somewhat lower quality bar. This is unfortunate. Researchers and practitioners benefit from reading about the functionality and internals of state-of-the-art products, even if they are only modestly innovative, especially if they are widely used. The systems community would serve their audience better by publishing more such papers.

    There is an interesting history here. First, the File and Storage Technologies Conference (FAST) (which I bet you would also find interesting) often has more industry papers than other systems’ conferences. There are almost always two chairs, one from academia and one from industry. That said, I would agree that many of the papers from industry come from the more researchy side of the house, not the product side of the house.

    “In the old days” the USENIX conference attracted a combination of researchers, practitioners, and researcher/practitioners (because they were often one and the same). As those conferences grew in “academic prestige” they started accepting fewer of the papers you describe. And I agree with you that there is a loss from that. In my capacity as USENIX board president, we’ve talked about this issue a great deal and have some ideas, but it’s remarkably difficult to get papers out of industry with the kind of detail that the systems community wants and equally difficult to convince academics to accept papers with lesser detail, because it “weakens the brand.” We do have some ideas int his space; I hope you’ll see some interesting things happening at our conferences soon!

Comments are closed

Categories