May 24, 2021
Our professional societies (e.g., ACM, IEEE, VLDB Endowment) have worked hard to ensure that peer review processes in conferences and journals are fair and continue to fulfill their intended purposes. Our community has introduced refinements such as double blind reviews, automated matching of reviewers to topics and to individual papers, multiple submission deadlines, opportunities for revision and rebuttal, “review quality week” to manually ensure high quality of reviews, and a variety of guidelines for reviewer selection, manuscript handling, and program committee assignments. These refinements have greatly helped to improve the quality of refereed publications.
Conflicts of Interest (COI) during the review process. Despite all the valuable refinements to review processes, the operational definition of conflicts of interest (COI) and the means of declaring and handling them have changed little over time. A conflict of interest is a set of circumstances that creates a risk that professional judgement or actions regarding a primary interest will be unduly influenced by a secondary interest [1]. The primary interest refers to the principal goals of the profession or activity, such as the unbiased review of a paper. Secondary interests include personal benefits such as financial gain, professional advancement, and favours for family, friends, or collaborators. Secondary interests become objectionable in particular when they are believed to carry greater weight than the primary interest.
Why should we even care about COI in the review process? COI is prevalent in many professions, and our professional societies have clear instructions and policies on avoiding the influence of secondary interests on the primary one. For example, ACM clearly states [2]: “Conflicts of Interest in publication are to be avoided because they raise questions about the quality, impartiality, and accuracy of published items, even when the parties involved believe they have been fair and impartial. Sometimes, a conflict will lead to an unconscious bias that can influence decisions, even if the parties involved believed they are objective.” This is, in fact, consistent with extensive studies in psychology [3]. “Knowingly hiding or falsifying a stated COI” is a violation of the ACM Code of Ethics.
Clearly, avoidance of COI during the review process has a downstream impact on the quality and accuracy of the science that comes out of any venue, and a positive impact on review quality. Consequently, our major conference venues have defined various policies to implement the notion of COI during the review process. Typically, a COI exists if two people wrote a paper together in the past x years, are at the same institution currently or in the recent past or near future, are close relatives or friends, are advisor and (former) advisee, or worked together closely on a project in the past y years, among other conditions.
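To make this concrete, a venue policy of this kind can be encoded as a small set of predicates. The sketch below is purely illustrative: the rule set, field names, and threshold x are my own placeholders, not any venue's actual policy.

```python
from dataclasses import dataclass

@dataclass
class Person:
    name: str
    institution: str
    past_institutions: frozenset = frozenset()

def has_coi(author, reviewer, years_since_last_joint_paper=None, x=2):
    """Fire if any rule of a typical (hypothetical) venue policy matches."""
    # Rule 1: co-authored a paper within the last x years
    if years_since_last_joint_paper is not None and years_since_last_joint_paper <= x:
        return True
    # Rule 2: same institution, currently or in the recent past
    if author.institution == reviewer.institution:
        return True
    if author.institution in reviewer.past_institutions:
        return True
    return False
```

Real policies add further rules (advisor/advisee, close relatives, joint projects in the past y years), each of which would need its own evidence source.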
Manual checks for COI. Our data management venues require authors to self-report COI with all potential members of the program committee (PC). Although such self-reporting is less daunting in our DB venues than in AI venues, since the PC list to scan typically has fewer than 200 members, PC chairs cannot realistically double-check these manual COI declarations, including whether all conflicts were reported, given the volume of submissions. Indeed, it was recently reported [4] that an automated post-facto check on ICDE 2021 review data revealed a non-negligible number of unreported COI that bypassed the conference's manual declaration and checking system. Hence, it is paramount to automate the detection and management of unreported COI in our venues. Winslett and Snodgrass have recently written an excellent article [6] on the need for automating COI management that I encourage all to peruse.
“Easy things are hard”: Automated COI detection and management. Automating certain types of COI detection (e.g., co-authorship, co-worker) looks deceptively simple from 30,000 feet. At first glance, one can simply check for an author's name in the co-author list of a reviewer in any bibliographic data source (e.g., DBLP), or check whether an author-specified domain name is identical to that of the reviewer (i.e., co-worker). While this may detect some COI, such a strategy often misses valid COI cases due to the characteristics of real data in a review management system (RMS) or a bibliographic source. In particular, an author in an RMS (e.g., Microsoft's CMT, EasyChair) may not follow the same naming convention as their bibliographic source entry. In other words, any COI detection tool needs to deal with the challenges posed by free-form specification of authors' names and affiliations. This is further exacerbated by homonymous names. To mitigate this problem, some venues demand explicit bibliographic identifiers (e.g., DBLP identifiers) from authors or reviewers. However, such data collection opens up the real possibility of the input data being invalid or dirty. Indeed, a PC chair recently observed that “a number of users entered invalid or incorrect links to DBLP and Google Scholar”. Validating such input in an RMS can be hard due to the namesake problem, among others. Furthermore, data in bibliographic sources and the RMS may itself be dirty (e.g., DBLP may assign a publication to the “wrong” namesake, corrupting the set of co-authors of an author).
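The name-matching pitfall can be illustrated with a minimal sketch (not CLOSET's actual logic) contrasting exact string matching against a crude normalization that strips accents and reduces first names to initials. Normalization recovers variants such as “Alice B. Smith” vs. “A. B. Smith”, but it also aggravates the homonym problem, since distinct people can normalize to the same key.

```python
import unicodedata

def normalize_name(name):
    """Crude normalization: strip accents, lowercase, reduce first names to initials."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    parts = ascii_name.replace(",", " ").split()
    if not parts:
        return ""
    last = parts[-1].lower()
    initials = "".join(p[0].lower() for p in parts[:-1])
    return f"{initials} {last}"

def naive_coi(author, reviewer_coauthors):
    # Exact string match: brittle against free-form name variants
    return author in reviewer_coauthors

def normalized_coi(author, reviewer_coauthors):
    # Matches across variants, at the cost of possible namesake collisions
    targets = {normalize_name(c) for c in reviewer_coauthors}
    return normalize_name(author) in targets
```

Here the exact match misses the conflict that the normalized match finds; a production tool would additionally need disambiguation (affiliations, identifiers) to keep namesakes apart.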
I have developed a system called CLOSET [5] that automatically detects unreported COI in a conference submission list. It is built on top of an RDBMS and leverages multiple data sources to address the aforementioned challenges. Given the set of reviewers assigned to each paper by an RMS hosting a specific venue, CLOSET identifies unreported COI (if any) for each paper and generates explanations justifying why a specific author-reviewer pair is marked as a COI. Currently, it focuses on COI due to shared institutions and co-authorship. It generates reports on “conflict” papers and reviewers and on possible COI violations under a venue's COI policy, as well as a graph view of the co-authorship relationships between authors, reviewers, and meta-reviewers for each conflict paper, making the results accessible to PC chairs for decision-making. CLOSET is orthogonal to the “secret sauce” an RMS uses to automatically assign reviewers to papers, which makes it generic and extensible to any RMS. It also has a suite of additional features, such as bid analytics to find “interesting” patterns, publication profiles of PC members, and analysis of PC diversity.
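Although CLOSET's internals are not public, the flavour of such evidence-backed explanations can be sketched with a toy co-authorship graph: each edge carries the publication years that serve as evidence, and every flagged author-reviewer pair is reported together with that evidence. All names and data below are made up.

```python
from collections import defaultdict

def build_coauthor_graph(publications):
    """publications: iterable of (year, [author names]).
    Returns {author: [(co-author, year), ...]} with per-paper evidence."""
    graph = defaultdict(list)
    for year, authors in publications:
        for a in authors:
            for b in authors:
                if a != b:
                    graph[a].append((b, year))
    return graph

def explain_conflicts(assignments, graph):
    """assignments: {paper_id: (authors, reviewers)}.
    Yields one human-readable explanation per COI candidate."""
    for paper, (authors, reviewers) in assignments.items():
        for a in authors:
            for r in reviewers:
                years = sorted(y for b, y in graph.get(a, []) if b == r)
                if years:
                    yield f"{paper}: author {a} and reviewer {r} co-authored in {years}"
```

The point of the explanation string is accountability: a PC chair can verify the flagged pair against the cited papers rather than trust an opaque boolean.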
Since its inception in 2019, CLOSET has been deployed in nine conference venues to date (including SIGMOD, VLDB/PVLDB, and ICDE) to aid PC chairs in detecting unreported COI. In every venue, it has detected multiple unreported COI that had escaped manual detection or the opaque COI detection system of the RMS.
Lastly, our traditional journal review systems lag far behind conferences. Not only do they not provide any framework for COI management, the majority of them do not even provide a framework for discussions between reviewers and associate editors (AEs). Hence, I have extended CLOSET to J-CLOSET to address COI management outside the journal review system. Currently, as an AE of TKDE, I use it during reviewer assignment. Recently, I have asked the AEs of SIGMOD Record to tap into this framework as well.
Looking ahead. As a community, I believe we are at the early stages of developing comprehensive policies and techniques to manage COI, and some interesting challenges remain to be addressed. First, our COI policies need to be revisited; our current policies capture only a subset of the notion of COI. For instance, co-authorship-based COI policies typically limit COI violations to the last x years (e.g., x = 2). However, this is an ad hoc value with no scientific basis. Intuitively, COI between two people does not simply disappear once their co-authorship is more than x years old. One may have written multiple papers with a reviewer three or more years ago; a COI policy with x = 2 fails to capture this. Furthermore, the notion of COI is rooted in human bias and hence should be orthogonal to specific venues. In other words, a COI policy should be the same for all venues, yet currently different venues have different COI policies. Second, bias may still exist between two people even without past co-authorship, co-worker, or family relationships, owing to the characteristics and dynamics of the relationship between them. I refer to this as submarine COI. How do we automatically detect it? Third, bias can go both ways, favourable and unfavourable: a person may have an antagonistic relationship with a topic or another person, which may result in biased reviews of papers written by the latter or papers on that topic. I refer to this as torpedo COI (inspired by the term “torpedo reviewing” introduced in [7]). A human may find it easier to detect submarine and torpedo COI from contextual knowledge and the content of a review; but how does software detect such COI automatically? Once again, I am reminded of Marvin Minsky's famous words: “Easy things are hard”.
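One way to see why a fixed x-year window is a blunt instrument is to compare it with an alternative (purely illustrative and scientifically unvalidated) rule in which every joint paper contributes a weight that decays with age but never vanishes. Three papers written four or five years ago trigger the second rule but not the first; the threshold and half-life below are arbitrary placeholders.

```python
def window_coi(coauthored_years, current_year=2021, x=2):
    """The common policy: COI only if some joint paper falls within the last x years."""
    return any(current_year - y <= x for y in coauthored_years)

def weighted_coi(coauthored_years, current_year=2021, threshold=1.0, half_life=5):
    """Illustrative alternative: each joint paper contributes a score that
    halves every `half_life` years, so many old papers can still trigger COI."""
    score = sum(0.5 ** ((current_year - y) / half_life) for y in coauthored_years)
    return score >= threshold
```

Whatever the exact functional form, the point stands: a policy sensitive to the volume and recency of collaboration captures more of the intuition behind COI than a hard cutoff at x years.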
The data management community has led the improvement of the peer review process in many ways, such as introducing multiple submission deadlines for conference venues and “review quality week”, among others. It is time that we lead the way in automated COI management as well.
References
1. D. F. Thompson. Understanding financial conflicts of interest. The New England Journal of Medicine, 1993.
2. https://www.acm.org/publications/policies/conflict-of-interest
3. https://www.psychologytoday.com/us/blog/hot-thought/201701/what-s-wrong-conflicts-interest
4. G. Koutrika, V. Zakhary. Diversity & Inclusion Track. ICDE 2021.
5. CLOSET. https://personal.ntu.edu.sg/assourav/research/DARE/closet.html
6. M. Winslett, R. Snodgrass. We Need to Automate the Declaration of Conflicts of Interest. Communications of the ACM (CACM), October 2020.
7. S. Jecmen, et al. Mitigating Manipulation in Peer Review via Randomized Reviewer Assignments. NeurIPS, 2020.
Sourav S Bhowmick is an Associate Professor and founder of the data management group, DANTE, at the Nanyang Technological University, Singapore. He is the inventor of CLOSET, a comprehensive software for detection and management of COI. He co-leads the COI management effort in the Diversity & Inclusion in DB committee. He loves to code, write, and paint.
Copyright @ 2021, Sourav S Bhowmick, All rights reserved.
Nowadays, every tier in scientific publication is suspect: authors are suspected of plagiarism, multiple submissions of the same work, …; reviewers are suspected of favoring the authors they know or disadvantaging those they hate, …; publishers are suspected of being just industrials making a lot of money… It is a pity to observe this trend and the machinery (double blind reviews, automatic plagiarism checking, …) required to face it. Are all these efforts worthwhile? Is scientific publication healthier than it was a few years ago?
Nice write-up with convincing arguments. Thanks for developing CLOSET, Sourav; I have no doubt it will be of great help in running these big events.