Senjuti Basu Roy

Returning Top-K : Preference Aggregation or Sortition, or is there a Better Middle Ground?


Given a large number of users’ preferences (numerical or ordinal scores, ranked order) over a large number of objects,  returning top-k results entails selecting a small list/set containing exactly k objects that are most “appropriate “. In this article,  I will investigate two alternatives for selecting a top-k list/set that consumes such preference based inputs. First, the so-called preference aggregation models (selection by election) that combine individual preferences and aggregate them to select the final top-k.  I will revisit a rich body of existing quantitative models adopted from the social choice theory that are designed to represent democratic ideals in the selected top-k. Then, I will present the Sortition process,  or election by lot, a method practiced by Ancient Athens for choosing public officials in some ancient Greek city-states, which selects top-k objects as a random sample from a larger pool of candidates. Obviously, compared to the existing preference aggregation models, the qualitative properties of the Sortition process is Iess investigated mathematically – albeit there exists anecdotal evidence in favor of it. I intend to compare these two alternatives from several dimensions as well as investigate their computational implications. I will conclude the article by laying out a middle ground which perhaps is the best of both worlds.

The problem

Imagine the world of inputs consisting of a large number of objects having complete or partial preference information of different kinds from even a larger pool of users.  A simple user preference could be elicited as follows:

John: Choc > Vanilla > Strawberry…….

  Amy: Strawberry > Vanilla > Choc……..


It is not a stretch to extend this example over several thousands or even a million of objects (products, songs, tweets), where the preference is  given by millions of users. This  brings in the challenge of scale to the problem. Other representations of preference could be elicited via score (rating on the products), or in an ordinal scale (excellent, good, average, etc). We intend to study all the alternatives in the context of aggregation types. 

Use cases

This operation is a building block in many database and information retrieval applications,  and is widely used in  web search for aggregating results from different search engines, recommender systems, electoral processes, or in  general allocating scarce resources among many candidates, such as, in hiring or admission. 


The end goal is to retrieve a small number of k objects from the given set of objects that are representative of the users’ preferences. The output could be  a list or permutation having a strict order, or a set, where any two objects in the returned result set are equivalent.


Preference aggregation is a form of selection by election, and is the most widely used mechanism practiced in the computing community to aggregate individual preferences to decide the final top-k. Depending on the underlying preference models (score based vs. ordinal vs.  rank based), there exists a large body of existing quantitative models  primarily adopted from social choice theory that claim to satisfy population preferences in the selected top-k. At the other extreme, there exists Sortition, which selects k random samples from a larger pool of candidates as results. The data centric computing community, such as the database and information retrieval researchers are mostly aware of some of the most popular models and their computational implications (such as, Kemeny optimization is NP-hard for 4 or more lists) [12] or perhaps on their social welfare propositions (such as, average relevance is utilitarian, whereas, least misery is egalitarian). 

I intend to write this blog with three primary goals. (A) Investigate a wide range of preference aggregation models both computationally and in the realm of democratic ideals and highlight the point that there does not exist a model that is even close to be called ideal.  Moreover,  most of these models are computationally demanding. (B) Highlight the strengths  as well as weaknesses of Sortition, which is incredibly computationally lightweight, by demonstrating some important use cases. (C) Lay out a plan that takes the best of both worlds.

Preference Aggregation

Preference aggregation is the process of selection by election and is grounded on the principle of voting theory, which intends to represent democratic ideals (some of the key properties are described in Table 1) in the selected top-k results. From now onwards, users and voters would be used interchangeably.


The paradox of voting was discovered over 200 years ago by M. Condorcet, a French mathematician, philosopher, economist, and social scientist. However, the importance of the voting paradox was not fully realized until several years after Kenneth Arrow published Social Choice and Individual Values in 1951, which contained his celebrated General Possibility Theorem. The essence of this theorem is that there is no method of aggregating individual preferences over 3 or more objects that satisfies several conditions of democratic ideals and produce logical results.

Table 1: Key Properties of Preference Aggregation

These properties have different implications:

(a) Diversity/Fairness – The first 10 properties promote the notion of diverse/fair of different kinds that has received attention from both data management and information retrieval community. As an example, the proportionality criterion preserves minority representation, the so-called demographic parity that is well understood in  algorithmic fairness. Properties, such as, Non dictatorship, pareto efficiency, independence of irrelevant alternatives, unrestricted domain, condorcet property, anti plurality, solid coalitions attempt to represent preference of the voters in the selected/elected top-k.

(b) Robustness– Although there does not exist a single universally accepted definition, robustness intuitively denotes the degree to which an underlying process is vulnerable to small changes, formalized using Quasi-chaotic properties by Dummet considering monotonicity, Independence of Irrelevant Alternatives, and Transitivity.

(c) Computational Implications – There are types of computational problems that are relevant, namely: (i) Computing required to produce top-k given multiple voters’ preferences. (ii) Computing required to satisfy additional properties, to satisfy diversity/fairness/both. A well studied problem in this context is computing  the margin of victory (MOV) that demonstrates manipulability of the underlying method, which is to figure out the minimal changes required in the user preference to change the original winner/ top-k. As it turns out, this latter problem ends up being computationally intractable even for simple aggregation models.


According to social choice theory, there exists 4 primary kinds  of preference aggregation models. These models also encapsulate different forms of preference representation of the users, such as, score-based, ordinal, or rank-based. 

1.Majoritarian– Majority rule is the simplest of non-dictatorial preference aggregation. When two alternatives are under consideration, the one preferred by the most voters is selected.

2. Positional Methods– Unlike majoritarian methods, which use only information from binary comparisons between candidates, positional methods take into account information about individuals’ preference orderings. However, these methods do not necessarily take into account a voter’s entire preference ordering. Plurality voting – which uses only information about each voter’s most preferred alternative – approval voting – which uses information about a varying number of top alternatives for each voter- and the Borda count, which uses information about each voter’s entire preference ordering, are part of this type. 

3. Multi-Stage Methods –  Multi-stage procedures use either different choice functions at different stages of the procedure, or one choice function iteratively on diminishing sets of alternatives. Runoff Election Procedures, Single Transferable Votes (STV) are part of this type.

4. Utilitarian Methods – Utilitarian preference aggregation requires voters to assign utility values to each alternative. Utility values can be thought of as measures of expected happiness with each alternative on some finite scale.

I refer to Table-1 that enlists several (not complete though) popular aggregation methods and describes their properties and types1 . These methods are quantitative and a mathematical model exists for each.


Computational Implications

As mentioned above,  there are two types of computational implications that may be interesting to the data-centric computing community; (i) Compute top-k : produce top-k objects  (ranked or set) for a given aggregation method, by consuming a large number of user preferences. (ii) Compute margin: change the original outcome minimally to satisfy additional criteria, (e.g., proportional representation, MOV) for a given aggregation method. The latter one is mostly studied as a constraint optimization problem, where these additional criteria are imposed as constraints.

Table 2: Preference Aggregation Models

1. Compute top-k

Producing top-k based on some of these aforementioned models is an interesting technical problem in its own merit. Several of these models are computable in polynomial time, however, several of these are not, as listed down in Table-2.

Table 3: Computationally Intractable Models for Producing top-k

2. Compute Margin

The goal here is to change the original user preference minimally, i.e., compute margin, given an underlying aggregation method, such that – (a) the produced top-k objects satisfy additional properties, such as, proportional representation; (b) computing MOV which ensures that the produced top-k is different from the original top-k. These latter problems are designed to study robustness of the underlying aggregation method, i.e. how hard it is to manipulate the original top-k. 

Table 4: Computationally Intractable Models for Computing MOV

As it turns out, producing MOV is much harder computationally. As an example, even though producing top-k based using Borda is polynomial time computable, producing MOV for Borda is NP-hard. In fact, changing the original top-k turns out to be NP-hard even for very simple aggregation models, such as, average score, and least misery [4]. 

In voting theory, the concept of margin of victory (MOV) is designed to measure electoral competitiveness of the candidates, that we formalize in recent works [6, 7] considering kemeny optimization and plurality voting as the underlying aggregation methods. We study margin as the smallest number of single ballot substitutions under plurality voting to promote a given set of k candidates as the top-k and present a set of computational results. It is needless to say that for the models that are computationally intractable (such as, CC, STV, STV-Borda), computing MOV involving those models are also NP-hard.

The key takeaway is the following: even though there exists a large number of preference aggregation methods,  Arrow’s theorem points out their ineffectiveness in more than one account . Second, these models are computationally intensive to calculate. FInally, even when certain properties are claimed to be theoretically present in a model, the underlying data itself may be unrepresentative – because those who have the time, money, and influence are over represented in the data, and the selected top-k tends to represent those preferences.


The other alternative is an incredibly computationally lightweight mechanism, namely Sortition, which is the process of selecting the top-k via lottery. As unrealistic as it may sound, the ancient Athenians chose only ten percent of their officials by election, selecting the rest by sortition—a lottery, which randomly selected citizens to serve as legislators, jurors, magistrates and administrators. The Kleroterion (Figure-1) was a device used to choose the names of those who would serve. It was a slab of stone incised with rows of slots and with an attached tube. Citizens’ tokens were  placed randomly in the slots so that every member of each of the tribes of Athens had their tokens placed in the same column. There was a pipe attached to the stone which could then be fed dice that were colored differently (assumed to be black and white) and could be released individually. When a die was released, a complete row of tokens (so, one citizen from each of the tribes of Athens) was either selected if the die was colored one color, or discarded if the color was different. This process continued until the requisite number of citizens was selected. Clearly, the aforementioned process simulates uniform random sampling to select top-k.

 Sortition  has become largely antiquated in the modern world, except for selecting jurors in the US randomly from their tax rolls. Computing communities have resorted to Sorition processes to promote individual fairness [13] to ensure objects of similar “worth” (utility) have a similar probability of being returned to the end user.

Figure 1: Device Kleroterion, used by Ancient Athenians for Sortition

The key question however is,  sortition, which is  clearly computationally lightweight, is it representative of the underlying users’ preferences?

As an immediate case study,  I would like to resort to the following statistics, published in a recent work [18]: “ If the US Senate were to change overnight to a sortition house, then the number of males would go from 79% down to 49%, while the number of females would go up from 21% to 51%; the number of whites would go down from 90% to 77%, and the number of Blacks and Hispanics. would go up from 3% and 4% to 13% and 18%, respectively. Sortition members would be significantly younger (the average Senator is 62 years old), and less educated (76% of senators have more than a bachelor’s degree)”…….

These statistics also allude to the weaknesses of preference aggregation methods.   I tend to agree with this school of thought and argue that preference aggregation  models,  aka selection by election, actually lead to unrepresentativeness, because those who have the time, money, and influence are over-represented in the data, and the selected top-k tends to represent those preferences. The computing community too seems to care about representativeness to ensure the produced results are diverse, or fairly (proportionally) represent the underlying population.

Sortition might be much more descriptively representative, as a random sample or a stratified sample from the population, as long as the sample is large enough. Using  sortition, the statistical probability is likely to guarantee the selection of top-k who truly represent the underlying population.

Sortition, however, has its obvious weaknesses, too. (a) When k is small (i.e., the selected number of objects is not large enough), the top-k selected by lottery may not be representative. This may call for applying advanced sampling techniques and studying their qualities analytically. (b) Sortition, in its original form, is not suitable when users’ preferences are provided as ranks or score. Therefore, interesting research challenges lie ahead in studying these variants in a systematic manner. (c) finally, sortition as it is, is not suitable when the produced top-k needs to be a list/permutation.

Sortition, followed by Preference Aggregation (The way forward)

Perhaps a middle ground exists that takes the best of both worlds.

I conclude this article by laying out a middle ground by motivating the opportunities and challenges that exist in combining sortition and preference aggregation and leaving some food for thought.

A primary weakness of the preference aggregation methods is that they are oftentimes computationally intensive. On the other hand, Sortition is highly lightweight, but does not extend easily for ranked inputs or when the output needs to be ordered. A research opportunity exists in studying the middle ground, which is less computationally intensive than running the aggregation models on the entire inputs, i.e.,  perform sortition first and collect a reasonably large number of samples from there, and then perform aggregation over the collected samples, as opposed to on the entire inputs. A cautionary note here though – Sample Carefully. It is evident when k is small, a truly uniform random sample is not representative of the population. Thus an alternative is to select the top-k carefully, stratify when possible, and cluster/multistage the sampling process, as applicable for the underlying input data. This gives rise to a set of interesting research problems.

(1) A key question is to define the notion of sample carefully and meticulously. A sample can represent a subset of the users’ entire preference, partial preference of all users, or a combination of both.  The challenge however is to ensure that for a given aggregation model, if a property holds true on the entire inputs, it must also remain satisfied on the collected samples. As a specific example, given a large number of ranked preferences, how to obtain a sample such that the Borda score on that satisfies consistency, pareto-optimality, and monotonicity?

(2) Equally important is the aspect of understanding and analyzing the sample size which will allow one to theoretically approximate the error introduced by the sampling process wrt the entire inputs. As an example,  how to sample from the given inputs, such that the selected top-k are not very different from the original top-k, or how to reason the value of MOV obtained on the sample with the value of MOV obtained on the entire input.  

(3) More generally, given a sampling approach and size, an interesting question is to be able to produce theoretical error bounds of the underlying computational problems.

(4) An interesting research question for the social welfare theorists is to study what additional properties does this combination give rise to that either of them does not satisfy individually.

Blogger’s Bio

Senjuti Basu Roy is the Panasonic Chair in Sustainability and an Associate Professor in the Department of Computer Science at the New Jersey Institute of Technology. Her research focus lies at the intersection of data management and AI, especially enabling human-in-the-loop AI  in scale. Senjuti has published more than 75 research papers in high impact data management and data mining conferences and journals. She is the tutorial co-chair of VLDB 2023, The Web Conference 2022, has served as the Mentorship co-chair of SIGMOD 2018, PhD workshop co-chair of VLDB 2018, and has been involved in organizing several international workshops and meetings. She is a recipient of the NSF CAREER Award, a PECASE nominee, and one of the 100 invited early career engineers to attend the National Academy of Engineering’s 2021 US Frontiers of Engineering Symposium.


1. Kenneth Arrow. Social Choice and Individual Values. Wiley, New York, 1951. 

2. Amer-Yahia, S., Roy, S. B., Chawlat, A., Das, G., & Yu, C. (2009). Group recommendation: Semantics and efficiency. Proceedings of the VLDB Endowment, 2(1), 754-765.

3. Esfandiari, M., Basu Roy, S., & Amer-Yahia, S. (2018, October). Explicit Preference Elicitation for Task Completion Time. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management (pp. 1233-1242).

4. Roy, S. B., Thirumuruganathan, S., Amer-Yahia, S., Das, G., & Yu, C. (2014, March). Exploiting group recommendation functions for flexible preferences. In 2014 IEEE 30th international conference on data engineering (pp. 412-423). IEEE.

5. Basu Roy, S., Lakshmanan, L. V., & Liu, R. (2015, May). From group recommendations to group formation. In Proceedings of the 2015 ACM SIGMOD international conference on management of data (pp. 1603-1616)

6. Wei, D., Islam, M. M., Schieber, B., & Basu Roy, S. (2022, June). Rank Aggregation with Proportionate Fairness. In Proceedings of the 2022 International Conference on Management of Data (pp. 262-275).

7. Islam, M. M., Wei, D., Schieber, B., & Basu Roy, S. (2022, June). Satisfying Complex Top-k Fairness Constraints by Preference Substitutions. To Appear in VLDB 2023.

8. Airiau, S., Aziz, H., Caragiannis, I., Kruger, J., Lang, J., & Peters, D. (2019, August). Portioning using ordinal preferences: Fairness and efficiency. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI 2019.

9. Aziz, H., Faliszewski, P., Grofman, B., Slinko, A., & Talmon, N. (2018, January). Egalitarian Committee Scoring Rules. In IJCAI (pp. 56-62).

10. Munagala, K., Shen, Z., & Wang, K. (2021, July). Optimal algorithms for multiwinner elections and the chamberlin-courant rule. In Proceedings of the 22nd ACM Conference on Economics and Computation (pp. 697-717).

11. Blom, M., Conway, A., Stuckey, P. J., & Teague, V. J. (2020, April). Did that lost ballot box cost me a seat? Computing manipulations of STV elections. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 34, No. 08, pp. 13235-13240).

12. Fagin, R., Kumar, R., & Sivakumar, D. (2003). Comparing top k lists. SIAM Journal on discrete mathematics, 17(1), 134-160.

13. Flanigan, B., Gölz, P., Gupta, A., Hennig, B., & Procaccia, A. D. (2021). Fair algorithms for selecting citizens’ assemblies. Nature, 596(7873), 548-552.

14. Diaconis, P. (1988). Group representations in probability and statistics. Lecture notes-monograph series, 11, i-192.

15. Cross, E. (2005). NP-Hard Manipulations of Voting Schemes.

16. Malleson, T. (2018). Should democracy work through elections or sortition?. Politics & Society, 46(3), 401-417.

17. Nikookar, S., Esfandiari, M., Borromeo, R. M., Sakharkar, P., Amer-Yahia, S., & Basu Roy, S. (2022). Diversifying recommendations on sequences of sets. The VLDB Journal, 1-22.