Felix Naumann

Reflections of a Data Skeptic

Big Data
In the 21st century, our everyday lives are shaped by what machines tell us to do. Big data and its applications to the economy, to policy, and to our social lives act as behavioral equalizers, and they do so under the banner of optimization, in ways both subtle and overt. In fact, they push individuals into uniform rhythms of living. We are caught in a slipstream of data.

This post is not intended as a lament about how governments and commercial organizations invade our privacy by collecting personal data, nor about how modern analytical methods are used to classify us, the users. In fact, it is no lament at all, but rather a long-term, non-scientific outlook on where a data-driven society might be headed. I believe we are moving from the quantified self to the optimized self and, finally, to the optimized populace.

Our daily lives are affected in many ways. GPS navigation directs traffic towards larger streets; fitness apps tell us when and how to exercise; cooking regimens tell us what to cook; learning programs tell our children how to study; healthcare providers tell us when to visit which doctor; security checkpoints tell personnel whom to frisk; social networks tell us whom to date and what to like; online stores tell us what to read; streaming services, what to watch; search engines, how to search; and so on. Of course, all these examples confront us in the form of suggestions and recommendations – only seldom are we actually forced to comply. But often enough we are unable to judge the outcome and simply follow the suggestions. Such nudges usually happen in our best interest; it is hard to argue against them (they are for our own good, after all, or for that of the environment). It is difficult to ignore an incessant “Please turn around when possible” in a car; it is even more difficult to ignore suggestions for improving personal fitness; it is close to impossible to ignore recommendations for health check-ups. Thaler and Sunstein have shown the persuasive power of such nudges [7]. In a recent article, Patricia Marx cites a tester of an electronic sleeping aid: “Honey, I did it! I got a sleep score of ninety-eight!” [3] What is striking in this big-data environment is the breakdown of conversation and argument. You cannot talk back; you can only turn off the machine. It may even listen, but you cannot persuade it to change its mind. And even if it did, it would still be following its internal mission: optimization. But in the name of what, and of whom?

At a larger scale, too, big data is used to plan cities and infrastructure, to design products and fashion, to organize agriculture, and so on. “If you could see everybody in the world all the time, where they were, what they were doing, who they spent time with, then you could create an entirely different world. You could engineer transportation, energy, and health systems that would be dramatically better,” says Alex Pentland [5]. Yet if everybody knew everything and had access to the same analytical methods or results, everybody would come to the same conclusions and act upon them in the same way. There is no doubt that there are very many positive uses of big data, and that society and the world can become a better place. But is this place still as attractive, as fun, and as individual as it could be? Or: what does “better” mean?
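The uniformity argument can be made concrete with a minimal sketch (my own, not from any cited work; the data, the threshold, and the toy “method” below are all invented): two actors who share the same data and the same deterministic analysis cannot help but reach the same decision.

```python
# A purely illustrative sketch: identical inputs plus an identical,
# deterministic analytical method yield identical conclusions.
# All names and numbers below are invented for this example.
shared_data = [3.2, 4.1, 2.7, 5.0, 4.4]  # observations both actors see

def best_action(data):
    # Stand-in for "the analytical method": a threshold on the mean.
    forecast = sum(data) / len(data)
    return "expand" if forecast > 4.0 else "hold"

# Two independent actors, same data, same method:
alice, bob = best_action(shared_data), best_action(shared_data)
assert alice == bob  # they necessarily agree
print("Both decide to:", alice)
```

In this toy view, diversity of outcomes can only come from different data, different methods, or different objective functions – a point the comments below take up.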

It is high time (and great fun) to dust off some of the old classics and re-read where our world might be heading. As many have observed, the technology of Big Brother in George Orwell’s dystopian novel “Nineteen Eighty-Four” is already feasible, prevalent, and here to stay [4]. Cameras watch and identify us from high poles, from ceilings, in public transportation, from our laptops and smartphones. Algorithms classify us as potential buyers, lovers, or offenders. In some countries, Orwell’s vision is no longer limited to the technology: Big Brother himself has become manifest.

Aldous Huxley, on the other hand, describes a world that is much cleaner, of much higher technical prowess, and much more optimized [2]. Technology is used to streamline almost every aspect of its inhabitants’ lives. People of all castes are highly satisfied: they enjoy perfect health, all peers are treated equally (within each caste), and happiness prevails. Only an outside observer – and, of course, the protagonist Bernard Marx – recognizes that, in effect, they live in a totalitarian state. The world is highly efficient, but personal freedom is minimized. Thirty years later, in “Brave New World Revisited”, Huxley concluded that our world was indeed moving towards his vision of a brave new world, and much faster than he had imagined.

Over a century ago, E.M. Forster described a world similar to Huxley’s [1]. People uniformly exist (and do little more) under the dream-like control of “the Machine”. Again, the machine has optimized the lives of humans, reducing them to mere sources of thoughts and “ideas”. Yet Forster goes one step beyond this vision, describing the initially slow but rapidly accelerating decay of this pervasive machine until “The Machine Stops” – the title of his story. Pandemonium ensues, and the author leaves the reader with only a flickering hope for the survival of humankind.

Obviously, many aspects of these books are purely fictional, and others are still many decades or centuries away. But I do believe that many or even most projections of a machine- (and data-) driven society, and the resulting uniformity of individuals, are both realistic and likely. They need not happen through the explicit intentions of a state or its leaders, but out of a general compulsion for optimization, security, and self-preservation. Individually, each data-driven optimization is reasonable, makes life better, and is unarguably good. But applying very many such automated improvements to very many individuals increases monotony, deepens our reliance on machines to guide us, and creates a slipstream that pulls people into uniform ways of easy but boring living. Why boring? Mostly because big data applications are used to prevent surprises. And most people do not want surprises – certainly not the leaders of totalitarian societies, nor, for that matter, commercial entities, which aim to sell products to consumers whose commercial appetites have been pre-shaped by advertisement and marketing prognostications.
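The slipstream itself can be caricatured in a few lines of code. Below is a minimal simulation (entirely my own construction; the number of users, the preference model, and the adoption rate are invented parameters): a recommender repeatedly suggests the population-optimal choice, every user drifts slightly towards it, and the spread of individual preferences collapses.

```python
# A toy model of data-driven homogenization: each round, a recommender
# suggests the single choice that is optimal "on average", and every
# user complies a little. Individually each nudge is harmless; jointly
# they erase individual variation. All parameters are invented.
import numpy as np

rng = np.random.default_rng(42)
n_users, n_dims, n_rounds = 1000, 5, 51
adoption = 0.1  # how strongly a user follows each recommendation

prefs = rng.normal(size=(n_users, n_dims))  # individual tastes

for t in range(n_rounds):
    # The recommender optimizes one suggestion for the whole population:
    # here, simply the mean preference (the "best on average" choice).
    recommendation = prefs.mean(axis=0)
    # Each user drifts a little towards the recommendation.
    prefs += adoption * (recommendation - prefs)
    if t % 10 == 0:
        spread = prefs.std(axis=0).mean()
        print(f"round {t:2d}: preference spread = {spread:.3f}")
```

Nothing in the loop is malicious, and every single nudge is, locally, an improvement; yet after a few dozen rounds the population is essentially uniform.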

This post is certainly not the first text to suggest taking a step back from the big data hype and trying to gain a broader view. But usually, this new perspective has been somewhat self-centered: Is big data truly advancing database and systems research? Are big data and its technology truly adding value to my company? Data for Humanity, for one, attempts an interesting and important viewpoint – how can (big) data help society, or at least not harm it [8]? The authors encourage people and institutions to perform their work responsibly and to proactively search for positive uses of data. Stoyanovich et al. also argue for a responsible use of data, recognizing that big data “technology can propel economic inequality, destabilize global markets and affirm systemic bias” [6]. Their suggestions of responsible data analysis, fairness, non-discrimination, and transparency are without doubt imperative, but they hardly address the possible uniformity of data-driven lives.

In summary, big data optimization already touches very many aspects of our lives and promises, or threatens, to affect more. We could be heading towards an optimized but uniform society with no place for individual choices, bad habits, and the ensuing serendipity. And big data may help us understand the consequences of social injustice, but it cannot tell us what justice really is.

Pointers

[1] E.M. Forster: The Machine Stops. The Oxford and Cambridge Review, November 1909. http://archive.ncsa.illinois.edu/prajlich/forster.html

[2] Aldous Huxley: Brave New World, 1932

[3] Patricia Marx: In Search of Forty Winks. The New Yorker, February 8 & 15, 2016. http://www.newyorker.com/magazine/2016/02/08/in-search-of-forty-winks

[4] George Orwell: Nineteen Eighty-Four, Secker and Warburg, London, 1949

[5] Alex Pentland: Reinventing Society in the Wake of Big Data. Edge.org, 2012. https://www.edge.org/conversation/alex_sandy_pentland-reinventing-society-in-the-wake-of-big-data

[6] Julia Stoyanovich, Serge Abiteboul, Gerome Miklau: Data Responsibly: Fairness, Neutrality and Transparency in Data Analysis. EDBT 2016: 718-719

[7] Richard H. Thaler and Cass R. Sunstein: Nudge: Improving Decisions about Health, Wealth and Happiness. Yale University Press, 2008

[8] Roberto Zicari and Andrej Zwitter: Data for Humanity: An Open Letter. http://www.bigdata.uni-frankfurt.de/dataforhumanity/

Blogger Profile

Felix Naumann studied mathematics, economics, and computer science at the Technical University of Berlin. After receiving his diploma (MA) in 1997, he joined the graduate school “Distributed Information Systems” at Humboldt University of Berlin. He completed his PhD thesis on “Quality-driven Query Answering” in 2000. In 2001 and 2002 he worked at the IBM Almaden Research Center on topics around data integration. From 2003 to 2006 he was an assistant professor for information integration at Humboldt University of Berlin. Since then, he has held the chair for information systems at the Hasso Plattner Institute at the University of Potsdam in Germany. His research interests include data quality, data cleansing, data profiling, data integration, and text mining.

Copyright © 2016, Felix Naumann. All rights reserved.


Comments

  • Thomas Marx on April 11, 2016

    Is it really the fault of data that we all use similar apps and have similar behavioral patterns? I am shocked!

  • Antonio Badia on April 11, 2016

    “Yet if everybody knew everything and had access to the same analytical methods or results, everybody would come to the same conclusions and act upon them in the same way.” I think history shows this not to be the case. It assumes the ‘rational actor’ of economics which, as has been shown over and over, does not correspond to reality. As the author knows, optimization relies on an objective function that guides it. We all have our own objective functions, and they do not have to coincide. I think we should be talking more about what goes into the objective function – and who designs it, and why. There will always be some people who will try to take advantage. The other aspect that we should be discussing is the data that we feed to the algorithm. Others have more eloquently argued that there is no such thing as ‘raw data’; see recent discussions on ‘algorithmic fairness’.

  • Antonio Badia on April 11, 2016

    I think there are deeper reasons to be skeptical of (big) data:
    - “Yet if everybody knew everything and had access to the same analytical methods or results, everybody would come to the same conclusions and act upon them in the same way.” History shows this not to be the case. This seems to rely on the ‘rational agent’ assumption in economics, which has been thoroughly debunked (see behavioral economics). Optimization relies on an objective function; we all have our own objective function, which may not be shared.
    - The data that we feed our algorithms is rarely ‘neutral’ (if such a thing exists), and certainly never a random sample (see “algorithmic fairness”). Optimizing over it is simply going to reflect (and augment) any existing biases.
    Those are the issues that should make us skeptical of any techno-utopias.
