“Big data” is more than one thing, but an important aspect is its use as a rhetorical device, something that can be used to deceive or mislead or overhype. It is thus vitally important that people who deploy big data models consider not just technical issues but the ethical issues as well. – Cathy O’Neil, Columbia University
What is ‘Big Data’?
In the relatively straightforward terms of Dr Asad Khan, Senior Lecturer in Information Technology at Monash University, ‘Big Data’ “relates to very large sets of data collected through free or commercial services on the internet.” For example, the stats in the image below from The Internet in Real Time indicate some of the free or commercial online services through which we contribute to these large data sets and also how much data is generated.
That’s a lot of data, but presumably just the tip of the iceberg. According to Prakash Nanduri, “everything we know spits out data today — not just the devices we use for computing. We now get digital exhaust from our garage door openers to our coffee pots, and everything in between.” Many of our students, for example, are likely generating data just by sitting in class with a mobile device turned on: location data goes to Apple via the map application, or to Facebook if they have allowed the Facebook app to access the phone’s GPS. They generate more data if they are working in an LMS such as Blackboard, having their attendance recorded by a teacher, or responding with their phones to a Socrative activity. Any wearable technology they have on them, such as a FitBit or Apple Watch, generates more data still.
Perspectives on ‘Big Data’
‘Big Data’ clearly means different things to different people, as this startling array of perspectives collected by researchers at the UC Berkeley School of Information indicates. There’s an awful lot of overblown rhetoric and jargon in there and a lot of the definitions are not helpful at all in understanding the term. For example, this one from Jon Bruner:
Big Data is the result of collecting information at its most granular level — it’s what you get when you instrument a system and keep all of the data that your instrumentation is able to gather.
What on earth does it mean to ‘instrument a system’? I didn’t even know ‘instrument’ could be used as a verb… At the other extreme is this one from David Leonhardt of the New York Times:
Big Data is nothing more than a tool for capturing reality — just as newspaper reporting, photography and long-form journalism are. But it’s an exciting tool, because it holds the potential of capturing reality in some clearer and more accurate ways than we have been able to do in the past.
Leonhardt’s comparison is very misleading in its vagueness (‘capturing reality’?) and simplicity (‘nothing more than a tool’). It also contrasts starkly with grandiose statements like this one from Drew Conway, a self-described “leading expert in the application of computational methods to social and behavioral problems at large-scale”:
Big data, which started as a technological innovation in distributed computing, is now a cultural movement by which we continue to discover how humanity interacts with the world — and each other — at large-scale.
Conway’s statement brings to mind Frank Pasquale’s recent characterisation of Big Data as a ‘human knowledge project’.
Big Data as a ‘human knowledge project’
About 11 minutes into this video, Pasquale compares Big Data to the human genome project in order to “really give people a sense of the gravity of the problem”.
“When the human genome project really got going…we had an immediate response which was to say ‘this is such important, epically important knowledge that we need to ensure that we’ve fully considered the ethical, legal and scientific implications of it’. I believe that the human genome project is mirrored by all this big data surveillance, which I would call a human knowledge project. And I think we need to treat that just as seriously as we treat the genome project…. [Genetic technology] are ways of understanding what makes us tick on a biological level. The Big Data surveillance apparatus is essentially an effort to find out what makes us tick on a social level, to achieve that level of intimate understanding of human society. And I believe that there are really some strong parallels here.
So for example there’s often a really fine line between productive, good uses of the technology and oppressive ones. With genetics, I’m sure almost all of us would applaud efforts to get rid of genetic abnormalities and genetic disease but we also recognise that certain forms of genetic engineering raise real policy questions. I think similarly with the example of Big Data, we might all applaud, say, some basic analytics of how well drivers do but when drivers are monitored every second, second by second, by a camera that’s in their face, that starts becoming too invasive, it starts becoming oppressive.
I also believe that the parallel with genetics is powerful because it’s unknown exactly where it will take us, ok, we don’t know exactly which direction this all-purpose technology is going. Better to frontload the ethical and legal analysis than perpetually be trying to play catch-up…what I’m trying to say is if we as a society [are open] to, say, having laws that would prevent us from using genetic information from discriminating against certain employees – not hire them, not promote them – we should similarly be open to and receptive to blocking out certain forms of information when making critical decisions about individuals and I think if we’re not and if we don’t take a much more conscientious stand on the extent of the collection of this data, openness about how it’s analysed and restrictions on how the most sensitive data is used, we are really headed toward a very troubling black box society where essentially algorithms rather than human judgement are making the most critical decisions about our lives.”
Big Data, algorithms and transparency
Another interesting comment from the UC Berkeley research is this one from Daniel Gillick, Senior Research Scientist at Google:
Historically, most decisions — political, military, business, and personal — have been made by brains [that] have unpredictable logic and operate on subjective experiential evidence. “Big data” represents a cultural shift in which more and more decisions are made by algorithms with transparent logic, operating on documented immutable evidence. I think “big” refers more to the pervasive nature of this change than to any particular amount of data.
There are a couple of points worth noting here. The first is the close connection between algorithms and Big Data: Big Data is of no use without algorithms to filter, classify, associate and prioritise it. The second is the belief in the ‘transparent logic’ of those algorithms. This belief seems either misguided or disingenuous, for reasons that Nick Seaver outlines in his 2014 paper, Knowing Algorithms:
At their most simple, calls for transparency assume that someone already knows what we want to know, and they just need to share their knowledge. If we are concerned about Google’s ranking algorithm for its search results, presumably that knowledge exists inside of Google. The moral rhetoric accompanying calls for transparency assumes that Google does not or cannot have critical perspective, so its inside knowledge should be passed along to those who do, or that transparency is a moral, democratic end in itself. While transparency may provide a useful starting point for interventions, it does not solve the problem of knowing algorithms, because not everything we want to know is already known by someone on the inside.
Gillick, in contrast, depicts a ‘cultural shift’ towards more transparency in how individuals and institutions operate, and so presents Big Data as a force driving progress towards, presumably, a society free of discrimination, bias and subjective decision-making. In fact, as more decision-making processes become ‘black-boxed’ – hidden behind increasingly unknowable algorithms – the true shift may actually be in the opposite direction: towards increased discrimination.