Friday, May 3, 2013

Review of Big Data: A Revolution That Will Transform How We Live, Work and Think by Viktor Mayer-Schonberger and Kenneth Cukier (2013, John Murray).

In 2008 the term ‘big data’ was barely in use. Five years later it has become the latest ICT-related buzzword, used to refer to the recent surge in the generation of huge quantities of diverse and dynamic data produced by social media, transactions and interactions across the internet, sensor and camera networks, myriad software-enabled devices, scientific equipment, etc. Mayer-Schonberger and Cukier’s book aims to provide an initial survey and analysis of the big data phenomenon and what they call datafication: the process of transforming all things under the sun into data from which value can be extracted. They argue that a data revolution is underway, with the nature of data production and analysis undergoing a paradigm shift in three ways. First, the volume of data being produced is being radically transformed, with a move away from sampling towards trying to capture entire populations (n=all). Second, by being exhaustive in scope, it is possible to embrace the messiness of data rather than seeking exactitude (as required, along with randomness, in sampling); as they put it, “more trumps better”. Third, the types of questions asked change from why (causation) to what (correlation): “We don’t always need to know the cause of a phenomena; rather, we can let the data speak for itself.” In other words, the traditional deductive, hypothesis-led mode of analysis is replaced by an inductive approach wherein analytics examine the data for all meaningful patterns, rather than testing for particular relationships. This third shift, they argue, also means that there is no longer a need for domain-specific expertise.
As such, the era of big data is producing massive, exhaustive, messy datasets that can be mined to identify relationships that can then be capitalised upon. A supermarket chain, for example, can use the vast quantities of data it produces about consumers and their transactions to identify patterns of purchases, and then use those patterns to tailor marketing strategies and increase turnover.

Mayer-Schonberger and Cukier are right that there is a data revolution underway and they provide an initial overview of the big data phenomenon. However, their analysis is weak in a number of respects. First, it completely ignores emerging debates about the kind of empiricism and data dredging they describe, which is deeply problematic in all kinds of ways, and about the data-driven science being advocated by scientists. No scientist or analyst worth their salt believes that data simply speak for themselves free of theory. Second, the account is quite sketchy as to how analysts can make sense of big data and about the new analytical techniques that are being developed. There is a science to big data in terms of devising the algorithms employed within machine learning and other big data analytics, yet the reader gets no real insight into how these work. Third, the book avoids tackling the deep and difficult epistemological questions that arise when a paradigm shift occurs. The book is clearly targeted at a non-academic audience, but nevertheless a grounded discussion of the philosophy of data and science in the era of big data is merited when such grandiose claims are being made. Fourth, they rightly acknowledge that big data raises all kinds of ethical issues, but their analysis lacks depth and critique and pushes a business-friendly, market-orientated line about self-regulation without adequately setting out the pros and cons of such a strategy. Finally, the text suffers from being overly repetitious and the referencing is dreadful: it would be possible to condense the entire book into just a couple of chapters and not lose any of the argument, and whilst there are notes at the back of the book there are no numbers in the text to link to them.
Overall then, whilst the book provides an initial text about big data and does include some interesting and useful nuggets, the analysis in general is narrow and weak, and it seems more about championing an emerging ICT market than providing a thorough, critical overview of the nature of big data and its implications and consequences.


Bernadette said...

Great review Rob. I've not read this book but in general I think a fair portion of what is being said about big data is hype, often spouted by people who are better at selling themselves and jumping on bandwagons in the guise of offering consultancy (at great cost) than at anything genuinely useful. I've not heard of the first author but Cukier wrote on IT/data for The Economist for years and I was often underwhelmed by his analysis (though I won't claim to have read every article he ever wrote).

In the government sector big data is being touted as the saviour we've all been looking for and it's worrying because there seems to be a lack of understanding that data is useless without context and social modelling. But governments are throwing money at many big data research projects which have almost no methodology other than to collect the data.

Not to say there's not a legitimate use for big data but the lack of scientific and analytical rigour around a lot of it is a little disturbing.

Rob Kitchin said...

Bernadette, there is some emerging critical thinking in the academic literature and occasionally on blogs, etc., about big data. From the science perspective, a number of people have challenged the kind of empiricism forwarded in this book, which is rooted in Chris Anderson's 2008 article in Wired about the end of theory. At the time there was a backlash, but the argument has now become more refined and is being honed through the notion of data-driven science. From the social sciences, the critique has mostly focused on the ethical implications of big data. For me, this book is an example of the boosterism surrounding big data at present and lacks a critical edge. It could easily be mistaken for a text written by someone employed by IBM, rather than a book written by an academic and a journalist.