For several years now, big data has been one of the hottest topics in the IT industry. But, what do we mean by big data? What are people really excited about? Is it real, hype, or something in between?
A number of recent articles have been sounding the alarm that big data may be at the peak of inflated expectations in Gartner’s hype cycle. For example, Gartner research director Svetlana Sicular recently observed that big data has already reached the peak of the hype cycle and is now falling into the trough of disillusionment, a necessary step before (hopefully) moving on to the slope of enlightenment. She writes that a number of her most advanced big data clients are starting to get disillusioned:
“These organizations have fascinating ideas, but they are disappointed with the difficulty of figuring out reliable solutions. . . Several days ago, a financial industry client told me that framing a right question to express a game-changing idea is extremely challenging: first, selecting a question from multiple candidates; second, breaking it down to many sub-questions; and, third, answering even one of them reliably. It is hard. Formulating a right question is always hard, but with big data, it is an order of magnitude harder, because you are blazing the trail (not grazing on the green field).”
As is typically the case with major disruptive technologies, many people are waking up to the fact that realizing the value from big data and becoming a data-driven institution is a lot harder and will take longer than they originally anticipated. But, is it worth it? What is this data-driven world all about?
MIT Media Lab Professor Alex “Sandy” Pentland talks about the promise of becoming a data-driven society in a very interesting online conversation, Reinventing Society in the Wake of Big Data. Pentland is a big data pioneer whom O’Reilly Media founder Tim O’Reilly named one of The World’s 7 Most Powerful Data Scientists in Forbes.
“This is the first time in human history that we have the ability to see enough about ourselves that we can hope to actually build social systems that work qualitatively better than the systems we've always had,” says Pentland. “That’s a remarkable change. It’s like the phase transition that happened when writing was developed or when education became ubiquitous, or perhaps when people began being tied together via the Internet.”
In the video and accompanying transcript, he explains the power of big data. Most people think that big data is about what people do online, like posts on Facebook or searches on Google. Pentland disagrees, because what people post on social media sites or search for on the Web is relatively transient information that changes from day to day, if not week to week. Nor is the promise of big data about internal company processes and RFIDs.
“I believe that the power of Big Data is that it’s information about people’s behavior - it’s about customers, employees, and prospects for your new business, . . .” he says. “This Big Data comes from location data from your cell phone and transaction data about the things you buy with your credit card. It’s the little data breadcrumbs that you leave behind you as you move around in the world.”
“What those breadcrumbs tell is the story of your life. It tells what you’ve chosen to do. . . . Who you actually are is determined by where you spend time, and which things you buy. Big data is increasingly about real behavior, and by analyzing this sort of data, scientists can tell an enormous amount about you. They can tell whether you are the sort of person who will pay back loans. They can tell you if you’re likely to get diabetes.”
Wikipedia defines big data as “a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.” Gartner developed a 3V framework that looks at big data as “high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization.”
These are both good, succinct descriptions from the point of view of the technologies used to store, manage, access and analyze the data. But Pentland tells us that the real value of big data is what it enables us to learn about people, not as undifferentiated members of a group, but as unique individuals. And because our behavior is so strongly determined by our social context, that is, by our connections with all the people around us and with the various communities and organizations we are a part of, big data is particularly valuable in helping to sort out our social fabric.
To help better understand the connected world we are enmeshed in, Pentland and Professor Asu Ozdaglar recently created the MIT Center for Connections Science and Engineering. Connections Science is an example of the new data science disciplines being enabled by the advent of big data.
We have generally been studying complex systems in terms of averages, probability distributions and expected values. This has worked well for complex physical and engineered systems, where, in general, similar components exhibit similar properties and behaviors, e.g., electrons, water molecules, car tires, airplane wings. But it does not work so well for complex sociotechnical systems, e.g., cities, health systems, financial markets, companies, governments, where the key components are people.
We share behaviors and beliefs with other members of the groups we are part of. These groups can be based on our personal attributes, e.g., gender, age, ethnicity, religion, sexual preference; our home and family status; our educational history; our work and career; our income and assets; our entertainment and sports preferences; and so on. Big data can help infer our key traits based on the traits we share with each of these groups.
But, unlike physical and engineered objects, each person is unique. What makes us unique is that each of us has many dimensions. Our traits are a composite of the traits of many different groups. Many different kinds of data sources are thus required to make accurate predictions about our behaviors and beliefs. That’s why it’s so unfair when people are profiled as financial or security risks based on only a few of their attributes, when a more extensive knowledge of who they are might lead to very different decisions.
“While it may be useful to reason about the averages, social phenomena are really made up of millions of small transactions between individuals. There are patterns in those individual transactions that are not just averages, they’re the things that are responsible for the flash crash and the Arab spring. You need to get down into these new patterns, these micro-patterns, because they don’t just average out to the classical way of understanding society. We’re entering a new era of social physics, where it’s the details of all the particles - the you and me - that actually determine the outcome.”
Realizing the potential of big data requires access to vast amounts of personal information, which raises very serious issues about privacy, data ownership and data control. Pentland strongly advocates that individuals should have the final say about the use of the data collected about them, including the ability to put the data in circulation and turn it into a personal asset by giving permission to share it for value in return. He has been working closely with the World Economic Forum (WEF) to help develop the proper guidelines for the collection and use of personal data, in collaboration with private companies, government representatives, end-user privacy and rights groups, academics and others.
In 2011, Pentland founded the Institute for Data Driven Design (ID3), a research and educational nonprofit, to help define the kinds of principles, contracts and rules needed to empower individuals to assert greater control over their data, digital identities and authentication. ID3 is developing software mechanisms and an open software platform to implement and enforce these principles.
“The fact that we can now begin to actually look at the dynamics of social interactions and how they play out, and are not just limited to reasoning about averages like market indices is for me simply astonishing. To be able to see the details of variations in the market and the beginnings of political revolutions, to predict them, and even control them, is definitely a case of Promethean fire. . . We’re going to reinvent what it means to have a human society.”