For several years now, big data has been one of the hottest topics in the IT industry. But, what do we mean by big data? What are people really excited about? Is it real, hype, or something in between?
A number of recent articles have been sounding the alarm that big data may be at the peak of inflated expectations in Gartner’s hype cycle. For example, Gartner research director Svetlana Sicular recently observed that big data has already reached the peak of the hype cycle and is now falling into the trough of disillusionment, a necessary step before (hopefully) moving on to the slope of enlightenment. She writes that a number of her most advanced big data clients are starting to get disillusioned:
“These organizations have fascinating ideas, but they are disappointed with the difficulty of figuring out reliable solutions. . . Several days ago, a financial industry client told me that framing a right question to express a game-changing idea is extremely challenging: first, selecting a question from multiple candidates; second, breaking it down to many sub-questions; and, third, answering even one of them reliably. It is hard. Formulating a right question is always hard, but with big data, it is an order of magnitude harder, because you are blazing the trail (not grazing on the green field).”
As is typically the case with major disruptive technologies, many people are waking up to the fact that realizing the value from big data and becoming a data-driven institution is a lot harder and will take longer than they originally anticipated. But, is it worth it? What is this data-driven world all about?
MIT Media Lab Professor Alex “Sandy” Pentland talks about the promise of becoming a data-driven society in a very interesting online conversation, Reinventing Society in the Wake of Big Data. Pentland is a big data pioneer whom O’Reilly Media founder Tim O’Reilly named one of The World’s 7 Most Powerful Data Scientists in a Forbes article.
“This is the first time in human history that we have the ability to see enough about ourselves that we can hope to actually build social systems that work qualitatively better than the systems we've always had,” says Pentland. “That’s a remarkable change. It’s like the phase transition that happened when writing was developed or when education became ubiquitous, or perhaps when people began being tied together via the Internet.”
In the video and accompanying transcript, he explains the power of big data. Most people think that big data is about what people do online, like posts on Facebook or searches on Google. Pentland disagrees, because what people post on social media sites or search for on the Web is relatively transient information that changes from week to week, if not day to day. Nor is the promise of big data about internal company processes and RFID tags.
“I believe that the power of Big Data is that it’s information about people’s behavior - it’s about customers, employees, and prospects for your new business, . . .” he says. “This Big Data comes from location data from your cell phone and transaction data about the things you buy with your credit card. It’s the little data breadcrumbs that you leave behind you as you move around in the world.”
“What those breadcrumbs tell is the story of your life. It tells what you’ve chosen to do. . . . Who you actually are is determined by where you spend time, and which things you buy. Big data is increasingly about real behavior, and by analyzing this sort of data, scientists can tell an enormous amount about you. They can tell whether you are the sort of person who will pay back loans. They can tell you if you’re likely to get diabetes.”
Wikipedia defines big data as “a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.” Gartner developed a 3V framework that looks at big data as “high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization.”
These are both good, succinct descriptions from the point of view of the technologies used to store, manage, access and analyze the data. But Pentland tells us that the real value of big data is what it enables us to learn about people: not as undifferentiated members of a group, but as unique individuals. And because our behavior is so determined by our social context, that is, by our connections with all the people around us and with the various communities and organizations we are part of, big data is particularly valuable in helping to sort out our social fabric.
To help us better understand the connected world we are enmeshed in, Pentland and Professor Asu Ozdaglar recently created the MIT Center for Connections Science and Engineering. Connections Science is an example of the new data science disciplines being enabled by the advent of big data.
We have generally studied complex systems in terms of averages, probability distributions and expected values. This has worked well for complex physical and engineered systems, where similar components generally exhibit similar properties and behaviors (e.g., electrons, water molecules, car tires, airplane wings). But it does not work so well for complex sociotechnical systems (e.g., cities, health systems, financial markets, companies, governments), where the key components are people.
We share behaviors and beliefs with other members of the groups we are part of. These groups can be based on our personal attributes (e.g., gender, age, ethnicity, religion, sexual preference); our home and family status; our educational history; our work and career; our income and assets; our entertainment and sports preferences; and so on. Big data can help infer our key traits based on the traits we share with each of these groups.
But, unlike physical and engineered objects, each person is unique. What makes us unique is that each of us has many dimensions. Our traits are a composite of the traits of many different groups. Many different kinds of data sources are thus required to make accurate predictions about our behaviors and beliefs. That’s why it’s so unfair when people are profiled as financial or security risks based on only a few of their attributes, when a more extensive knowledge of who they are might lead to very different decisions.
“While it may be useful to reason about the averages, social phenomena are really made up of millions of small transactions between individuals. There are patterns in those individual transactions that are not just averages, they’re the things that are responsible for the flash crash and the Arab spring. You need to get down into these new patterns, these micro-patterns, because they don’t just average out to the classical way of understanding society. We’re entering a new era of social physics, where it’s the details of all the particles - the you and me - that actually determine the outcome.”
For big data to realize its potential requires access to vast amounts of personal information, which raises very serious issues of privacy, data ownership and data control. Pentland strongly advocates that individuals should have the final say over the use of the data collected about them, including the ability to put the data in circulation and turn it into a personal asset by giving permission to share it in return for value. He has been working closely with the World Economic Forum (WEF) to help develop proper guidelines for the collection and use of personal data, in collaboration with private companies, government representatives, end-user privacy and rights groups, academics and others.
In 2011, Pentland founded the Institute for Data Driven Design (ID3), a research and educational nonprofit that aims to help define the kind of principles, contracts and rules needed to empower individuals to assert greater control over their data, their digital identities and their authentication. ID3 is developing software mechanisms and an open software platform to implement and enforce these principles.
“The fact that we can now begin to actually look at the dynamics of social interactions and how they play out, and are not just limited to reasoning about averages like market indices is for me simply astonishing. To be able to see the details of variations in the market and the beginnings of political revolutions, to predict them, and even control them, is definitely a case of Promethean fire. . . We’re going to reinvent what it means to have a human society.”
Irving, while I agree with this, there is an inherent flaw in the observation, and it is one that many teachers make. The flaw concerns which party is in control of the learning process. Teachers believe they are; however, they're wrong: the student is in charge. If the student willingly suspends disbelief, trusts the teacher and persists through the learning process, they will accept the new paradigms. However, Dr. Richard Massey noticed many years ago that people will not change their deeply held beliefs absent what he called "a significant emotional event." For example, you don't walk into the office of someone who follows "Religion A", wave your hands, speak a few incantations, and expect to convert them to "Religion B". Paradigms do not change that easily. I've been working on a software system for a number of years to address exactly this issue, and, ironically, in the context of this article's premise. I have not just the architecture for this system but also the methods and techniques required to implement and use it. The user interface is a radical departure from any predecessor; in fact, I believe it to be revolutionary. If I were at IBM, I would be driving this through the company. Alas, I'm not, and I am working through these issues with my own meager efforts.
Posted by: Charles C McGowen | March 20, 2013 at 11:06 AM
Discretely defining the attributes of individuals by the connections they make, whether to specify the individual, to stereotype individuals into markets, or for other purposes, is an outcome that will ill-serve both the population and the individual.
No doubt, understanding the wants and needs of individuals through the connections they individually excrete will give business interests the belief that they understand their respective customers and markets. In reality, I doubt that is so.
Will people be so predictable that they continue to follow the pathway of their connections history, or will people continue to connect, disconnect and reconnect in patterns that can only be ascertained by a dynamic monitoring that goes beyond a reasonable surveillance technique employable by our government and/or private businesses?
Given the legal concerns over personal privacy, the reach of Big Brother and the restrictions of ethical decency, will the collection of data continue to be so freely accomplished? Hopefully, people will begin to understand the ramifications of the personal assets that they relinquish to social networks, cell phone GPS tracking, etc. With this understanding, I suspect that the answer to the question of allowing continued unrestricted access to personal data becomes an emphatic “no”, followed by a demand that government enact laws preventing such access.
Technology allows for many things; of that there is no question. But are the many capabilities that technology brings acceptable to an informed public? Time will tell, but I am hopeful that the American people will begin to understand that hanging all of one's personal data out for anyone to see, to use and to abuse is a real and continuing threat to the personal freedom that we have enjoyed in years gone by.
Can the underlying building blocks of “big data” be collected? Can the data be massaged and analyzed? Can inferences be made from the collected and analyzed data? Undoubtedly, the answer to all these questions is “yes”.
Will a public that has so far been apathetic about personal data awaken and disallow the continuation of this invasion of its collective and individual privacy at the pace that technology allows? Hopefully, the answer to this question will also be “yes”.
Posted by: Bud Byrd | March 21, 2013 at 12:36 PM
Great write-up on a very important topic. However, I am not sure I understand why we need a new discipline in universities to tackle this, when for decades we have had the analytical field called Operations Research (or Operational Research, as the British call it) as a field of study at several great schools.
Posted by: Armen Hovanessian | March 25, 2013 at 09:42 PM