Big data is one of the hottest topics out there, a foundational element in IT’s quartet of Next Big Things: social, mobile, analytics and cloud. But, as the real world keeps reminding us, it is possible to make bad predictions and decisions even with tons of big data behind them. The 9/11 attacks showed how even highly sophisticated intelligence agencies can fail to pick out the relevant signals amid the mountains of data being analyzed. The recent financial crisis showed how even the best and brightest can fail to detect an approaching catastrophic storm. And the failure of so many professional forecasters to accurately predict the 2012 presidential election shows that you can find almost any answer you want in all that big data.
Big data is indeed incredibly useful in all kinds of endeavors, but only in the hands of talented professionals who know what they are doing and are aware of its pitfalls and limitations. What are some of these limitations? In thinking about this question over the last few years, I started to notice that a number of subtle, non-intuitive concepts that I learned many years ago as a physics student seem to apply to the world of big data and information-based predictions in highly complex systems. Let me explain.
Over 300 years ago, Isaac Newton laid down the foundations of classical mechanics with the publication of his Laws of Motion. The elegant mathematical models of Newtonian physics depict a world in which objects exhibit deterministic behaviors, that is, the same objects, subject to the same forces, will always yield the same results. These models make perfect predictions within the accuracy of their human-scale measurements. Classical mechanics works exceptionally well for describing the behavior of objects that are more or less observable to the naked eye. It accurately predicts the motion of planets as well as the flight of a baseball.
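To make that determinism concrete, here is a minimal sketch of a Newtonian model of the baseball’s flight (a toy calculation that ignores air resistance; the launch numbers are purely illustrative). Run it with the same inputs as many times as you like and you get exactly the same answer.

```python
import math

def projectile_range(speed_mps, angle_deg, g=9.81):
    """Classical range formula R = v^2 * sin(2*theta) / g for a projectile
    launched from ground level -- a fully deterministic model."""
    theta = math.radians(angle_deg)
    return speed_mps ** 2 * math.sin(2 * theta) / g

# Same object, same forces, same result -- every single time.
first = projectile_range(45.0, 30.0)   # a hard-hit ball: 45 m/s at 30 degrees
second = projectile_range(45.0, 30.0)
assert first == second
```

This is the world view of classical mechanics in four lines: given the initial conditions, the future is fully determined.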
But the idea of scientific determinism, which would in principle enable us to predict the future behavior of any object in the universe, began to fall apart in the early 20th century. Classical mechanics could not explain the counter-intuitive and seemingly absurd behavior of energy and matter at atomic as well as cosmological scales. Once you start dealing with atoms, molecules, exotic subatomic particles, black holes and the Big Bang, you find yourself in a whole different world, with somewhat bizarre behaviors, like the tunneling effect, which are governed by the laws of quantum mechanics and relativity. The orderly, deterministic world of classical physics gives way to a world of wave functions, probability distributions, uncertainty principles, and wave-particle dualities.
In addition, there is no such thing as absolute reality. In classical mechanics something has either the properties of a particle, e.g., a planet or a baseball, or of a wave, e.g., light or sound. In quantum mechanics all objects exhibit both kinds of properties. The concept of wave-particle duality means that reality depends on what question you are asking and what experiment you perform to answer it. The very act of observing an object changes the object being observed; any instrument used to measure its properties will invariably alter the properties being measured.
This transition, from a world view based on scientific determinism to one based on probability distributions, uncertainty principles and subjective reality, is neither intuitive nor easy to get used to. Even Albert Einstein had trouble accepting it, famously saying that “God does not play dice with the universe.” Stephen Hawking, one of the world’s top theoretical physicists, concluded in a brilliant lecture:
“ . . .it seems Einstein was doubly wrong when he said, God does not play dice. Not only does God definitely play dice, but He sometimes confuses us by throwing them where they can't be seen. . . The universe does not behave according to our pre-conceived ideas. It continues to surprise us.”
But the worlds of the very small and the very large are not the only ones that exhibit counter-intuitive, seemingly magical behaviors. So does the world of highly complex systems, especially those whose components and interrelationships are themselves quite complex, as is the case with systems biology and evolution.
Such is also the case with organizational and sociotechnical systems whose main components are people. Even though these chaotic systems are in principle deterministic, their dynamic, non-linear nature renders them increasingly unpredictable and accounts for their emergent behavior. New terms like long tails, Freakonomics and black swans, every bit as fanciful as quarks, charm and strangeness, have begun to enter our lexicon.
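A toy illustration of how a system can be deterministic in principle yet unpredictable in practice is the logistic map, the textbook example of chaos (the sketch below uses illustrative numbers, not a model of any real organization):

```python
def logistic_map(x0, r=4.0, steps=50):
    """Iterate the rule x -> r * x * (1 - x). The rule is completely
    deterministic, but at r = 4 it is chaotic: tiny differences in the
    starting point grow exponentially with each step."""
    x = x0
    for _ in range(steps):
        x = r * x * (1 - x)
    return x

# Two starting points that differ by one part in a billion...
a = logistic_map(0.2)
b = logistic_map(0.2 + 1e-9)
# ...end up in entirely different places after only 50 steps, which is
# why long-range prediction fails even with a perfect model in hand.
print(a, b)
```

Knowing the rule exactly is not enough; without infinitely precise knowledge of the starting conditions, the long-run behavior looks as random as a dice throw.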
Artificial Intelligence (AI) is an example of a discipline that has transitioned from its original classical, deterministic approach to an approach more suitable to a highly complex, inherently unpredictable topic like intelligence.
AI was one of the hottest areas in computer science in the 1960s and 1970s. Many of the AI leaders in those days were convinced that you could build a machine as intelligent as a human being based on logical deductions and the kind of step-by-step reasoning that humans use when solving puzzles or proving theorems. They obtained considerable government funding in the US, UK and Japan to implement their vision. But eventually it became clear that all these various projects had grossly underestimated the difficulties of developing any kind of AI system based on logic programming and deductive reasoning. The field went through a so-called AI winter in the 1980s.
But things started to change in the 1990s when AI switched paradigms and embraced data mining and information analytics, the precursors of today’s big data. Instead of trying to program computers to act intelligently, AI embraced a statistical, brute force approach based on analyzing vast amounts of information using powerful computers and sophisticated algorithms.
We discovered that such a statistical, information-based approach produced something akin to intelligence or knowledge. Moreover, unlike the earlier programming-based projects, the statistical approaches scaled very nicely: the more information you had, the more powerful the supercomputers, the more sophisticated the algorithms, the better the results. Deep Blue, IBM's chess playing supercomputer, demonstrated the power of such a statistical approach by beating then-reigning world chess champion Garry Kasparov in a celebrated match in May of 1997.
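The flavor of that paradigm shift can be shown with a tiny, hypothetical example. Instead of hand-coding rules for telling chess talk from baseball talk, we let word statistics learned from labeled examples do the work (a toy sketch with made-up data; real systems use vastly more data and far more sophisticated models):

```python
from collections import Counter

def train(examples):
    """Learn word counts per label from labeled examples -- no hand-coded
    rules, just statistics gathered from the data."""
    counts = {}
    for text, label in examples:
        counts.setdefault(label, Counter()).update(text.lower().split())
    return counts

def classify(counts, text):
    """Score each label by how often its training data used the words,
    with crude add-one smoothing so unseen words don't zero a label out."""
    words = text.lower().split()
    def score(label):
        c = counts[label]
        total = sum(c.values())
        return sum((c[w] + 1) / (total + 1) for w in words)
    return max(counts, key=score)

# Hypothetical toy data: the "rules" are learned, never programmed.
data = [("great match brilliant opening", "chess"),
        ("pawn knight endgame strategy", "chess"),
        ("home run pitcher strikes out", "baseball"),
        ("batter swings at the fastball", "baseball")]
model = train(data)
print(classify(model, "knight endgame"))  # -> chess
```

Nothing in the code knows what chess is; something akin to knowledge emerges purely from counting, and the more labeled text you feed it, the better it gets.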
Since that time, analyzing or searching large amounts of information has become increasingly important and commonplace in a wide variety of disciplines. Today, most of us use search engines as the primary mechanism for finding information on the World Wide Web. Researchers have been developing sophisticated question-answering systems, which can successfully analyze the nuances and context embedded in a complex, natural language question and come up with the right answer. Watson, IBM’s Question Answering computer, which in February of 2011 won the Jeopardy! Challenge against the two best human Jeopardy! players, is an example of such a system.
Economics is another discipline that has had to make the transition from a world of relatively simple mathematical models to one governed by the sophisticated analysis of real world information. Starting around the 1960s, a number of economists, most prominently those associated with the Chicago School of Economics, based their work on what NY Times columnist David Brooks referred to as “the era of economic scientism: the period when economists based their work on a crude vision of human nature (the perfectly rational, utility-maximizing autonomous individual) and then built elaborate models based on that creature.” Paul Krugman called such models an “idealized vision of an economy in which rational individuals interact in perfect markets . . . gussied up with fancy equations” in a 2009 NY Times Magazine article, How Did Economists Get It So Wrong?
The elegant mathematical theories of economic scientism managed to convince a number of powerful government leaders that free markets could self-adjust to just about any problem, thus requiring a very limited, circumscribed role for government. Alan Greenspan, the Chairman of the Federal Reserve from 1987 to 2006, for example, was one of the believers in this well-behaved, self-adjusting economic order. Even when the financial system began to show signs of the coming crisis, Greenspan continued to hold on to his belief that derivatives and other financial instruments were extraordinarily useful in distributing risks, thus lessening the need for regulating the increasingly complex financial markets. It wasn’t until October of 2008 that, in testimony before Congress, Greenspan finally acknowledged that he may have been partially wrong and was now in “a state of shocked disbelief.”
A whole slew of new ideas is now sweeping the field of economics. A new breed of economists is creating a field that has much more in common with the empirical sciences than with pure math. Following in the best tradition of physics, chemistry, biology and the social sciences, they are grounding economics in observation and experiment. Theories arise out of empirical analysis, and must reflect the realities, and therefore the inconsistencies and messiness, of the real world they aim to explain. They are trying to take into account the social, cognitive and emotional factors that go into the economic decisions that people make.
In discipline after discipline, we are beginning to learn how to deal with the very messy world of big data and complex systems, and how to best apply our learning to make good decisions and good predictions. One of the hardest parts of that learning is the need to let go of our preconceived notions of scientific determinism and get used to living in a world of probabilities, uncertainties and subjective realities. God does indeed like to play games with the universe, but He leaves enough hints around so we too can play the game and keep moving forward.