Big data is one of the hottest topics out there, a foundational element in IT's quartet of Next Big Things: Social, Mobile, Analytics and Cloud. But, as the real world keeps reminding us, it is possible to make bad predictions and decisions even when tons of big data inform them. The 9/11 attacks showed how even highly sophisticated intelligence agencies can fail to pick out the relevant signals amid mountains of data. Our recent financial crisis showed how even the best and brightest can fail to detect an approaching catastrophic storm. And the failure of so many professional forecasters to accurately predict the 2012 presidential election shows that you can find almost any answer you want in all that big data.
Big data is indeed incredibly useful in all kinds of endeavors, but only in the hands of talented professionals who know what they are doing and are aware of its pitfalls and limitations. What are some of these limitations? In thinking about this question over the last few years, I started to notice that a number of subtle, non-intuitive concepts that I learned many years ago as a physics student seem to apply to the world of big data and information-based predictions in highly complex systems. Let me explain.
Over 300 years ago, Isaac Newton laid down the foundations of classical mechanics with the publication of his Laws of Motion. The elegant mathematical models of Newtonian physics depict a world in which objects exhibit deterministic behaviors, that is, the same objects, subject to the same forces, will always yield the same results. These models make perfect predictions within the accuracy of their human-scale measurements. Classical mechanics works exceptionally well for describing the behavior of objects that are more or less observable to the naked eye. It accurately predicts the motion of planets as well as the flight of a baseball.
But the idea of scientific determinism, which would in principle enable us to predict the future behavior of any object in the universe, began to fall apart in the early 20th century. Classical mechanics could not explain the counter-intuitive and seemingly absurd behavior of energy and matter at atomic as well as cosmological scales. Once you start dealing with atoms, molecules, exotic subatomic particles, black holes and the Big Bang, you find yourself in a whole different world, with somewhat bizarre behaviors, like the tunneling effect, governed by the laws of quantum mechanics and relativity. The orderly, deterministic world of classical physics gives way to a world of wave functions, probability distributions, uncertainty principles, and wave-particle dualities.
In addition, there is no such thing as absolute reality. In classical mechanics, something has either the properties of a particle (e.g., a planet, a baseball) or of a wave (e.g., light, sound). In quantum mechanics, all objects exhibit both kinds of properties. The concept of wave-particle duality explains that reality depends on what question you are asking and what experiment you perform to answer it. The very act of observing an object changes the object being observed: any instrument used to measure its properties will invariably alter the properties being measured.
This transition, from a world view based on scientific determinism to one based on probability distributions, uncertainty principles and subjective reality, is neither intuitive nor easy to get used to. Even Albert Einstein had trouble accepting it, famously saying that “God does not play dice with the universe.” Stephen Hawking, one of the world's top theoretical physicists, concluded in a brilliant lecture:
“ . . .it seems Einstein was doubly wrong when he said, God does not play dice. Not only does God definitely play dice, but He sometimes confuses us by throwing them where they can't be seen. . . The universe does not behave according to our pre-conceived ideas. It continues to surprise us.”
But the worlds of the very small and the very large are not the only ones that exhibit counter-intuitive, seemingly magical behaviors. So does the world of highly complex systems, especially those whose components and interrelationships are themselves quite complex, as is the case with systems biology and evolution.
Such is also the case with organizational and sociotechnical systems whose main components are people. Even though these chaotic systems are in principle deterministic, their dynamic, non-linear nature renders them increasingly unpredictable and accounts for their emergent behavior. New terms like long tails, Freakonomics and black swan theory, every bit as fanciful as quarks, charm and strangeness, have begun to enter our lexicon.
Artificial Intelligence (AI) is an example of a discipline that has transitioned from its original classical, deterministic approach to an approach more suitable to a highly complex, inherently unpredictable topic like intelligence.
AI was one of the hottest areas in computer science in the 1960s and 1970s. Many of the AI leaders of those days were convinced that you could build a machine as intelligent as a human being based on logical deduction and the kind of step-by-step reasoning that humans use when solving puzzles or proving theorems. They obtained considerable government funding in the US, UK and Japan to implement their vision. But eventually it became clear that these projects had grossly underestimated the difficulty of building any kind of AI system based on logic programming and deductive reasoning. The field went through a so-called AI winter in the 1980s.
But things started to change in the 1990s when AI switched paradigms and embraced data mining and information analytics, the precursors of today’s big data. Instead of trying to program computers to act intelligently, AI embraced a statistical, brute force approach based on analyzing vast amounts of information using powerful computers and sophisticated algorithms.
We discovered that such a statistical, information-based approach produced something akin to intelligence or knowledge. Moreover, unlike the earlier programming-based projects, the statistical approaches scaled very nicely: the more information you had, the more powerful the supercomputers, and the more sophisticated the algorithms, the better the results. Deep Blue, IBM's chess-playing supercomputer, demonstrated the power of such a statistical approach by beating then-reigning world chess champion Garry Kasparov in a celebrated match in May 1997.
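The shift from hand-coded rules to statistics can be illustrated with a toy example. The sketch below is purely illustrative (it is not based on any system mentioned above): a tiny naive Bayes text classifier, built only from Python's standard library, that "learns" to label sentences from word counts rather than from programmed logic. Feed it more labeled examples and its estimates improve, which is the scaling property the statistical paradigm exploits.

```python
import math
from collections import Counter, defaultdict

def train(samples):
    """Count word frequencies per label from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in samples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts

def classify(text, word_counts, label_counts):
    """Pick the label with the highest log-probability, using add-one smoothing."""
    vocab = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)  # prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# A hypothetical, hand-made training set for illustration only.
samples = [
    ("the movie was wonderful and moving", "pos"),
    ("a wonderful touching film", "pos"),
    ("the plot was dull and tedious", "neg"),
    ("a dull lifeless mess", "neg"),
]
wc, lc = train(samples)
print(classify("a wonderful film", wc, lc))  # → pos
```

Nothing in the classifier encodes what "wonderful" or "dull" means; the behavior emerges entirely from counting, which is the essence of the brute-force statistical approach described above.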
Since that time, analyzing or searching large amounts of information has become increasingly important and commonplace in a wide variety of disciplines. Today, most of us use search engines as the primary mechanism for finding information on the World Wide Web. Researchers have been developing sophisticated question-answering systems, which can successfully analyze the nuances and context embedded in a complex, natural language question and come up with the right answer. Watson, IBM's question-answering computer, which in February of 2011 won the Jeopardy! Challenge against the two best human Jeopardy! players, is an example of such a system.
Economics is another discipline that has had to make the transition from a world of relatively simple mathematical models to one governed by the sophisticated analysis of real-world information. Starting around the 1960s, a number of economists, most prominently those associated with the Chicago School of Economics, ushered in what NY Times columnist David Brooks called “the era of economic scientism: the period when economists based their work on a crude vision of human nature (the perfectly rational, utility-maximizing autonomous individual) and then built elaborate models based on that creature.” In a 2009 NY Times Magazine article, How Did Economists Get It So Wrong?, Paul Krugman described such models as an “idealized vision of an economy in which rational individuals interact in perfect markets . . . gussied up with fancy equations.”
The elegant mathematical theories of economic scientism managed to convince a number of powerful government leaders that free markets could self-adjust to just about any problem, thus requiring only a very limited, circumscribed role for government. Alan Greenspan, Chairman of the Federal Reserve from 1987 to 2006, was one of the believers in this well-behaved, self-adjusting economic order. Even when the financial system began to show signs of the coming crisis, Greenspan held on to his belief that derivatives and other financial instruments were extraordinarily useful in distributing risk, thus lessening the need to regulate the increasingly complex financial markets. It wasn't until October of 2008, in testimony before Congress, that Greenspan finally acknowledged that he had been partially wrong and was now in “a state of shocked disbelief.”
A whole slew of new ideas is now sweeping the field of economics. A new breed of economists is creating a field that has much more in common with the empirical sciences than with pure math. Following in the best tradition of physics, chemistry, biology and the social sciences, they are grounding economics in observation and experiment. Theories arise out of empirical analysis and must reflect the realities, and therefore the inconsistencies and messiness, of the real world they aim to explain. These economists are trying to take into account the social, cognitive and emotional factors that go into the economic decisions people make.
In discipline after discipline, we are beginning to learn how to deal with the very messy world of big data and complex systems, and how to best apply our learning to make good decisions and good predictions. One of the hardest parts of that learning is the need to let go of our preconceived notions of scientific determinism and get used to living in a world of probabilities, uncertainties and subjective realities. God does indeed like to play games with the universe, but He leaves enough hints around so we too can play the game and keep moving forward.
Great article. I've been working in this area for the last couple of years. We are using the Software AG stack to implement this. The concept is good, but it is very hard to find talented people in this niche area...
Posted by: Sasha | December 05, 2012 at 05:24 PM
Hi Irving - In regard to foreseeing the outcome of the election, I pay the most attention to Nate Silver's 538 blog in the New York Times. He very accurately predicted the outcome of the presidential race and several Senatorial and Congressional races as well. As to foreseeing the recent market crash, about one year before it occurred I was in a meeting with my financial adviser discussing how best to manage my money - I told her I wanted her to manage it with modest income if possible, but primarily for preservation of capital as I felt that a crash was imminent. She asked me why I thought so and I told her it was because the income disparity between the rich and everybody else was at its highest level since just before the Great Depression, and that those very rich people who were getting paid so much money thought that they were actually worth it, and for the most part they are NOT! She agreed and did manage my money as I asked her to and I came through the crash pretty much unscathed.
I mention these things not because I am trying to brag, but because sometimes the data is so voluminous that it gets in the way of analysis and you need to step back and look at the big picture, and sometimes at aspects of it that other people seem to be ignoring, and if you do so well enough you can see things coming that others do not. Nate Silver is a first-rate statistician, and I am just an average guy who occasionally looks at things a little differently than others do and sometimes sees some things that they do not. Neither of us is that unique - I think that we just try to keep our wishful thinking out of what we are looking at and see only what the data is actually telling us. As I recall, you were pretty good at this when we were at the U of C, and I suspect that you still are!
Hank Bennett
Posted by: Hank Bennett | December 06, 2012 at 03:41 AM
One additional comment - In my opinion, Milton Friedman and his cohorts at the Chicago School of Economics have proven to be enormously damaging whenever their precepts are actually followed by governments - just look at Greece and Spain right now! Like Mr. Krugman, I believe that Keynesian economics much more accurately describes the real world. Severe recessions and depressions require stimulus and a lot of it. One need only look back at the Great Depression of the 1930s to see this in action. FDR's stimulus efforts were actually working to pull the US out of the Great Depression, albeit somewhat slowly, until the great mistake of 1937, when Roosevelt fell prey to those who were concerned with the deficits the USA was running and cut back on his New Deal programs. This caused a recession that lasted until the MASSIVE deficit spending required to fight and win WWII pulled us out of the Great Depression. This simple fact alone should be enough to convince anyone that Keynes was right, but there are those who quite simply don't want to see it because their preconceived ideas don't fit that model, so they don't see it!
Posted by: Hank Bennett | December 06, 2012 at 03:52 AM
Dear Irving,
What a nice blog! I cannot help being impressed by such a breadth of reference.
Working in an international development agency, we are 'frantically' trying to come to terms with complexity and data. The counter-intuitive nature of many of the complexity arguments not only makes them difficult to understand, it also makes it difficult to design an action perspective, to advocate, and to create common understanding. Although the probability approach sounds nice, it is not so easy to actually find which direction is more likely than another.
Data, big and small, is essential, but I am not so sure about the argument that it is only safe in the hands of 'talented professionals'. How can I know who is talented? Or a professional, for that matter? I believe more in getting data 'out there' and letting those who are interested play with it... But data is just data, for sure. If we wish to remain relevant, it is up to us to make sense of it (http://europeandcis.undp.org/blog/2012/11/28/development-data-still-needs-its-captain-kirk/).
We will soon start our in-field experiments with the narrative approach (Snowden c.s.) and Big Data. You can follow us on our blog, where we report back our experiences (http://europeandcis.undp.org/blog/).
Posted by: Albert Soer | December 06, 2012 at 11:25 AM
Irving,
I think you should examine Greenspan's testimony more closely; he was shocked that organizations would not act in their long-term interest and would take short-term gains that would kill the company.
I don't think he understood the level of greed and self-interest that was demonstrated. Then again, we no longer hang people in the public square or hold them accountable; rather, we bail out the companies and spread the blame so that no one is to blame.
Talk about moral hazard.
Posted by: Dcmartin | December 12, 2012 at 10:38 PM
I read your post and cannot help but think of Jay Forrester. If only economics and system dynamics had met and found common ground...
Posted by: Mona Vernon | December 14, 2012 at 11:55 AM