The Economist’s May 6 issue referred to data on its cover as the world’s most valuable resource. Its lead article, “Data is giving rise to a new economy,” called it the fuel of the future: “Data are to this century what oil was to the last one: a driver of growth and change.” Oil refineries and data centers fulfill similar roles: “producing crucial feedstocks for the world economy.” Much of modern life would not exist without the cars, plastics and drugs made possible by oil refineries. Similarly, data centers “power all kinds of online services and, increasingly, the real world as devices become more and more connected.”
Data has been closely intertwined with our digital technology revolution since the early days of the IT industry. Data processing was the term then used to describe the applications of IT to automate highly structured business processes, e.g., financial transactions, inventory management, airline reservations. Over time, increasingly sophisticated applications were developed to better manage key business operations along with their associated data, including enterprise resource planning, customer relationship management and human resources. Beyond their use in operations, the information generated by these various applications was collected in data warehouses, and a variety of business intelligence tools were used to analyze the data and generate management reports.
These commercial applications dealt mostly with structured information in those early years of computing. But at the same time, the scientific community was developing tools for managing, analyzing and visualizing the high volumes of much more unstructured data generated by their experiments and observations. Physicists, astronomers, biologists, and other scientists and engineers were developing methodologies and architectures for dealing with very large volumes of unstructured data, as well as analytical techniques, like data mining, for discovering patterns and extracting insights from all that data.
Then came the explosive growth of the Internet in the mid-1990s. Ever since, digital technologies have been permeating just about every nook and cranny of the economy, society and our personal lives. Data is now being generated by just about everything and everybody around us, including the growing volume of online and offline transactions, web searches, social media interactions, billions of smart mobile devices and tens of billions of smart IoT sensors.
Throughout history, scientific revolutions have been launched when new tools make possible new measurements and observations, e.g., the telescope, the microscope, spectrometers, DNA sequencers. They’ve enabled us to significantly increase our understanding of the natural world around us by collecting and analyzing large amounts of data. Our new big data tools now have the potential to usher in an information-based scientific revolution in just about any domain of knowledge, including people and their varied interactions, organizations and institutions, given our ability to now gather valuable data on almost any area of interest.
There’s so much data all around us that, as The Economist noted, we’re now seeing the rise of a new kind of data economy. IDC predicts that the total amount of digital data created worldwide will reach 44 zettabytes by 2020 and 180 zettabytes by 2025 (a zettabyte is 10^21 bytes). All this data is enabling us to better understand the world’s physical, economic and social infrastructures, as well as to infuse information-based intelligence into every aspect of their operations. It’s making it possible to not just better understand what’s happening in the present, but to also make more accurate predictions about what might happen in the future. Beyond its use in improving the operational efficiency and financial performance of companies, the data can now be applied to significantly improve customer relationships, as well as to create whole new classes of smart products and services.
“The quality of data has changed, too,” said The Economist. “They are no longer mainly stocks of digital information—databases of names and other well-defined personal data, such as age, sex and income. The new economy is more about analysing rapid real-time flows of often unstructured data: the streams of photos and videos generated by users of social networks, the reams of information produced by commuters on their way to work, the flood of data from hundreds of sensors in a jet engine. From subway trains and wind turbines to toilet seats and toasters - all sorts of devices are becoming sources of data.” IDC estimates that there will be 30 billion such data-generating devices by 2020, rising to 80 billion by 2025.
The competitive value of data is also rapidly increasing. Beyond its use in personalized marketing, customer service, improved fraud detection, and targeted advertising, companies are beginning to leverage data and advanced machine learning algorithms to create new revenue-generating cognitive services, such as Google Translate, Microsoft Vision, and IBM Watson Health.
Hardware and software platforms have long raised competitive concerns in the IT industry, due to the economic power of network effects. Scale significantly increases a platform’s value. The more third-party applications and services a platform offers, the more users it will attract, which helps it attract more offerings, which in turn brings in more users, which then makes the platform even more valuable… and on and on and on.
With the rise of the data economy, we’re now seeing the emergence of even more powerful data network effects. The more users and offerings a platform attracts, the more data is available to customize offerings to user preferences, helping better match supply and demand and further increasing the platform’s value, which will then attract more users and offerings and generate even more data.
The biggest beneficiaries of such data network effects are the Internet giants, especially Alphabet/Google, Amazon, Apple, Facebook and Microsoft. “They are the five most valuable listed firms in the world,” said The Economist in an article exploring the regulatory implications of the new data economy. “Their profits are surging: they collectively racked up over $25bn in net profit in the first quarter of 2017. Amazon captures half of all dollars spent online in America. Google and Facebook accounted for almost all the revenue growth in digital advertising in America last year. Such dominance has prompted calls for the tech giants to be broken up, as Standard Oil was in the early 20th century,” when the resource then raising concerns was oil instead of data.
But while it’s very difficult to compete with the Internet giants in B-to-C applications involving consumer data, the situation might be quite different when it comes to the data needed for B-to-B transactions among companies, and for mission-critical applications where security and privacy are paramount. The management of digital identities is one such important application, since such identities are the key to determining the particular transactions in which individuals and institutions can rightfully participate, as well as the data they are entitled to access. In those cases, rather than all the data coming from a single company, the needed data will generally be siloed within different private and public sector institutions, each using its data for its own purposes. To reach a higher level of privacy and security we need to establish a trusted data ecosystem, which requires the interoperability and sharing of data across the various institutions involved.
It’s not only highly unsafe, but also totally infeasible to gather all the needed data in a central data warehouse. Few institutions will let their critical data out of their premises. MIT’s OPAL and Enigma projects are developing a framework for safely sharing data across institutions. Instead of copying or moving the data across, the agreed-upon queries are sent to the institution owning the data, executed behind the firewalls of the data owners, and only the encrypted results are shared.
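To make the "send the query to the data" idea concrete, here is a minimal Python sketch. It is not OPAL's or Enigma's actual API: the `Institution` class, the `AGREED_QUERIES` registry, the `ask_all` helper and the sample records are all hypothetical names invented for illustration. It also leaves out the encryption of results that the projects call for, since that would need a cryptography library; a real deployment would encrypt each answer before it leaves the data owner.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, Dict, List

# Hypothetical registry of queries that all parties have vetted and agreed upon.
# Each one returns only an aggregate, never individual records.
AGREED_QUERIES: Dict[str, Callable[[List[dict]], float]] = {
    "count": lambda rows: float(len(rows)),
    "mean_income": lambda rows: mean(r["income"] for r in rows),
}


@dataclass
class Institution:
    """A data owner; its records never leave this object (its 'firewall')."""
    name: str
    _records: List[dict] = field(default_factory=list, repr=False)

    def run(self, query_name: str) -> float:
        # Refuse anything that is not on the agreed-upon list.
        if query_name not in AGREED_QUERIES:
            raise PermissionError(f"query {query_name!r} was not agreed upon")
        # The query executes locally; only the aggregate result is returned.
        # (Assumption: a real system would encrypt this result before sharing.)
        return AGREED_QUERIES[query_name](self._records)


def ask_all(institutions: List[Institution], query_name: str) -> Dict[str, float]:
    """Send the same vetted query to every institution and collect the answers."""
    return {inst.name: inst.run(query_name) for inst in institutions}


# Made-up sample data, for illustration only.
bank = Institution("bank", [{"income": 40_000}, {"income": 60_000}])
telco = Institution("telco", [{"income": 30_000}])
results = ask_all([bank, telco], "mean_income")
```

The point of the design is that the requester only ever holds `results`, one aggregate per institution, while the underlying records stay where they are. A query that was not agreed upon in advance is rejected by the data owner instead of being run.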
The data economy is giving rise to many tough questions, from how to best foster the establishment of thriving data markets and ecosystems, to the regulatory and antitrust actions required to limit the dominance of Internet giants. “Teething problems,” The Economist calls them, reminding us that it took decades for well-functioning oil markets to emerge. “The nature of data makes the antitrust remedies of the past less useful.” We have much to learn as the data economy moves forward.