When I started to seriously think about Cloud Computing a couple of years ago, the metaphor that first came to mind was that the IT industry was now entering its Cambrian age.
The Cambrian geological period marked a profound change in life on Earth. Before it, most organisms were very simple, composed of individual cells, sometimes organized into colonies, such as sponges. But, starting around 550 million years ago, and over the following 70 or 80 million years, evolution accelerated by an order of magnitude and ushered in what is termed the Cambrian Explosion - a highly diverse set of larger, more complex life forms.
Although over far shorter time periods, the IT industry has been going through something similar. Over several decades, we were perfecting our digital components - microprocessors, memory chips, disks, networking and the like - and we used them to build a variety of computers. Then, about ten years ago, the digital components started to become powerful, reliable, inexpensive, and ubiquitous. The rise of the Internet introduced a whole new set of technologies and standards for interconnecting all these components, not only in IP networks but across the World Wide Web, different kinds of communications, computer Grids, distributed applications, and so on.
Today, digital components are becoming embedded into just about everything and connected to the Internet through an IP address, so that personal devices of all sorts - consumer electronics, medical equipment, cars, buildings, and just about the whole physical infrastructure around us - are now part of IT’s life forms. We are using variations of the same digital components to build very large servers and supercomputers with hundreds of thousands of nodes. When we look into the future, we see a not far-off world of billions of mobile devices, trillions of smart sensors, and a variety of million-node servers, all optimized and co-evolving to support different kinds of workloads. Indeed, it feels like IT is now entering its own Cambrian age.
Not surprisingly, this massive explosion in diversity and scale is leading to many innovations across the IT spectrum. Look at, for example, the advances in smartphones and similar personal mobile devices in just the last couple of years. And, at the other end of the spectrum, we are seeing the data center itself, whose roots go back to the earliest days of computers, being completely reinvented as well.
Earlier this summer, two top Google engineers, Luiz Andre Barroso and Urs Hoelzle, published an excellent long paper or mini-book - The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines. The paper introduces the concept of Warehouse-Scale Computers (WSC):
“As computation continues to move into the cloud, the computing platform of interest no longer resembles a pizza box or a refrigerator, but a warehouse full of computers. These new large datacenters are quite different from traditional hosting facilities of earlier times and cannot be viewed simply as a collection of co-located servers. Large portions of the hardware and software resources in these facilities must work in concert to efficiently deliver good levels of Internet service performance, something that can only be achieved by a holistic approach to their design and deployment. In other words, we must treat the datacenter itself as one massive warehouse-scale computer (WSC). We describe the architecture of WSCs, the main factors influencing their design, operation, and cost structure, and the characteristics of their software base. We hope it will be useful to architects and programmers of today's WSCs, as well as those of future many-core platforms which may one day implement the equivalent of today's WSCs on a single board.”
WSCs are specialized architectures particularly optimized to support the kind of Cloud or Internet services offered by Google, Amazon or Yahoo, whose characteristics are markedly different from traditional mainframe or client-server workloads in a number of ways, perhaps most prominently:
“Ample parallelism - Typical Internet services exhibit a large amount of parallelism stemming from both data- and request-level parallelism. Usually, the problem is not to find parallelism but to manage and efficiently harness the explicit parallelism that is inherent in the application. . .”
“Platform homogeneity - . . . Large Internet services operations typically deploy a small number of hardware and system software configurations at any given time . . . Homogeneity within a platform generation simplifies cluster-level scheduling and load balancing and reduces the maintenance burden for platforms software (kernels, drivers, etc.). Similarly, homogeneity can allow more efficient supply chains and more efficient repairs processes because automatic and manual repairs benefit from having more experience with fewer types of systems.”
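To make the "ample parallelism" point concrete, here is a minimal sketch in Python - with a hypothetical, stand-in request handler, not anything from the paper - of how an Internet service harnesses request-level parallelism: each request is independent, so a pool of workers can process them concurrently with no coordination between them.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a service handler. The key property is
# that each request is independent, so no locking or coordination
# between workers is needed.
def handle_request(query):
    # e.g. look up an index shard, render a result page, etc.
    return query.upper()

requests = ["cambrian", "datacenter", "warehouse-scale"]

# Request-level parallelism: the pool fans the independent requests
# out across workers and collects the results in order.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(handle_request, requests))

print(results)  # ['CAMBRIAN', 'DATACENTER', 'WAREHOUSE-SCALE']
```

The same pattern scales from a handful of threads on one machine to thousands of servers behind a load balancer; the hard part, as the quote notes, is managing that parallelism efficiently, not finding it.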
WSC platforms require a detailed understanding of the workloads they are architected to support. This knowledge enables designers to take new approaches to constructing and operating these systems, and thus to achieve very high performance and quality at remarkably reasonable costs.
The question then arises whether WSCs and Cloud data centers in general are only applicable to born-to-the-Cloud, relatively young companies like Google, Amazon and Yahoo, which run a small number of huge workloads, each of which exhibits a very high degree of parallelism. Are they suitable for most enterprises out there, the vast majority of which run more heterogeneous workloads, including a considerable number of legacy applications accumulated over the years?
To help us explore this question, let's look at the evolution of data centers over time.
Data centers are large facilities especially designed to house and operate computers and related equipment, e.g. storage and networking, focused particularly on their physical requirements. They are built on raised floors to accommodate the numerous cables necessary to interconnect all the equipment in the room. Computers, especially in their early days, required a great deal of power, including emergency power supplies so the machines could keep operating in the event of an interruption to their external power.
Computers generate a considerable amount of heat, so the temperature in the data center has to be carefully controlled to prevent the machines from overheating. In earlier times, the data center also had to provide chilled water for cooling mainframes and supercomputers. Security is also critical, so few people can get near the actual machines. Data centers used to be called glass houses because they typically included large glass walls enabling visitors to see the computers without entering the room.
Data centers experienced a huge boom in the second half of the 1990s with the advent of the Internet and the dot-com bubble. The number of users they supported went up by orders of magnitude, as anyone with a PC and a browser could now get online. The data centers grew considerably to accommodate the servers and storage needed to support all those users and new Web-based applications, with all the additional power and cooling that required. Most such Internet servers were now based on scalable cluster architectures, housed on racks of standard dimensions. In addition, the Internet data centers now had to provide fast Internet connectivity and all kinds of capabilities to ensure their safe and nonstop operations.
Given the added requirements, complexities and costs of data centers designed to support online Internet users, many companies, especially smaller ones, started to use hosting services, in particular, a new kind of Internet data center called a colocation center. Many of the more expensive services of the data center were now shared among multiple customers, especially access to high bandwidth networking and uninterrupted power supplies. But otherwise, the computers in the data center remained isolated from each other, often behind fences or cages for added security, and were generally dedicated to specific users and applications.
As we know, a kind of mass extinction event took place when the bubble burst in 2000-2001. Most of these colocation centers closed down when many of their dot-com clients disappeared. But, after a short period of recovery, the explosive growth of IT continued unimpeded, giving rise to a new cloud model of computing. And, as often happens after mass extinctions, all kinds of new innovations started to emerge, like the highly scalable architectures that Barroso and Hoelzle describe in their WSC paper.
In retrospect, the clusters and Internet data centers of the client-server era have more in common with pre-Cambrian colonies than with the post-Cambrian complex organisms that eventually evolved into birds and mammals. These primitive biological colonies, still in existence today, include huge numbers of relatively simple, independent organisms living closely together for mutual benefits, such as stronger defenses and the pursuit of large prey. Think of sponges, coral reefs and bacteria, and in more advanced forms, ants, termites and bees.
Colonies and clusters serve a very important purpose, but there is a limit to how intelligent or sophisticated they can become. Beyond a certain size, the scalability of a classic client-server cluster is severely impaired by its inefficiencies. Most such clusters run at low utilizations, 20% or less, while consuming power at close to 100% of peak, which means that most of the power is wasted. While their hardware costs are very low, their relatively unsophisticated middleware and application software makes them much too labor intensive to achieve the systems management costs required in very large systems, and leaves them open to outages and security breaches. Their lack of a coherent architecture makes them unsuitable beyond a certain level of scale. Client-server architectures have been fine for their age, but inadequate to take IT to the next level.
Mainframes, on the other hand, are based on very sophisticated hardware and software Symmetric Multi-Processing (SMP) architectures that enable them to achieve high utilizations - 70% or higher - reasonable systems management costs for large workloads, and very high reliability and security. SMP systems are optimized for transaction processing, databases and other mission-critical applications that require a high degree of sharing and integrity, such as banking transactions, airline reservation systems and the management of critical enterprise resources. They are very good at what they were designed to do, which is why our Darwinian marketplace has paid them its ultimate compliment: they continue to be in wide use today in enterprises around the world.
SMP architectures would be considered overkill and not cost effective for supporting workloads like e-mail, search, information analysis, image processing, content distribution and maps, which require very limited sharing and are distributed or scale-out in nature. These are the kinds of applications being targeted by warehouse-scale computers (WSCs). In the Introduction to their paper, Barroso and Hoelzle write:
“The name is meant to call attention to the most distinguishing feature of these machines: the massive scale of their software infrastructure, data repositories, and hardware platform. This perspective is a departure from a view of the computing problem that implicitly assumes a model where one program runs in a single machine. In warehouse-scale computing, the program is an Internet service, which may consist of tens or more individual programs that interact to implement complex end-user services such as email, search, or maps. . .”
They then explain the difference between their approach and that of traditional clusters and datacenters:
“Traditional datacenters . . . typically host a large number of relatively small- or medium-sized applications, each running on a dedicated hardware infrastructure that is de-coupled and protected from other systems in the same facility. Those datacenters host hardware and software for multiple organizational units or even different companies. Different computing systems within such a datacenter often have little in common in terms of hardware, software, or maintenance infrastructure, and tend not to communicate with each other at all.”
“WSCs . . . differ significantly from traditional datacenters: they belong to a single organization, use a relatively homogeneous hardware and system software platform, and share a common systems management layer. Often much of the application, middleware, and system software is built in-house compared to the predominance of third-party software running in conventional datacenters. Most importantly, WSCs run a smaller number of very large applications (or Internet services), and the common resource management infrastructure allows significant deployment flexibility. . .”
Let me now return to the original question: how relevant are WSC architectures, and the Cloud computing model in general, to more traditional enterprises, which deal with a large variety of workloads - many of them accumulated over the years and not so easy to transform or evolve?
When the Web became widely popular and we launched the IBM e-business strategy in the mid-1990s, many companies were asking us whether e-business was for them or only for the new dot-com companies that were sprouting all around them.
Our answer was very simple. The Internet is for everyone. Whether you are a large, medium or small company anywhere in the world, or whether you have been around for one hundred years or one hundred days – you should leverage the Internet for business value and become an e-business.
In fact, the new web standards made it relatively easy to integrate the emerging web servers and applications with the company’s legacy infrastructure. Over time, legacy systems started to incorporate the innovations introduced by the new web-based systems. But, given the culture of standards that the Internet introduced to the IT world, it was quite possible to get going right away and then evolve over time.
I feel the same way about Cloud computing. It is for everyone. At its essence, Cloud computing is about delivering a wide variety of consumer and business services to large numbers of clients around the world, as well as operating highly scalable, well engineered, efficient data centers to deliver those services with high quality and reasonable costs. That's what businesses generally do. Clouds are thus relevant to most companies and institutions in one way or another: as providers of services, whether in-house or through a Cloud service provider, or as users - individuals, small businesses or large enterprises - availing themselves of the services they provide.
Once more, standards like SOA will make it possible to integrate new Cloud optimized workloads and platforms with a company’s existing infrastructure. And, once more, innovations like those described in the Warehouse-Scale Computing paper will find their way into legacy systems and applications and transform them over time.
When all is said and done, Cloud computing is introducing not just a major new model of computing but, even more important, a new model for conducting business and interacting with clients, employees, partners and all stakeholders of the institution. This new Cambrian stage of IT is already giving rise to many new innovations, not only in classic computing but in the many new life forms that incorporate digital components and software and are connected to the Internet. It is very important for companies to learn how to leverage these new innovations in their business.
In evolution, it is very costly to be left behind. So it is very important for all companies and institutions, regardless of age or size, to figure out what this new model of computing means for their particular industry and how it best applies to them. And, as with e-business a dozen years ago, they should all get on the learning curve by conducting some marketplace pilots, learning from their experiences, and ensuring their ability to continue evolving into the future.