Saturday, January 30, 2010

Cloud Computing

What Is the Cloud?

The set of disciplines, technologies, and business models used to render IT capabilities as on-demand services. The term cloud has historically been used as a metaphor for the Internet. This usage derives from the common depiction of the Internet in network diagrams as the outline of a cloud, representing the transport of data across carrier backbones (which owned the cloud) to an endpoint location on the other side.

The Emergence of Cloud Computing:

Utility computing can be defined as the provision of computational and storage resources as a metered service, similar to those provided by a traditional public utility company. This, of course, is not a new idea. This form of computing is growing in popularity, however, as companies have begun to extend the model to a cloud computing paradigm providing virtual servers that IT departments and users can access on demand. Early enterprise adopters used utility computing mainly for non-mission-critical needs, but that is quickly changing as trust and reliability issues are resolved.

Some people think cloud computing is the next big thing in the world of IT. Others believe it is just another variation of the utility computing model, repackaged this decade as something new and cool. The confusion is not limited to the buzzword itself: with few cloud computing vendors actually practicing this form of technology, and nearly every analyst from every research organization defining the term differently, its meaning has become very nebulous. Rather than rely on any single definition, listing the common characteristics found in many cloud computing services gives the term scope and aids comprehension. These common characteristics include:
  • Shared infrastructure: As a part of doing business, cloud providers invest in and build the infrastructure necessary to offer software, platforms, or infrastructure as a service to multiple consumers. The infrastructure—and environment necessary to house it—represents a large capital expense and ongoing operational expense that the provider must recoup before making a profit. As a result, consumers should be aware that service providers have a financial incentive to leverage the infrastructure across as many consumers as possible.
  • On-demand self-service: On-demand self-service is the cloud customer’s (i.e., the consumer’s) ability to purchase and use cloud services as the need arises. In some cases, cloud vendors provide an application programming interface (API) that enables the consumer to consume a service programmatically, or automatically through a management application (see the sketch after this list).
  • Elastic and scalable: From a consumer point of view, cloud computing’s ability to quickly provision and deprovision IT services creates an elastic, scalable IT resource. Consumers pay for only the IT services they use. Although no IT service is infinitely scalable, the cloud service provider’s ability to meet consumers’ IT needs creates the perception that the service is infinitely scalable, which increases its value.
  • Consumption-based pricing model: Providers charge the consumer per amount of service consumed. For example, cloud vendors may charge for the service by the hour or gigabytes stored per month.
  • Dynamic and virtualized: The need to leverage the infrastructure across as many consumers as possible typically drives cloud vendors to create a more agile and efficient infrastructure that can move consumer workloads, lower overhead, and increase service quality. Many vendors choose server virtualization to create this dynamic infrastructure.
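To make the self-service and consumption-based pricing characteristics concrete, here is a minimal sketch using the boto Python library against Amazon EC2. The AMI ID is a placeholder, not a real image, and credentials are assumed to come from the environment; this is an illustration of the pattern, not a deployment recipe.

    import boto  # boto's EC2 interface; AWS credentials come from the environment

    conn = boto.connect_ec2()

    # Self-service: provision a virtual server the moment it is needed.
    # The AMI ID below is a placeholder, not a real image.
    reservation = conn.run_instances('ami-12345678', instance_type='m1.small')
    instance = reservation.instances[0]

    # Consumption-based pricing: billing accrues per instance-hour, so
    # releasing the capacity when done is what stops the meter.
    conn.terminate_instances(instance_ids=[instance.id])

The point of the sketch is that both provisioning and release are single API calls: the elasticity described above is available to a program, not just to a person with a web console.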
In addition, it is necessary to define supplementary terms that are typically associated with cloud computing. These include:

  • Public cloud: An IT capability as a service that cloud providers offer to any consumer over the public Internet. Examples: Salesforce.com, Google App Engine, Microsoft Azure, and Amazon EC2.
  • Private cloud: An IT capability as a service that cloud providers offer to a select group of consumers. The cloud service provider may be an internal IT organization (i.e., the same organization as the consumer) or a third party. The network used to offer the service may be the public Internet or a private network, but service access is restricted to authorized consumers. Example: Hospitals or universities that band together to purchase infrastructure and build cloud services for their private consumption.
  • Internal cloud: A subset of a private cloud, an internal cloud is an IT capability offered as a service by an IT organization to its business. For example, IT organizations building highly virtualized environments can become infrastructure providers to internal application developers. In a typical IT organization, application developers are required to work through the IT infrastructure operations team to procure and provision the development and production application platform (e.g., hardware, OS, and middleware) necessary to house a new application. In this model, the infrastructure team provides cloud-like IT infrastructure to the application development team (or any other IT team) thereby allowing it to provision its own application platform.
  • External cloud: An IT capability offered as a service to a business that is not hosted by its own IT organization. An external cloud can be public or private, but must be implemented by a third party.
Architecture Diagram (Tiered Architecture)

In addition to the definitions, architectural diagrams can further clarify cloud computing as well as classify vendor offerings.

Figure 1: Cloud Computing Tiered Architecture

The “stair-step” layering of cloud service tiers serves two purposes. First, the stair-step effect illustrates that IT organizations are not restricted to engaging the cloud through the SaaS layer. IT organizations will interface with cloud computing at multiple layers, utilizing either a turnkey SaaS solution or other cloud layers to complete an internally created solution. For example, an IT organization may use a hardware infrastructure as a service (HIaaS) provider to run a custom-built application and use a SaaS provider, such as Salesforce.com, for customer relationship management (CRM).

The second purpose of the layered cloud services model is to illustrate that cloud services can build on one another. For example, IBM WebSphere sMash (PaaS) utilizes Amazon’s EC2 (HIaaS). This is not to say that higher-tier cloud services must be built on lower tiers. For example, Microsoft’s first version of Azure is not built on another cloud service. However, the potential for cloud layering exists, and IT organizations need to be aware when a cloud service is layered on another service.

Software as a Service (SaaS)

In the software as a service (SaaS) model, a vendor designs and hosts an application so that users can access it through a web browser or a rich Internet application (RIA) mechanism such as Adobe AIR or Microsoft Silverlight. In the early days of SaaS (i.e., the mid-1990s), vendors who delivered applications via browsers were called application service providers (ASPs). Over time, the accepted term has changed to SaaS. Because the vendor designs, hosts, and supports the application, SaaS differs from hosting, in which the enterprise buys the software license from a vendor and then hires a third party to run the application.

The markets served by SaaS have evolved over time. In the 1990s, SaaS solutions often targeted departmental needs (e.g., human resources and web analytics) or a non-core service (e.g., web conferencing). That way, if the SaaS solution went down, the entire company did not grind to a halt. Today, SaaS solutions are available for a much wider swath of needs, including business intelligence, document sharing, e-mail, office productivity suites, sales force automation (SFA), web analytics, and web conferencing. A relatively recent change in the market is that major vendors such as Cisco Systems, Google, IBM, Microsoft, and Oracle now offer SaaS solutions. In the past, SaaS was the province of startups.

A sampling of SaaS application segments and affiliated vendors:

• Document sharing: Adobe, IBM Lotus, Google, and Microsoft
• E-mail: Cisco, Google, Microsoft, and Yahoo!
• Office productivity suites: Google, ThinkFree, and Zoho
• Sales force and customer management: NetSuite, Oracle, and Salesforce.com
• Web analytics: Coremetrics, Omniture, and WebTrends
• Web conferencing: Cisco Systems, IBM Lotus, and Microsoft

Platform as a Service (PaaS)

A level below SaaS, platform as a service (PaaS) is an externally managed application platform for building and operating applications and services. Like any application platform, a PaaS environment supplies development and runtime frameworks to support presentation, business logic, data access, and communication capabilities. The PaaS environment must also supply supporting infrastructure capabilities, such as authentication, authorization, session management, transaction integrity, reliability, availability, and scalability.

Also, a PaaS environment typically provides development tools for working with the supplied frameworks. Applications and services developed using these frameworks may be expressed as executable code, scripts, or metadata. In some cases, the development tools are hosted by the PaaS provider (particularly when applications are expressed as scripts or metadata). In other cases, the development tools are supplied as downloadable integrated development environments (IDEs) that can be integrated with the organization’s traditional software development lifecycle (SDLC) infrastructure.

We can place the PaaS vendors into five categories:

  • Java application platform vendors: Java application vendors package their traditional application platform middleware for delivery on Amazon EC2. Examples: IBM, Red Hat, Oracle, and SpringSource.
  • Microsoft: Microsoft is collecting a set of software infrastructure services, such as .NET Services and Structured Query Language (SQL) Services, running in an elastic operating environment called Windows Azure. The combined platform is called Windows Azure Platform Services.
  • Emerging proprietary contenders: Emerging platform contenders provide rapid application development capabilities through the creation and delivery of new proprietary development and runtime environments. Examples: Salesforce.com and Google App Engine.
  • Niche vendors: Niche framework vendors provide specialized application platforms as services. Examples: GigaSpaces Technologies (extreme transactions), Appian Anywhere (business process management [BPM]), and Ning (social networking).
  • Startup vendors: Startup PaaS vendors are platform vendors who offer their platform only online. Examples: LongJump and Bungee Labs.
Software Infrastructure as a Service (SIaaS)

Software infrastructure as a service (SIaaS) is a stand-alone cloud service that provides a specific application support capability, but not the entire application software platform service (otherwise, it would be PaaS). For example, Microsoft SQL Data Services is a SIaaS offering. Although SQL Data Services is included in Azure, it is also available as a stand-alone infrastructure service.

The intended consumers of SIaaS offerings are software developers who want to create an application that does not have dependencies on internal infrastructure components, which can be too expensive to license, slow to deploy, and complex to maintain and support. Thus, SIaaS is not a user-consumable cloud service and, by definition, must be subsumed by applications or higher-tier cloud services (e.g., SaaS or PaaS). Other examples of SIaaS include the following (see the sketch after the list):

  • Data management services: Amazon SimpleDB and Microsoft SQL Data Services
  • Messaging services: Amazon Simple Queue Service
  • Integration services: Cast Iron Systems and Workday
  • Content distribution: Akamai and Amazon CloudFront
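As an illustration of how application code consumes a SIaaS capability rather than operating its own infrastructure, here is a hedged sketch using the boto Python library against Amazon Simple Queue Service; the queue name and message body are invented for the example.

    import boto
    from boto.sqs.message import Message

    conn = boto.connect_sqs()
    queue = conn.create_queue('order-events')  # returns the queue if it already exists

    # Producer side: enqueue work for another component to pick up.
    msg = Message()
    msg.set_body('order-1234:shipped')
    queue.write(msg)

    # Consumer side: poll, process, and delete handled messages.
    for received in queue.get_messages():
        print(received.get_body())
        queue.delete_message(received)

Note that there is no broker to license, install, or patch anywhere in the sketch; the messaging infrastructure exists only as API calls, which is precisely the SIaaS proposition.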
Hardware Infrastructure as a Service (HIaaS)

Hardware infrastructure as a service (HIaaS) is a virtual or physical hardware resource offered as a service. HIaaS can be implemented in many ways, but most implementations utilize server virtualization as an underlying technology to increase infrastructure utilization (important to HIaaS providers), workload mobility, and provisioning. Many HIaaS providers also use grid and clustering technology to increase scalability and availability.

HIaaS vendors are divided into two categories: the “enablers” and the “providers.” HIaaS enablers are software vendors who develop virtualization software that is used to create hardware infrastructure services. Enabler examples include VMware (vCloud), Citrix Systems (Citrix Cloud Center [C3]), and 3Tera (AppLogic).

HIaaS providers are cloud vendors who utilize an enabling vendor’s technology to create a service. Provider examples include Amazon (EC2 and S3) and Rackspace. AT&T, T-Mobile, and CDW are examples of hosting providers that are planning to provide HIaaS using VMware’s vCloud product.

Cloud Usage Models:

The cloud tiered architecture model provides a basis to discuss cloud computing usage models and to define additional terms:

  • Service provider: The organization providing the cloud service. Also known as “cloud service provider” or “cloud provider.” Service providers may offer a cloud service to a select group (private cloud) or to the general public (public cloud).
  • Service consumer: The person or organization that is using the cloud service. The consumer may access a public or private cloud using the public Internet or a private network.
  • Service procurer: The person or organization obtaining the service on behalf of the consumer.
Challenges and Barriers to Adoption:

Although the cloud presents tremendous opportunity and value for organizations, the usual IT requirements (security, integration, and so forth) still apply. In addition, some new issues arise because of the multi-tenant nature of cloud computing (information from multiple companies may reside on the same physical hardware), the merger of applications and data, and the fact that a company’s workloads might reside outside of its physical on-premises datacenter. This section examines five main challenges that cloud computing must address in order to deliver on its promise.

Security

Many organizations are uncomfortable with the idea of storing their data and applications on systems they do not control. Migrating workloads to a shared infrastructure increases the potential for unauthorized access and exposure. Consistency around authentication, identity management, compliance, and access technologies will become increasingly important. To reassure their customers, cloud providers must offer a high degree of transparency into their operations.

Data and Application Interoperability

It is important that both data and application systems expose standard interfaces. Organizations will want the flexibility to create new solutions enabled by data and applications that interoperate with each other regardless of where they reside (public clouds, private clouds that reside within an organization’s firewall, traditional IT environments, or some combination). Cloud providers need to support interoperability standards so that organizations can combine any cloud provider’s capabilities into their solutions.

Data and Application Portability

Without standards, the ability to bring systems back in-house or choose another cloud provider will be limited by proprietary interfaces. Once an organization builds or ports a system to use a cloud provider’s offerings, bringing that system back in-house will be difficult and expensive.

Governance and Management

As IT departments introduce cloud solutions in the context of their traditional datacenter, new challenges arise. Standardized mechanisms for dealing with lifecycle management, licensing, and chargeback for shared cloud infrastructure are just some of the management and governance issues cloud providers must work together to resolve.

Metering and Monitoring

Business leaders will want to use multiple cloud providers in their IT solutions and will need to monitor system performance across these solutions. Providers must supply consistent formats for monitoring cloud application and service performance and make them compatible with existing monitoring systems.

It is clear that the opportunity for those who effectively utilize cloud computing in their organizations is great. However, these opportunities are not without risks and barriers. It is our belief that the value of cloud computing can be fully realized only when cloud providers ensure that the cloud is open.
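To make the metering and monitoring requirement above concrete, here is a hedged sketch of normalizing performance samples from two providers into one record format; the provider names, payload fields, and schema are all invented for the example.

    def normalize(provider, raw):
        """Map a provider-specific metrics payload to one common schema."""
        if provider == 'provider_a':
            return {'service': raw['svc'], 'metric': 'latency_ms',
                    'value': raw['lat'], 'timestamp': raw['ts']}
        if provider == 'provider_b':
            return {'service': raw['serviceName'], 'metric': 'latency_ms',
                    'value': raw['latencyMillis'], 'timestamp': raw['time']}
        raise ValueError('unknown provider: %s' % provider)

    samples = [('provider_a', {'svc': 'crm', 'lat': 120, 'ts': 1264852800}),
               ('provider_b', {'serviceName': 'mail', 'latencyMillis': 45,
                               'time': 1264852800})]
    unified = [normalize(p, r) for p, r in samples]
    print(unified[0])

Every new provider today means another hand-written mapping like the two above; consistent formats from the providers themselves would make the adapter layer unnecessary.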

Resources in the cloud:

Virtualization. The notion of cloud computing as an application of "virtualization" is true but also misleading. Most people associate virtualization with server virtualization and virtual machines, and while these are components of cloud computing, the concept requires a much broader view of virtualization. A cloud computing complex appears as a single abstract resource that can support any application or application component; when an application is needed (because someone runs it on the virtual computer representing the cloud), it is assigned specific resources from the available pool. In cloud computing, the most significant question is the flexibility of this "resource brokerage" process, because constraints on how resources are assigned have a major effect on overall resource utilization and thus on the benefit that cloud computing can offer.
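The following toy Python sketch illustrates the resource brokerage idea under deliberately simple assumptions: a first-fit policy over a two-host pool, with invented names and capacities. It shows how a placement constraint can leave the pool underutilized even though capacity remains, which is exactly the flexibility question raised above.

    hosts = [{'name': 'host-a', 'free_cpus': 8},
             {'name': 'host-b', 'free_cpus': 4}]

    def assign(pool, cpus_needed):
        """Place a workload on the first host with enough free capacity."""
        for host in pool:
            if host['free_cpus'] >= cpus_needed:
                host['free_cpus'] -= cpus_needed
                return host['name']
        return None  # no single host fits, even if total capacity remains

    print(assign(hosts, 6))  # host-a
    print(assign(hosts, 6))  # None: 6 CPUs still free in total, but fragmented

Real brokers weigh utilization, locality, and many more constraints than this sketch does, but the fragmentation effect in the second call is the essence of why brokerage flexibility matters.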

Storage resources. The most significant question in efficiency and resource utilization is likely to be the location of the database supporting the application. If storage and servers are to be allocated separately for optimum utilization, rather than as a unit, storage network performance is critical. This means that it is difficult to allocate storage resources across a wide area network (WAN) connection, and some cloud computing software provides a distributed cloud-optimized database management system (DBMS) to facilitate data virtualization and improve performance overall. Most cloud computing resource brokerage techniques would locate the application server and application storage in the same data center, where storage area network (SAN) connections can be used. This technique is also the norm for private clouds, where enterprises tend to have several cloud data centers with server farms and large SANs. This is why data center networking and SANs are critical parts of nearly all cloud computing architecture -- both on the vendor and the user side. In private cloud applications, these issues are controllable.

Databases. Some public cloud applications may involve "data crunching" of databases. Depending on the size of the database and the percentage of items being accessed by the cloud application, it may be smart to load the needed data onto the cloud in one step and then access it there. The process of uploading data to the cloud is a batch function, but the process of accessing data is interactive, and storage delays accumulate with the number of database accesses performed.
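A back-of-envelope sketch of that trade-off, with all numbers invented for illustration: per-access WAN delays accumulate, so a one-time bulk upload can win once the access count is high enough.

    # All numbers are invented for illustration.
    accesses = 100000        # interactive reads made by the cloud application
    wan_rtt_s = 0.050        # 50 ms round trip per access across the WAN
    bulk_upload_s = 1800.0   # one-time 30-minute batch upload to the cloud
    local_access_s = 0.002   # 2 ms per access once the data is cloud-local

    remote_total = accesses * wan_rtt_s                      # 5000 seconds
    local_total = bulk_upload_s + accesses * local_access_s  # 2000 seconds
    print("access across WAN: %.0f s" % remote_total)
    print("bulk load + local: %.0f s" % local_total)

With these assumed figures the bulk load pays for itself well before 100,000 accesses; with only a few hundred accesses, it would not.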

Justifying Cloud Computing by Outsourcing Resources:

Meeting variable IT demand. The near-term impetus for cloud computing services comes from the variation of IT demand according to business cycles and in response to unexpected events. In-house computing resources are normally maintained at a level sufficient to ensure that the IT needs of line departments can be met. The level of resources needed depends on a combination of the total resource requirements, the extent to which these needs vary in an uncontrolled way because of short-term project demands, and the speed with which new resources can be added. Often the variability of demand forces enterprises to maintain an oversupply of IT resources to carry them through peak load periods. These peaks can be periodic (quarterly earnings cycles) or episodic, and in many cases some of the applications could be outsourced to cloud computing services.

Reducing in-house capacity. It's fairly easy to see whether cloud computing services can reduce the cost of sustaining capacity reserves against peak requirements; an audit of the level of utilization of critical resources over time will normally show the range of variability. That lets organizations estimate the amount of cloud services needed to provide reserve capacity and the cost of those resources. This can be compared with the cost of sustaining excess resource capacity in-house. Generally, the more variable the demand on IT, the more savings can be generated by offloading peak demand into the cloud. But it's also true that large enterprises that achieve good resource economies internally are likely to save less than smaller ones.
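The audit described above amounts to a simple comparison, sketched below with invented figures: owning peak capacity in-house versus owning average capacity and renting the difference from a cloud provider during peak periods.

    # All figures are invented for illustration.
    peak_servers = 100      # capacity needed during the peak month
    avg_servers = 60        # capacity needed in a typical month
    peak_hours = 720        # hours per year of peak demand (one month)
    inhouse_cost_per_server_year = 3000.0
    cloud_cost_per_server_hour = 0.50

    own_peak = peak_servers * inhouse_cost_per_server_year
    own_avg_rent_peak = (avg_servers * inhouse_cost_per_server_year
                         + (peak_servers - avg_servers) * peak_hours
                         * cloud_cost_per_server_hour)
    print("own peak capacity:  %8.0f per year" % own_peak)           # 300000
    print("own avg, rent peak: %8.0f per year" % own_avg_rent_peak)  # 194400

The wider the gap between peak and average demand, the more the rented-peak column wins, which is the "more variable the demand, the more savings" claim above in arithmetic form.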

Creating operational efficiencies. Operational efficiencies for cloud computing are based on the presumption of a support economy of scale: a support team managing a large cloud data center, or a series of data centers, is more efficient than one managing a smaller set of resources. The corollary is that larger enterprises are likely to achieve good enough economies of scale with private resources, particularly private cloud computing, to reduce this benefit. That means cloud computing services are most likely to be economical for smaller organizations.

The History of Windows Terminal Services

The Microsoft Windows Server 2003 product line includes Terminal Services, an optional extension of the operating system. It allows end-user applications or complete Windows desktops to be used from different clients connected via a network. Applications are executed and data is processed exclusively on the server.

Which server types support Terminal Services? For application servers, Terminal Services is provided with the Standard Server, Enterprise Server, and Datacenter Server. The following table lists important features of various server types. It also includes functions such as Remote Desktop to transfer the graphical user interface (GUI) to a remote computer for administration purposes, as well as the session directory to manage user sessions in server environments with capacity allocation mechanisms.

Design Objectives

The primary design objective of Terminal Services was the display of many kinds of Microsoft Windows–based applications on multiple hardware platforms. To function properly, the applications must be able to run as is on Windows Server 2003 with Terminal Services enabled for application servers. By centralizing applications, the technology significantly reduces operating costs, especially in large corporate environments.

Moreover, Terminal Services under Windows Server 2003 provides a powerful option for distributing and updating software. It can replace or augment the Microsoft Systems Management Server and extends Windows capabilities, especially in large companies.

One secondary benefit of Terminal Services is the ability to eliminate so-called dumb terminals that are still in use at many companies. Windows Server 2003 in combination with Terminal Services opens up a migration path from a host environment to a more up-to-date environment.

In principle, a terminal server is a computer on which several users can work simultaneously while their screens can be displayed remotely. But is the platform a server or a client? The answer, as described in this book, is: An application server for several simultaneous users, who are logged on interactively to a single machine, is both a server and a client, depending on one’s point of view.

Figure 1-1: The terminal server multiple-user concept. A single server behaves like multiple Windows XP workstations whose output is redirected to multiple external devices.

The Development of Terminal Services

The Windows environment was developed in the 1980s to run on MS-DOS. The GUI was first introduced in November 1985. After the OS/2 initiative in cooperation with IBM to develop a successor to MS-DOS, Microsoft decided to work on a more progressive operating system that would support both Intel and other CPUs. The idea was to write the new operating system in a more sophisticated programming language (such as C) so that it could be ported more easily. In 1988, Microsoft hired David Cutler, the chief developer of Digital Equipment Corporation’s VMS, to manage the Windows New Technology project.

In the early 1990s, Microsoft released Microsoft Windows 3.0. This gained a large user base and therefore played a key role in the development of the new Microsoft Windows NT system. The design work for Windows NT took two years; three more were required to write the related program code.

The first version of Windows NT was launched in May 1993. It was based on its smaller but very successful sibling, Windows 3.1. Windows and Windows NT had the same GUI. However, Windows NT was not based on MS-DOS; it was a completely new 32-bit operating system. From the very first version, Windows NT could run both text-based OS/2 and POSIX applications as well as the older DOS and Windows-based applications.

Over time, both Windows NT and Windows 3.1 continued to be developed. From the start, Windows NT was considered the more stable system, especially for professional environments. As companies introduced personal computers, Windows NT became the market leader due to its stability in spite of increasing hardware requirements.

When Windows NT versions 3.5 and 3.51 hit the market, Microsoft was not very interested in equipping its high-end operating system with multiple-user features like those of UNIX. Therefore, in 1994, Microsoft granted Citrix access to the Windows NT source code to develop and market a multiple-user expansion. The expansion was called WinFrame and became quite successful in a number of companies.

Ed Iacobucci, the founder of Citrix, had developed the concepts behind WinFrame years earlier. From 1978 to 1989, he worked on developing OS/2 at IBM. His vision that different computers be able to access OS/2 servers through a network led to the idea of a multiple-user system. IBM, however, did not recognize the potential such an environment held. Inspired by this concept, Iacobucci left IBM in 1989 to found Citrix. The first Citrix products were still based on OS/2 and enjoyed only modest commercial success. That changed only when the Windows NT source code was used.

WinFrame’s great success and the increasing significance of thin client/server concepts led Microsoft, on May 12, 1997, to license Citrix’s multiple-user expansion, MultiWin, for Windows NT. Part of the license agreement stipulated that Citrix would not launch a WinFrame version based on Windows NT 4.0. Microsoft provided this release on June 16, 1998, with the launch of Windows NT 4.0 Server, Terminal Server Edition (code name “Hydra”).


Note:

Windows NT 4.0 Server, Terminal Server Edition, has been available only as an OEM version since August 2000. Due to the continued wide distribution of this platform, Microsoft made available the “NT 4 TSE Security Roll-Up Package” in April 2002.

One problem with Windows NT 4.0 was that the Terminal Server Edition was built on a modified version of the system kernel that required adapted service packs and hot fixes. This was addressed during the Windows 2000 design phase, when all modifications needed for multiple-user operation were integrated into the kernel from the start and the corresponding system services and driver functions were realized: Windows 2000 Terminal Services. The single code base, designed to avert the obvious mistakes of UNIX and its many derivatives, prevented a fragmentation of the Windows 2000 server market.

Unlike its predecessor, Windows 2000 did not require the purchase of an independent operating system for the multiple-user option. You simply enabled an integrated component. There was a single common system kernel for Windows 2000, regardless of the number of simultaneous users. The common kernel, of course, led to a standardization of service packs and hot fixes. All other system expansions or improvements immediately became available for terminal servers, too.

Compared to Windows NT 4.0, Terminal Server Edition, the new Windows 2000 Terminal Services included the option of using the clients’ printers and clipboards from the server (printer redirection and clipboard redirection). Additionally, it was now possible to monitor sessions remotely; that is, one user could see another user’s session and, with the corresponding permissions, even interact with it.

To improve the integration of clients under Windows 2000, the Remote Desktop Protocol (RDP) was optimized, a caching option for raster images was introduced (bitmap caching), and access to client devices via virtual channels was created. A corresponding application programming interface (API) enabled programming specifically for multiple-user servers.

Before Windows Server 2003, Windows XP was launched as the new client platform on October 22, 2001. For the first time, client and server lines of the Windows NT code base were made available at different times. The standard installation of Windows XP also uses terminal server technologies for a number of tasks, such as the following:

  • Terminal server client: Available in Windows XP Home Edition and Windows XP Professional. The new RDP client allows access to servers with activated Terminal Services.

  • Fast user switching: Available in Windows XP Home Edition and Windows XP Professional. Users can keep applications running in the background while other users log on and work on the same Windows XP machine. Available in the Professional version only if the computer is not a member of a domain.

  • Remote assistance: Available in Windows XP Home Edition and Windows XP Professional. A user can ask an expert for help, and the expert can assume control of the user’s screen. The objective is one-on-one support, generally in help desk environments. This technology allows shared access to the user’s console, and access is configured through group policy. The feature is reached through the Help and Support Center, accessed from the Start menu by choosing Help and Support.

  • Remote desktop: Available only in Windows XP Professional. The terminal server technology is available on the client platform, so a user can operate a system running Windows XP Professional from another computer. The default setting allows only administrators to use this function; additional users can be added through the integrated Remote Desktop Users group via the Control Panel.

During the installation of Windows Server 2003, Terminal Services is automatically set to remote desktop mode. To use it, however, it must be activated via My Computer | Properties | Remote or through group policies. This gives the administrator easier access to the server over the network. Under Windows 2000, this mode was called Remote Administration, even though the basic function remains the same.

If Terminal Services is used in application server mode, it needs to be configured accordingly. Compared to Windows 2000 features, several changes and improvements were made.

  • Administrative tools: Improved tools for Terminal Services administration.

  • Printing: Improved printing via terminal servers. Local printers can now be integrated and reconnected automatically.

  • Redirecting drives and file systems: Users can now see and use the local drives of their clients during terminal server sessions.

  • Redirecting audio streams: The audio output of a terminal server session can be redirected to the client platform.

  • Redirecting the clipboard: Users can copy and paste between local and server-based applications.

  • Group policies: Almost all Terminal Services features can now be managed with the help of group policies.

  • WMI provider: Most Terminal Services configurations can be executed by means of WMI (Windows Management Instrumentation) scripting (see the sketch after this list).

  • Access rights: Expansion of security features through new user groups and permission allocation.

  • Session directory: Redirection of a user logon to an existing disconnected session within a farm of terminal servers. This requires the installation of a corresponding service.
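As a hedged illustration of the WMI provider mentioned above, the following Python sketch (using Tim Golden's third-party wmi module) reads and changes a Terminal Services setting through the Win32_TerminalServiceSetting class. The property and method names are believed correct for Windows Server 2003, but they and the hosting namespace vary between Windows versions, so verify them on the actual platform before relying on this.

    import wmi  # third-party WMI wrapper for Python on Windows

    c = wmi.WMI()  # connect to the local machine's default namespace

    # Win32_TerminalServiceSetting exposes terminal server configuration;
    # names here should be checked against the target Windows version.
    for ts in c.Win32_TerminalServiceSetting():
        print("%s: AllowTSConnections=%s" % (ts.ServerName,
                                             ts.AllowTSConnections))
        # Allow remote desktop connections (1 = allow, 0 = deny).
        ts.SetAllowTSConnections(AllowTSConnections=1)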

The RDP protocol also was considerably reworked and improved during the development of both Windows XP and Windows Server 2003.

Server-Based Computing:

The terminal server concept does not follow the usual approach to operating systems at Microsoft. It does not fit the notion of a “rich client” with local applications integrated into a network of high-performance servers that use a massive amount of resources. Neither does a terminal server match the typical environment of .NET-connected applications with components running on different platforms. The terminal server does, however, support the concept of “server-based computing.” It is based on a centralized, well-equipped server (which we could call the host) that many users log on to simultaneously to work interactively with the applications installed on that server. All the application components run exclusively on the server. The server is accessed via the network from low-maintenance clients equipped with basic functions only. These clients are also called terminals, which is how the term terminal server came about. The clients merely provide visual access to applications and a means to interact with them by keyboard and mouse. Depending on the clients’ characteristics, additional input and output devices can be added.

Figure 1-2: Schematic representation of the transfer of screen content from a Windows Server 2003 terminal server to a thin client over the network.

If this brings the world of mainframe computers to mind, you are not far from the mark. The terminal-host concept is not new and is now enjoying a revival in the terminal server. The basic idea was simply set on a new, state-of-the-art foundation, thus enabling access to modern, graphics-oriented applications without the need for modifications.

Different Client-Server Architectures

Even if terms such as terminal and host are often associated with it, the terminal server remains a special variant of the pure client/server environment. In a client/server architecture, certain resource-intensive tasks such as user authentication, printing, e-mail administration, database operations, or applications execution are limited to the server (the supplier). The clients (the customers) are linked to the server and provide a conduit for requesting services from the server. As a result, network traffic is usually quite low compared to other types of architectures. However, the server often demands high-end processing power, hard-drive capacity, main memory, and data throughput.

There are different levels of client/server options. They vary in their handling of the distributed application and data management, which in turn affects the efficiency of the server or client.

Figure 1-3: Different client/server options.
  • Remote presentation: Remote presentation corresponds to a thin client with little native intelligence that depends directly on its server. The server is responsible for running all applications and managing data, whereas the client handles the display and the keyboard and mouse connections. X terminals, “green terminals” on mainframe computers, and terminal server clients are examples of this type of client. You could also include a Web browser that displays HTML pages in this category, because all the “intelligence” needed to create these pages resides on the Web server.

  • Distributed application: The concept of a distributed application is realized in many network systems where the client needs a certain amount of native intelligence to optimize the processing of complex tasks. For instance, database requests are created on the client to be run on a database server. Seldom-used or computation-bound components of a client application can be transferred to a server. The latter option exploits the strengths and available resources of both the client and the server. However, due to their high degree of distribution, these applications often require a major human effort to develop and maintain, as with SQL databases or Siebel systems. A Web browser also falls into this category if, in addition to displaying HTML pages, it runs local scripts that transfer specific application logic to the client. These scripts can be loaded with the HTML data stream and are usually based on Visual Basic Script or JScript (or JavaScript).

  • Remote data management: Remote data management is used by many companies that have a PC infrastructure: all the application programs reside locally on the client, and only data is saved in a central location. This permits simple strategies for backing up and managing user data, and thus requires a less complex server structure. One clear disadvantage, however, is the level of management required to install and administer applications. Experienced users and developers favor this model because they largely retain control over their clients.

  • Distributed data management: The distributed data management model is every central administrator’s nightmare. Not only are applications stored on the client, but some data is as well, which makes the client very difficult to manage and secure. Even though the user retains most control over the client computer, he or she would be at a loss in the event of a hardware or software error. The loss of a local hard drive could cause damage to the company due to unrecoverable data. The connected servers are only used for occasional data archiving and perhaps accessing e-mail or the Internet.

Terminal Servers in Client/Server Environments

A terminal server requires the integration of thin client software, thin clients, or terminals. It corresponds to the first of the client/server options (remote presentation) mentioned earlier and therefore has the advantage of central administration. The other client/server options can be associated with different popular computing concepts as well, which helps classify them. For example, a PC in a local area network (LAN) falls under remote data management, whereas a classic client/server solution is a distributed application.

Figure 1-4: The different computing concepts used in companies.

Nevertheless, a two-level client/server model is inadequate and falls short of reality. Real environments usually have several layers: a client accesses an application or a Web server on the intranet, which in turn accesses a file server, a print server, a database server, or an e-mail server. In this way, the multilevel model meets the not-so-new requirement of complex application programs: the separation of presentation/interaction, program logic, and data management.

The real challenge for system administrators lies in providing and controlling such a complex environment, because several client/server models are often combined in corporate terminal server environments. For instance, Microsoft Outlook, a client application, accesses an Exchange server, which makes it a distributed application. If, however, Outlook is installed not directly on the client PC but on a terminal server, the model resembles a remote presentation: the processing logic for the Exchange data in Outlook is separate from its display on the terminal server client. Even though it seems awkward at first, this method has definite advantages over other models.

Windows Server 2003 and Terminal Services:

Terminal Services is available for all members of the latest Windows server family and can be activated at any time. On the Web Server edition it can be used only in remote desktop mode, so that edition is not a terminal server in the usual sense. The terminal server component provides the graphical user interface to a remote device via the LAN or an Internet connection.

The Different Terminal Server Modes

In Windows Server 2003, Terminal Services is available in two varieties: application server mode, which must be installed as a component, and remote desktop mode, which is used for remote administration of the server and requires special permissions to access.


Figure 1-5: Ability to activate remote desktop connections via My Computer | Properties | Remote.

Application Server

A terminal server running in application server mode is an efficient and reliable way to furnish Windows-based applications on a network. This terminal server represents a central installation point for applications that are accessed simultaneously by several users from their respective clients.


Note:

If applications are already installed on Windows Server 2003 and Terminal Services is later activated in application server mode, some of the applications might not work properly. A multiple-user environment has special configuration requirements.

Terminal servers in application server mode also allow Windows-based applications to run on clients that are not running the Windows operating system. However, additional third-party (for example, Citrix) products must be used to realize this option.





History of Virtualization

Virtualization was first developed in the 1960s to partition large mainframe hardware for better utilization. Today, computers based on the x86 architecture face the same problems of rigidity and underutilization that mainframes faced in the 1960s. VMware invented virtualization for the x86 platform in the 1990s to address underutilization and other issues, overcoming many challenges in the process. Today, VMware is the global leader in x86 virtualization, with over 170,000 customers, including 100% of the Fortune 100.

In the Beginning: Mainframe Virtualization

Virtualization was first implemented more than 30 years ago by IBM as a way to logically partition mainframe computers into separate virtual machines. These partitions allowed mainframes to “multitask”: run multiple applications and processes at the same time. Since mainframes were expensive resources at the time, they were designed for partitioning as a way to fully leverage the investment.

The Need for x86 Virtualization

Virtualization was effectively abandoned during the 1980s and 1990s when client-server applications and inexpensive x86 servers and desktops led to distributed computing. The broad adoption of Windows and the emergence of Linux as server operating systems in the 1990s established x86 servers as the industry standard. The growth in x86 server and desktop deployments led to new IT infrastructure and operational challenges. These challenges include:

  • Low Infrastructure Utilization. Typical x86 server deployments achieve an average utilization of only 10% to 15% of total capacity, according to International Data Corporation (IDC), a market research firm. Organizations typically run one application per server to avoid the risk of vulnerabilities in one application affecting the availability of another application on the same server.
  • Increasing Physical Infrastructure Costs. The operational costs to support growing physical infrastructure have steadily increased. Most computing infrastructure must remain operational at all times, resulting in power consumption, cooling and facilities costs that do not vary with utilization levels.
  • Increasing IT Management Costs. As computing environments become more complex, the level of specialized education and experience required for infrastructure management personnel and the associated costs of such personnel have increased. Organizations spend disproportionate time and resources on manual tasks associated with server maintenance, and thus require more personnel to complete these tasks.
  • Insufficient Failover and Disaster Protection. Organizations are increasingly affected by the downtime of critical server applications and inaccessibility of critical end user desktops. The threat of security attacks, natural disasters, health pandemics and terrorism has elevated the importance of business continuity planning for both desktops and servers.
  • High-Maintenance End-User Desktops. Managing and securing enterprise desktops present numerous challenges. Controlling a distributed desktop environment and enforcing management, access and security policies without impairing users’ ability to work effectively is complex and expensive. Numerous patches and upgrades must be continually applied to desktop environments to eliminate security vulnerabilities.

The VMware Solution: Full Virtualization of x86 Hardware

In 1999, VMware introduced virtualization to x86 systems to address many of these challenges and transform x86 systems into a general purpose, shared hardware infrastructure that offers full isolation, mobility and operating system choice for application environments.

Challenges & Obstacles to x86 Virtualization

Unlike mainframes, x86 machines were not designed to support full virtualization, and VMware had to overcome formidable challenges to create virtual machines out of x86 computers.

The basic function of most CPUs, both in mainframes and in PCs, is to execute a sequence of stored instructions (i.e., a software program). In x86 processors, there are 17 specific instructions that create problems when virtualized, causing the operating system to display a warning, terminate the application, or simply crash altogether. These 17 instructions were therefore a significant obstacle to the initial implementation of virtualization on x86 computers.

To handle the problematic instructions in the x86 architecture, VMware developed an adaptive virtualization technique that “traps” these instructions as they are generated and converts them into safe instructions that can be virtualized, while allowing all other instructions to be executed without intervention. The result is a high-performance virtual machine that matches the host hardware and maintains total software compatibility. VMware pioneered this technique and is today the undisputed leader in virtualization technology.
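The following toy sketch illustrates the general trap-and-translate idea, not VMware's actual implementation: scan an instruction stream, let safe instructions pass through, and rewrite the sensitive ones so they trap into the virtual machine monitor. POPF, SGDT, and SIDT really are among the problematic x86 instructions; the rest of this miniature instruction set is invented.

    # Toy illustration of trap-and-translate (not VMware's actual code).
    SENSITIVE = {'POPF', 'SGDT', 'SIDT'}

    def translate(block):
        """Rewrite one block of instructions so it is safe to run in a VM."""
        out = []
        for insn in block:
            if insn in SENSITIVE:
                out.append('VMM_EMULATE(%s)' % insn)  # trap to the monitor
            else:
                out.append(insn)  # unmodified, runs at native speed
        return out

    print(translate(['MOV', 'ADD', 'POPF', 'JMP']))

The key property, as the text describes, is that the overwhelming majority of instructions execute without intervention; only the handful of sensitive ones pay the emulation cost.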

A Look at Some VMware Infrastructure Architectural Advantages

The summary below explains the elements of the ESX architecture that I believe set it apart from Hyper-V and Xen, and the reasons behind some of VMware's design decisions. I thought it would be interesting material for the readers of this blog, so take a look and tell us what you think...

VMware Infrastructure - Architecture Advantages

VMware Infrastructure is a full data center infrastructure virtualization suite that provides comprehensive virtualization, management, resource optimization, application availability and operational automation capabilities in a fully integrated offering. VMware Infrastructure virtualizes the entire IT infrastructure, including servers, storage and networks, and aggregates these heterogeneous resources into a simple and uniform set of computing resources in the virtual environment. With VMware Infrastructure, IT organizations can manage resources as a shared utility and dynamically provision them to different business units and projects without worrying about the underlying hardware differences and limitations.

Complete Virtual Infrastructure

[Figure: The VMware Infrastructure stack]

As shown in the preceding figure, VMware Infrastructure can be represented in three layers:

1. The base layer or virtualization platform is VMware ESX – the highest performing, production-proven hypervisor on the market. Tens of thousands of customers deploy VMware ESX (over 85 percent in production environments) for a wide variety of workloads.

2. VMware Infrastructure’s support for pooling x86 CPU, memory, network and storage resources is the key to its advanced data center platform features. VMware Infrastructure resource pools and clusters aggregate physical resources and present them uniformly to virtual machines for dynamic load balancing, high availability and mobility of virtual machines between different physical hardware with no disruption or downtime.

3. Above the virtual infrastructure layers sits end-to-end application and infrastructure management from VMware that automates specific IT processes, ensures disaster recovery, supports virtual desktops and manages the entire software lifecycle.

VMware ESXi – The Most Advanced Hypervisor

VMware ESXi 3.5 is the latest generation of the bare-metal x86 hypervisor that VMware pioneered and introduced over seven years ago. The industry’s thinnest hypervisor, ESXi is built on the same technology as VMware ESX, so it is powerful enough to run even the most resource-intensive applications; however, it is only 32 MB in size and runs independently of a general-purpose OS.

The following table shows just how much smaller the VMware ESXi installed footprint is compared to other hypervisors. These are results from installing each product and measuring disk space consumed, less memory swap files.

Comparative Hypervisor Sizes (including management OS)

  • VMware ESX 3.5: 2 GB
  • VMware ESXi: 32 MB
  • Microsoft Hyper-V with Windows Server 2008: 10 GB
  • Microsoft Hyper-V with Windows Server Core: 2.6 GB
  • Citrix XenServer v4: 1.8 GB

As the numbers show, ESXi has a far smaller footprint than competing hypervisors from vendors that like to label ESX as "monolithic."

The ESXi architecture contrasts sharply with the designs of Microsoft Hyper-V and Xen, which both rely on a general-purpose management OS – Windows Server 2008 for Hyper-V and Linux for Xen – that handles all management and I/O for the virtual machines.


The VMware ESX direct driver architecture avoids reliance on a heavyweight Windows or Linux management partition OS.

Advantages of the ESX Direct Driver Architecture

VMware's competition negatively portrays VMware ESX Server as a “monolithic” hypervisor, but VMware's experience and testing prove it to be the better design.

The architecture for Citrix XenServer and Microsoft Hyper-V puts standard device drivers in their management partitions. Those vendors claim this structure simplifies their designs compared to the VMware architecture, which locates device drivers in the hypervisor. However, because Xen and Hyper-V virtual machine operations rely on the management partition as well as the hypervisor, any crash or exploit of the management partition affects both the physical machine and all its virtual machines. VMware ESXi has done away with all reliance on a general-purpose management OS, making it far more resistant to typical OS security and reliability issues. Additionally, our seven years of experience with enterprise customers has demonstrated the impressive reliability of our architecture. Many VMware ESX customers have achieved uptimes of more than 1,000 days without reboots.


One of VMware's customers sent this screenshot showing four years of continuous ESX uptime.

The VMware direct driver model scales better than the indirect driver models in the Xen and Hyper-V hypervisors.

The VMware ESX direct driver model puts certified and hardened I/O drivers directly in the VMware ESX hypervisor. These drivers must pass rigorous testing and optimization steps, performed jointly by VMware and the hardware vendors, before they are certified for use with VMware ESX. With the drivers in the hypervisor, VMware ESX can give them the special treatment, in the form of CPU scheduling and memory resources, that they need to process I/O loads from multiple virtual machines. The Xen and Microsoft architectures instead route all virtual machine I/O to generic drivers installed in the Linux or Windows OS of the hypervisor's management partition. These generic drivers are not optimized for multiple virtual machine workloads and can easily be overtaxed by their combined activity, exactly the situation a true bare-metal hypervisor such as ESXi avoids.

VMware investigated the indirect driver model, now used by Xen and Hyper-V, in early versions of VMware ESX and quickly found that the direct driver model provides much better scalability and performance as the number of virtual machines on a host increases.


The scalability benefits of the VMware ESX direct driver model became clearly apparent when we tested the I/O throughput of multiple virtual machines against XenEnterprise, as shown in the preceding chart from a published paper. Xen, which uses the indirect driver model, shows a severe I/O bottleneck with just three concurrent virtual machines, while VMware ESX continues to scale I/O throughput as virtual machines are added. Our customers who have compared VMware ESX with the competition regularly confirm this finding. Similar scaling issues are likely with Hyper-V, because it uses the same indirect driver model.

Better Memory Management with VMware ESX

In most virtualization scenarios, system memory is the limiting factor controlling the number of virtual machines that can be consolidated onto a single server. By more intelligently managing virtual machine memory use, VMware ESX can support more virtual machines on the same hardware than any other x86 hypervisor. Of all x86 bare-metal hypervisors, only VMware ESX supports memory overcommit, which allows the memory allocated to the virtual machines to exceed the physical memory installed on the host. VMware ESX supports memory overcommit with minimal performance impact by combining several exclusive technologies.

Memory Page Sharing

Content-based transparent memory page sharing conserves memory across virtual machines with similar guest OSs by seeking out memory pages that are identical across the multiple virtual machines and consolidating them so they are stored only once, and shared. Depending on the similarity of OSs and workloads running on a VMware ESX host, transparent page sharing can typically save anywhere from 5 to 30 percent of the server’s total memory by consolidating identical memory pages.


Transparent Page Sharing.
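A minimal sketch of the content-based sharing idea described above, with invented page contents: pages with identical bytes map to a single stored copy. Real implementations hash pages and verify candidate matches bit for bit before sharing; this toy version simply keys on a content digest.

    import hashlib

    def share_pages(vm_pages):
        """Map every (vm, page_number) to one stored copy per unique content."""
        store, mapping = {}, {}
        for key, content in vm_pages.items():
            digest = hashlib.sha1(content).hexdigest()
            store.setdefault(digest, content)  # store unique content only once
            mapping[key] = digest              # identical pages share one copy
        return store, mapping

    pages = {('vm1', 0): b'guest-os-code', ('vm2', 0): b'guest-os-code',
             ('vm1', 1): b'app-data'}
    store, mapping = share_pages(pages)
    print("%d mapped pages, %d physical copies" % (len(pages), len(store)))

The two virtual machines running the same guest OS code end up backed by one physical copy, which is where the 5 to 30 percent savings cited above comes from.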

Memory Ballooning

VMware ESX enables virtual machines to manage their own memory swap prioritization by using memory ballooning to dynamically shift memory from idle virtual machines to active virtual machines. Memory ballooning artificially induces memory pressure within idle virtual machines as needed, forcing them to use their own paging areas and release memory for more active or higher-priority virtual machines.


Memory Ballooning.

VMware ESX handles memory ballooning by using a pre-configured swap file for temporary storage if the memory demands from virtual machines exceed the availability of physical RAM on the host server. Memory overcommitment enables great flexibility in sharing physical memory across many virtual machines, so that a subset can benefit from increased allocations of memory, when needed.
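A toy sketch of the ballooning mechanic, with invented numbers: inflating the balloon in an idle virtual machine forces its guest OS to page to its own swap area, freeing host memory that can be granted to a busier virtual machine.

    def balloon(idle_vm, active_vm, mb_needed):
        """Shift memory from an idle VM to an active one via ballooning."""
        # The balloon can only reclaim memory the idle guest is not using.
        spare = idle_vm['allocated_mb'] - idle_vm['working_set_mb']
        reclaimed = min(mb_needed, spare)
        idle_vm['allocated_mb'] -= reclaimed    # guest pages to its own swap
        active_vm['allocated_mb'] += reclaimed  # host re-grants freed memory
        return reclaimed

    idle = {'allocated_mb': 1024, 'working_set_mb': 256}
    busy = {'allocated_mb': 1024, 'working_set_mb': 1000}
    print("%d MB shifted without host-level swapping" % balloon(idle, busy, 512))

The design point is that the guest, not the host, decides which of its own pages to swap out, which is why ballooning is cheaper than having the hypervisor swap blindly.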

Memory Overcommit Provides Lowest Cost of Ownership

The result of this memory conservation technology in VMware ESX is that most customers can easily operate at a 2:1 memory overcommit ratio with negligible performance impact. VMware's customers commonly achieve much higher ratios. Compared to Xen and Microsoft Hyper-V, which do not permit memory overcommit, VMware Infrastructure customers can typically run twice as many virtual machines on a physical host, greatly reducing their cost of ownership.


TCO Benefits of VMware Infrastructure 3 and its better memory management.

The table above illustrates how a conservative 2:1 memory overcommit ratio results in a lower TCO for even our most feature-complete VMware Infrastructure 3 Enterprise edition, compared to less functional Microsoft and Xen offerings.

Storage Management Made Easy with VMFS

Virtual machines are completely encapsulated in virtual disk files that are either stored locally on the VMware ESX host or centrally managed using shared SAN, NAS or iSCSI storage. Shared storage allows virtual machines to be migrated easily across pools of hosts, and VMware Infrastructure 3 simplifies the use and management of shared storage with the Virtual Machine File System (VMFS). With VMFS, a resource pool of multiple VMware ESX servers can concurrently access the same files to boot and run virtual machines, effectively virtualizing the shared storage and greatly simplifying its management.

VMFS_diagram

VMware VMFS supports and virtualizes shared storage.

While conventional file systems allow only one server to have read-write access to the file system at a given time, VMware VMFS is a high-performance cluster file system that allows concurrent read-write access by multiple VMware ESX servers to the same virtual machine storage. VMFS provides the first commercial implementation of a distributed journaling file system for shared access and rapid recovery. VMFS provides on-disk locking to ensure that multiple servers do not power on a virtual machine at the same time. Should a server fail, the on-disk lock for each virtual machine is released so that virtual machines can be restarted on other physical servers.
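
A sketch of the on-disk locking idea, with hypothetical structures: each virtual machine's lock lives on the shared volume itself, so any host can test it, and a lease tied to host heartbeats lets surviving servers safely break the lock of a failed host and restart its VMs:

    # Toy model of per-VM on-disk locks on a shared volume (hypothetical).
    import time

    LEASE_SECONDS = 15  # a lock is stale if its owner stops heartbeating

    class SharedVolume:
        def __init__(self):
            self.locks = {}       # vm_name -> (owner_host, last_heartbeat)

        def try_power_on(self, host, vm_name):
            lock = self.locks.get(vm_name)
            if lock is not None:
                owner, beat = lock
                if time.time() - beat < LEASE_SECONDS:
                    return False  # another live host owns this VM
                # Owner stopped heartbeating: assume it failed, break the lock.
            self.locks[vm_name] = (host, time.time())
            return True

        def heartbeat(self, host):
            now = time.time()
            self.locks = {vm: (o, now if o == host else beat)
                          for vm, (o, beat) in self.locks.items()}

    vol = SharedVolume()
    assert vol.try_power_on("esx-01", "web-vm")
    assert not vol.try_power_on("esx-02", "web-vm")   # lock held by esx-01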

The VMFS cluster file system enables innovative and unique virtualization-based distributed services. These services include live migration of running virtual machines from one physical server to another, automatic restart of failed virtual machines on a different physical server, and dynamic load balancing of virtual machines across different clustered host servers. As all virtual machines see their storage as locally attached SCSI disks, no changes are necessary to virtual machine storage configurations when they are migrated. For cases when direct access to storage by VMs is needed, VMFS raw device mappings give VMware ESX virtual machines the flexibility to use physical storage locations (LUNs) on storage networks for compatibility with array-based services like mirroring and replication.

Products like Xen and Microsoft Hyper-V lack an integrated cluster file system. As a result, storage provisioning is much more complex. For example, to enable independent migration and failover of virtual machines with Microsoft Hyper-V, one storage LUN must be dedicated to each virtual machine. That quickly becomes a storage administration nightmare when new VMs are provisioned. VMware Infrastructure 3 and VMFS enable the storage of multiple virtual machines on a single LUN while preserving the ability to independently migrate or failover any VM.

VMFS gives VMware Infrastructure 3 a distributed systems orientation that distinguishes it from our competition.

VMware Infrastructure 3 is the first virtualization platform that supports pooling the resources of multiple servers to offer a new array of capabilities. The revolutionary DRS and HA services rely on VMFS features to aggregate shared storage, along with the processing and network capacity of multiple hosts, into a single pool or cluster upon which virtual machines are provisioned. VMFS allows multiple hosts to share access to the virtual disk files of a virtual machine for quick VMotion migration and rapid restart while managing distributed access to prevent possible corruption. With Hyper-V, Microsoft is just now rolling out a first-generation hypervisor with a single-node orientation. It lacks distributed-system features like true resource pooling, and it relies on conventional clustering for virtual machine mobility and failover.

VirtualCenter – Complete Virtual Infrastructure Management

A VirtualCenter Management Server can centrally manage hundreds of VMware ESX hosts and thousands of virtual machines, delivering operational automation, resource optimization and high availability to IT environments. VirtualCenter provides a single Windows management client for all tasks, called the Virtual Infrastructure Client. With VirtualCenter, administrators can provision, configure, start, stop, delete, relocate and remotely access virtual machine consoles. The VirtualCenter client is also available in a web browser implementation for access from any networked device; the browser version makes giving a user access to a virtual machine as easy as sending a bookmark URL.

[Figure: VMware VirtualCenter centrally manages the entire virtual data center]
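
To give a flavor of the operational-automation side, here is a purely illustrative sketch of scripted VM lifecycle management. The VCClient class and all of its methods are hypothetical, invented for this example; the real VirtualCenter SDK is a SOAP web service with a different and much richer interface:

    # Purely illustrative sketch of scripted VM lifecycle management.
    # "VCClient" and its methods are hypothetical; they stand in for the
    # kind of operations the real (SOAP-based) VirtualCenter SDK exposes.
    class VCClient:
        def __init__(self, server, user, password):
            self.server, self.user = server, user
            # ... authenticate against the management server here ...

        def provision(self, template, name, host):
            print(f"cloning {template} -> {name} on {host}")

        def power_on(self, name):
            print(f"powering on {name}")

        def migrate(self, name, dest_host):
            print(f"VMotion {name} -> {dest_host}")

    vc = VCClient("virtualcenter.example.com", "admin", "secret")
    vc.provision(template="win2003-base", name="web-07", host="esx-03")
    vc.power_on("web-07")
    vc.migrate("web-07", dest_host="esx-04")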

VirtualCenter delivers the highest levels of simplicity, efficiency, security and reliability required to manage a virtualized IT environment of any size, with key features including:

  • Centralized management
  • Performance monitoring
  • Operational automation
  • Clustering and pooling of physical server resources
  • Rapid provisioning
  • Secure access control
  • Full SDK support for integrations

I'll stop there for now. All the management and automation and VDI services depicted in the top layer of the figure at the beginning of this post further set us apart from the competition. Services like Update Manager, SRM, Lab Manager and VDM offer amazing capabilities, but we'll save that discussion for some upcoming posts.

Friday, January 29, 2010

A brief architecture overview of VMware ESX, XEN and MS Viridian

It is my feeling that there has been a bit of confusion lately around how hypervisors are being positioned by the various vendors. I am specifically referring to the three major technologies that seem to be the most relevant strategically going forward:
  • VMware ESX
  • Microsoft Viridian
  • Xen

VMware ESX is the VMware flagship hypervisor product: it's the basis for the Virtual Infrastructure version 3 framework.

MS Viridian is the next-generation hypervisor that Microsoft is developing for the Longhorn time frame. It's basically the successor to Microsoft Virtual Server.

Xen is an open-source hypervisor that is being integrated by a number of players, including RedHat, Suse, XenSource and Virtual Iron.

All these vendors (VMware, Microsoft, RedHat, Suse, XenSource, Virtual Iron) are pitching their own virtualization solutions as the optimal implementation. I don't want to discuss this in great detail, because that would require a deep understanding of the very low-level technologies used to design these products (which I don't have), but I would rather go through a very high-level analysis to either demystify or (try to) clarify some of the points. I have in fact had a chance to participate in some events hosted by these various vendors, and it appears to me they are using high-level facts at their own convenience to try to demonstrate that their design is better than the others'. Which is fair and obvious.

There are three major areas of confusion for us "human beings" trying to determine which approach and which solution makes more sense. These areas are:

  1. The architectural implementation of the hypervisor: this includes discussions like "my hypervisor is thinner than yours" etc etc.
  2. The hardware assists (Intel-VT, AMD-V) dilemma: "my hypervisor uses cpu hardware extensions to do what you do in software so it's faster than yours" or vice versa.
  3. The paravirtualization dilemma: "my hypervisor can support this modified guest hence it's (or it will be) faster than yours" etc etc.

Let's try to dig into all three.

My hypervisor is thinner than yours!

As I said, I have been to some vendor presentations of these technologies, and all of them tried to outline how their architecture was better than the others'. There are many details one could discuss, but I think the majority of people (virtualization customers, potential virtualization customers and virtualization IT professionals) are interested in the major points only. There are in fact two main reference architectures one can depict out of the 3 major platforms (i.e. VMware ESX, MS Viridian and Xen), as you can see from the diagrams below (the charts are taken as-is from public documents).

The first diagram outlines the internal architecture of VMware ESX; the second diagram outlines the internal architecture of the MS Windows Server Virtualization (aka Viridian) hypervisor, while the third diagram describes the internals of the Xen architecture. Notice that while this diagram has been taken as-is from a XenSource presentation, the internals of Xen do not change whether it's being used by the XenSource package, the Virtual Iron package, or the RedHat or Suse packages (the details might vary but the overall internal design doesn't).


As you can see from these pictures, VMware ESX implements what is referred to as the "VMkernel", a bundle of hypervisor code along with the device driver modules used to support a given set of hardware. The size of the VMkernel is known to be in the range of some 200,000 lines of code, or a few MBytes. On top of that there is the so-called VMware Console OS, which is in fact a sort of system virtual machine used to accomplish most administrative tasks, such as providing a shell to access the VMkernel and hosting the http and VirtualCenter services used to administer the box. The Console OS is not typically used to support virtual machine workloads, as everything is handled by the VMkernel.

On the other hand, Viridian and Xen implement a different philosophy, where the so-called "Parent Partition" and "Dom0" play a different role than the Console OS. The hypervisor in Viridian and Xen is much smaller than the ESX implementation: it is in the range of a few thousand lines of code (vs. the 200,000 of the VMkernel), or some KBytes (vs. the MBytes of the VMkernel). However, the Viridian/Xen implementation leans heavily on the Parent Partition / Dom0 as far as device drivers are concerned: these two entities "proxy" I/O calls from the virtual machines to the physical world. On top of this proxy function, the Parent Partition and Dom0 also provide higher-level management functions, similarly to the VMware Console OS. So in my opinion, claiming that the Viridian/Xen hypervisor is "thinner" than the VMkernel is only partially true, since VMware decided (for their own convenience) to put stuff into the VMkernel that, in the other solutions, did not just evaporate: it is merely named differently and lives in different "locations".
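
A toy illustration of that proxy model, with hypothetical names: the guest's frontend driver does no real I/O itself; it places requests on a shared ring, and the backend in the Parent Partition / Dom0 replays them against the real device driver and posts the responses back:

    # Toy split-driver model (hypothetical names): guest frontends queue
    # I/O requests; the backend in Dom0 / the Parent Partition executes
    # them with the real device driver and returns the responses.
    from collections import deque

    class SharedRing:
        def __init__(self):
            self.requests, self.responses = deque(), deque()

    class FrontendDriver:            # runs inside the guest VM
        def __init__(self, ring):
            self.ring = ring

        def read_block(self, block_no):
            self.ring.requests.append(("read", block_no))

    class BackendDriver:             # runs in Dom0 / the Parent Partition
        def __init__(self, ring, real_disk):
            self.ring, self.disk = ring, real_disk

        def service(self):
            while self.ring.requests:
                op, block_no = self.ring.requests.popleft()
                if op == "read":
                    self.ring.responses.append(self.disk.get(block_no))

    ring = SharedRing()
    FrontendDriver(ring).read_block(42)
    BackendDriver(ring, real_disk={42: b"data"}).service()
    print(ring.responses.popleft())   # b'data'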

Let me be clear, I am not saying that the ESX architecture is better than the Viridian/Xen architecture or vice versa. I am saying that if you look at the overall picture, they use very different internal mechanisms to achieve similar things. Unfortunately Viridian is not yet available, so any performance claim will need to be revisited later, but as far as we can see there are no "huge" differences in ESX vs. Xen micro-benchmarks at the moment (at least not big enough to say "this architecture works best, period!"). So I think it's fair to suggest not worrying too much about these implementation details: different vendors have taken different routes (for their convenience, heritage, etc.) but are likely to provide similar results in terms of performance.

My hypervisor uses cpu hardware extensions to do what you do in software so it's faster than yours (or vice versa)

Another dilemma being discussed a lot lately has to do with the new hardware instructions that Intel and AMD have introduced over the last couple of years in their CPUs. Intel calls them Intel-VT, while AMD calls them AMD-V (or Pacifica). Essentially, they provide a hook for those who develop virtualization software, making the processor more "virtualization-aware". Historically, the x86 processor never supported any form of virtualization: the common assumption was that a given server (or PC) would run one and only one operating system supporting various applications (most likely one application per OS, given how poorly applications coexist in the x86 stack). Various techniques have been developed over the past few years to overcome this limitation. VMware pioneered a technique called "binary translation", where the hypervisor traps privileged instructions issued by the guest and re-works them so they play nicely in a virtual environment. This allows running an unmodified guest OS within a virtual machine, and it is indeed a powerful idea. Xen comes from a different perspective, and what it historically used to overcome the problem was something called "paravirtualization". This means that instead of having the hypervisor "adjust" privileged calls issued by a standard guest OS, the guest itself gets modified (i.e. paravirtualized) in order to play "natively" nice in a virtual environment. This of course requires a change to the guest OS kernel, and it is not by chance that historically paravirtualized Linux guests were the only virtual environments supported by the Xen hypervisor (the Xen community did not, for obvious reasons, have access to the Windows source code, so they could not "patch" it). We will return to this paravirtualization concept later.
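
To make the contrast concrete, here is a deliberately simplified sketch: a "binary translator" rewrites privileged instructions in an unmodified guest's code stream at runtime, while a paravirtualized guest was already modified at build time to call the hypervisor directly. Everything here is illustrative; real binary translation operates on x86 machine code, not symbolic instruction names:

    # Deliberately simplified contrast (illustrative only; real binary
    # translation rewrites x86 machine code, not symbolic instructions).
    PRIVILEGED = {"cli", "hlt", "mov_to_cr3"}

    def binary_translate(guest_code):
        """Unmodified guest: the hypervisor rewrites privileged
        instructions into safe calls before they ever execute."""
        return [f"hypercall({insn})" if insn in PRIVILEGED else insn
                for insn in guest_code]

    def paravirtualized_source():
        """Paravirtualized guest: the kernel was modified at build time
        to invoke the hypervisor directly, so nothing is rewritten."""
        return ["add", "hypercall(mov_to_cr3)", "load"]

    print(binary_translate(["add", "mov_to_cr3", "load"]))
    print(paravirtualized_source())
    # Both yield ['add', 'hypercall(mov_to_cr3)', 'load'] -- one rewritten
    # at runtime by the hypervisor, the other changed at kernel build time.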

Intel-VT and AMD-V started to change all this. Now a hypervisor can leverage these new hardware instructions rather than implementing complex trap-and-emulate logic that is very challenging to develop and tune for optimal performance. So what has happened in recent months is that the Xen hypervisor has been modified to take advantage of these new instructions, so that you can run unmodified guests on top of Xen (Windows and standard Linux distributions).

So we now have a situation where VMware continues to use "binary translation" to support standard operating systems, while Xen supports standard operating systems by means of these "hardware assists". We are in the middle of a marketing battle where VMware claims their software binary translation is faster than the hardware-assist implementation (i.e. "we have tuned binary translation for more than 10 years, while hw assists are an immature technology that has just appeared and might be convenient for those who lack the expertise to develop binary translation"), while Xen claims the opposite (i.e. "we leverage high-performance native hardware instructions while others are still using their legacy, slower software mechanisms"). The battle is tough and it creates confusion in the community.

The reality is that, based on the latest benchmarks published by the vendors (and biased accordingly, of course), there is not much difference between the two implementations. They are both right, in my opinion. It is true that, ideally, you would run something faster in a native hardware implementation, but it is also true that a software trick fine-tuned over 10 years can be faster than a version-1 hardware implementation. We are at an inflection point: perhaps VMware still has a bit of a performance advantage, which is why they are sticking with their own software implementation for the moment, but there is no doubt that going forward, as these hardware instructions mature, that is the path to follow. On the other hand, it would have made absolutely no sense for Xen (or Viridian) to develop a complex trap-and-emulate mechanism just for this very limited time frame. We did not touch on Viridian (it's not yet available, after all), but their implementation and philosophy is (or will be) very similar to that described above for Xen.

Notice that, to complicate things further, VMware currently requires Intel-VT to support 64-bit guests. This has nothing to do with the general performance discussion above; rather, Intel removed some memory-protection logic that was available through standard x86 instructions from its 64-bit implementation, and in order to achieve the same result for 64-bit guests VMware requires some Intel-VT instructions. Again, this has nothing to do with implementing hypervisor functions in software versus leveraging the hardware; as a matter of fact, you do not need AMD-V to run 64-bit guests on ESX (it is an Intel-specific peculiarity).

This is yet another example of different vendors coming from different backgrounds and trying to solve the same problem in different ways. As with the "my hypervisor is thinner than yours" debate, quite frankly I don't see at the moment (June 2007) a technology that prevails over the other by a large margin. As I said, perhaps VMware has a small advantage (otherwise it would have been easy for them to use the hardware assists as well), but based on the numbers I see it's not enormous. They will all eventually migrate to these hardware instructions, especially with the upcoming releases that introduce more features such as memory virtualization (i.e. AMD Nested Page Tables and Intel Extended Page Tables), but for the moment you need to deal with all their marketing messages.

My hypervisor can support this modified guest hence it's faster than yours

This is the trickiest and most complex of the three, both because it can get complicated from a technical perspective and because it is still pretty much up in the air. I have already touched on the concept of paravirtualization above. A paravirtualized guest is basically an OS running in a virtual machine whose kernel has been optimized so that it knows it is running in a virtual environment.

Let's step back for a second here. I downplayed this concept a little in my analysis above (i.e. you cannot run Windows, etc.), but in reality this landscape has been changed by two things:

  1. Since RedHat and Suse integrated Xen into their distributions, they have also shipped fully supported paravirtualized kernels (previously the patching was provided by the open-source community, basically in the form of a kernel hack, which obviously was not well received by the many customers that required fully supported stacks)
  2. MS is going to "enlighten" (enlightenment is the MS term for paravirtualization) its own operating systems moving forward so they are optimized to run on Viridian.

Back to the point: there are really two schools of thought currently in the industry (I warned you it's still up in the air). The first is that these new hardware assists (especially in future implementations) diminish the need to paravirtualize the guest: the hardware implementations will be so efficient and optimized that there will be no need to optimize the guest OS as well, and even a standard (i.e. non-paravirtualized) OS will perform close to native speed. The other is that, beyond the efficiency and optimization provided by these low-level hardware instructions, there is room to improve performance by paravirtualizing the guest OS in areas where Intel-VT and AMD-V have little effect. This second school is backed by the fact that, given points #1 and #2 above, there would be no more supportability issues, as Suse, RedHat and Microsoft are going to provide their own fully supported paravirtualized versions of their OS kernels.

In my personal opinion, this mix of hardware-assisted virtualization along with OS paravirtualization (or enlightenments) is what we will most likely see in the future. Which brings up the problem of paravirtualization/enlightenment standards. If the actual need for paravirtualization is still up in the air (i.e. will hardware-assist support be enough to provide near-native performance?), what is going to happen with the standards is even more speculative. However, we can try to speculate.

As far as Linux is concerned, VMware has submitted to the open-source community a tentative paravirtualization standard called VMI. Apparently the Linux open-source community has accepted the idea but decided to adopt a slightly different standard, called paravirt-ops, for inclusion in the mainline kernel. The difference between paravirt-ops and VMI is not in the idea of providing a common/standard paravirtualized interface; it is mostly in the implementation details. The idea behind this newly accepted standard is that a single standard Linux kernel could run within a Xen VM, a VMware VM, or on a physical server, using different (and optimized!) paths in the kernel code depending on the "context" it is running in. It is important to stress that there will not be a "hardware kernel" and a separate "virtual kernel", but a single kernel that can run unchanged on physical hardware as well as on any virtualization stack that adheres to the standard.
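
The shape of such a standard is easy to sketch: a table of operation hooks that a single kernel image fills in differently depending on what it finds itself running on. The Python below is only an analogy for the C function-pointer table in the real kernel; the platform names and operations are illustrative:

    # Analogy for a paravirt-ops style hook table (illustrative names;
    # the real thing is a C struct of function pointers in the kernel).
    def native_flush_tlb():
        print("native: write CR3 directly")

    def xen_flush_tlb():
        print("xen: issue MMU hypercall")

    def vmware_flush_tlb():
        print("vmi: call into the VMI layer")

    PV_OPS_BY_PLATFORM = {
        "bare-metal": {"flush_tlb": native_flush_tlb},
        "xen":        {"flush_tlb": xen_flush_tlb},
        "vmware":     {"flush_tlb": vmware_flush_tlb},
    }

    def boot(detected_platform):
        # One kernel image: at boot it detects its host and patches in
        # the matching operations; every later call takes that path.
        pv_ops = PV_OPS_BY_PLATFORM[detected_platform]
        pv_ops["flush_tlb"]()

    boot("xen")         # xen: issue MMU hypercall
    boot("bare-metal")  # native: write CR3 directly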

On Windows, the matter is quite different, for obvious reasons. MS has already announced that they will paravirtualize (i.e. enlighten) the Windows kernel so that it runs optimized on the Viridian hypervisor. It remains to be seen whether this "enlightenment interface" will be compatible with the paravirtualization standards being discussed in the Linux community (along with VMware).

One possible outcome is that Microsoft will work through its technology partnerships with Novell and XenSource (both use the open-source Xen hypervisor) to optimize the Linux kernel to run on Viridian, as you can see from the MS chart at the very beginning of this post. Other operating systems might only be able to run through legacy, non-optimized emulation (this would include older MS operating systems that won't or can't be enlightened). Whether this enlightenment API work will converge with paravirt-ops remains to be seen.

Even more interesting and up in the air is whether MS will try to prevent hypervisor ISVs/communities from implementing these interfaces in their own products (i.e. they want to avoid, for example, VMware implementing Viridian-like enlightenment support, which would let VMware run a Viridian-optimized enlightened Windows kernel).

Given the technology partnership, it might be easier for Novell and XenSource to implement these interfaces in Xen, but one would expect MS to be very concerned about letting other hypervisors run an enlightened Windows kernel as fast as Viridian would. Only time will tell, I guess.

Should this happen, it would clearly not be in the interest of the customer, since the best thing would be to define a single standard (or a set of standards, if really needed) so that everybody would have a chance to innovate and improve without being restricted by proprietary interfaces.

But fair play is not, apparently, a characteristic of this business lately. However, as I said at the beginning of this third section, this is still pretty much up in the air, and these have been speculations about possible future situations that might well be proven wrong.