In any organisation, every server on the network has a specific purpose, and most of the time these servers provide a stable environment for the software applications the business depends on. These applications are usually critical to the business, and the organisation cannot afford to have them down even for a few minutes. For example, a bank runs an application that handles its internet banking.
The figure below shows an application running on a standalone server configured with a Unix operating system and a database (Oracle / Sybase / DB2 / MSSQL, etc.). The organisation chose to run it as a standalone application because it is not business critical; in other words, whenever the application is down, the actual business is not impacted.
Usually, the clients of such an application connect to the application server using the server name, the server IP, or a specific application IP.
Now assume the organisation has an application that is critical to its business, where any impact to the application causes a huge loss. In that case, the organisation has one option to reduce the impact of an application failure caused by the operating system or hardware: purchase a secondary server with the same hardware configuration, install the same OS and database, and configure it with the same application in passive mode. Then “fail over” the application from the primary server to this secondary server whenever there is an issue with the underlying hardware or operating system of the primary server.
What is failover?
Whenever an issue on the primary server makes the application unavailable to the client machines, the application should be moved to another available server in the network, either manually or automatically. Transferring the application from the primary server to the secondary server and making the secondary server active for the application is called a “failover” operation. The reverse operation (i.e. restoring the application on the primary server) is called “failback”.
Now we can call this configuration an application HA (Highly Available) setup, compared to the earlier standalone setup. Do you agree with me?
Now the question is, how does this manual failover work when there is an application issue due to the hardware or the operating system?
Manual failover basically involves the steps below:
- The application IP should fail over to the secondary node.
- The same storage and data should be available on the secondary node.
- Finally, the application should fail over to the secondary node.
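The three steps above can be sketched as a small Python simulation. This is only an illustration of the ordering, not a real failover script: the Node class, its fields, and the manual_failover function are hypothetical names, and the release of resources on the primary is an added assumption so that both nodes never hold the IP or the storage at the same time.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical model of a server taking part in a manual failover."""
    name: str
    holds_ip: bool = False
    storage_mounted: bool = False
    app_running: bool = False


def manual_failover(primary, secondary):
    """Move the application from primary to secondary; return the steps taken."""
    steps = []

    # Assumption: release everything on the primary first, so the two nodes
    # never own the application IP or the storage at the same time.
    primary.app_running = False
    primary.storage_mounted = False
    primary.holds_ip = False
    steps.append(f"released application, storage and IP on {primary.name}")

    # Step 1: the application IP moves to the secondary node.
    secondary.holds_ip = True
    steps.append(f"application IP active on {secondary.name}")

    # Step 2: the same storage and data become available on the secondary node.
    secondary.storage_mounted = True
    steps.append(f"storage available on {secondary.name}")

    # Step 3: finally, the application starts on the secondary node.
    secondary.app_running = True
    steps.append(f"application started on {secondary.name}")
    return steps
```

Calling manual_failover(Node("node1", True, True, True), Node("node2")) leaves node2 holding the IP, the storage and the running application, in that order.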
Challenges in Manual Failover Configuration
- Resources must be monitored continuously.
- It is time consuming.
- It is technically complex when the application involves more dependent components.
Then, what is the alternative?
Go for automated failover software, which groups the primary and secondary servers related to the application, keeps an eye on the primary server for any failures, and fails the application over to the secondary server automatically whenever there is an issue with the primary server.
Although we have two different servers supporting the application, both of them actually serve the same purpose. From the application client’s perspective, they should be treated as a single application cluster server (composed of multiple physical servers in the background).
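To make the idea of "keeping an eye on the primary server" concrete, here is a minimal Python sketch of the decision such software has to make. It assumes a simple rule that is not taken from any particular product: fail over once a fixed number of consecutive health checks have failed. The function name and the max_missed parameter are hypothetical.

```python
def first_failover_tick(health_checks, max_missed=3):
    """Return the index of the check at which failover would be triggered.

    health_checks is a sequence of booleans (True = primary looked healthy).
    Failover is triggered after max_missed consecutive failed checks.
    Returns None if the threshold is never reached.
    """
    missed = 0
    for tick, healthy in enumerate(health_checks):
        # A healthy check resets the counter; a failed one increments it.
        missed = 0 if healthy else missed + 1
        if missed >= max_missed:
            return tick
    return None
```

Requiring several consecutive failures is one way to avoid failing over on a single missed check; real cluster software uses its own, more elaborate criteria.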
Wow…. Cluster .
Now you know that a cluster is nothing but “a group of individual servers working together to serve the same purpose, appearing as a single machine to the external world”.
What cluster software is available in the market today? There are many products, depending on the operating system and the application to be supported. Some of them are native to the operating system, and others come from third-party vendors.
List of Cluster Software available in the market
- SUN Cluster Services – Native Solaris Cluster
- Linux Cluster Server – Native Linux cluster
- Oracle RAC – Application level cluster for Oracle database that works on different Operating Systems
- Veritas Cluster Services – Third-party cluster software that works on different operating systems such as Solaris / Linux / AIX / HP-UX.
- HACMP – IBM AIX based Cluster Technology
- HP UX native Cluster Technology
In this post, we are discussing VCS and its operations. This post is not going to cover the actual implementation or any VCS command syntax, but it will cover the concept of how VCS makes an application Highly Available (HA).
Note: So far, I have managed to explain the concept without using much complex terminology, but now it’s time to introduce some new VCS terminology that we use in everyday VCS operations. Just keep a little more focus on each new term.
VCS Components
VCS has two types of components: 1. Physical Components 2. Logical Components
Physical Components:
1. Nodes
VCS nodes host the service groups (managed applications). Each system is connected to networking hardware, and usually also to storage hardware. The systems contain components to provide resilient management of the applications, and start and stop agents.
Nodes can be individual systems, or they can be created with domains or partitions on enterprise-class systems. Individual cluster nodes each run their own operating system and possess their own boot device. Each node must run the same operating system within a single VCS cluster.
Clusters can have from 1 to 32 nodes. Applications can be configured to run on specific nodes within the cluster.
2. Shared storage
Storage is a key resource of most applications services, and therefore most service groups. A managed application can only be started on a system that has access to its associated data files. Therefore, a service group can only run on all systems in the cluster if the storage is shared across all systems. In many configurations, a storage area network (SAN) provides this requirement.
You can use I/O fencing technology for data protection. I/O fencing blocks access to shared storage from any system that is not a current and verified member of the cluster.
3. Networking Components
Networking in the cluster is used for the following purposes:
- Communications between the cluster nodes and the Application Clients and external systems.
- Communications between the cluster nodes, called Heartbeat network.
Logical Components
1. Resources
Resources are hardware or software entities that make up the application. Resources include disk groups and file systems, network interface cards (NIC), IP addresses, and applications.
1.1. Resource dependencies
Resource dependencies indicate resources that depend on each other because of application or operating system requirements. Resource dependencies are graphically depicted in a hierarchy, also called a tree, where the resources higher up (parent) depend on the resources lower down (child).
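A dependency tree like this determines the order in which resources can be started and stopped: children first when bringing things online, parents first when taking them offline. The Python sketch below illustrates that ordering with the standard library's graphlib module. The resource names and the depends_on mapping are made up for the example, and this is not how VCS itself stores or evaluates dependencies.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency tree: each parent maps to the children it depends on.
depends_on = {
    "app": {"ip", "mount"},
    "ip": {"nic"},
    "mount": {"diskgroup"},
}

# TopologicalSorter yields a node only after everything it depends on,
# so the result is a valid order for bringing resources online.
online_order = list(TopologicalSorter(depends_on).static_order())

# Taking resources offline walks the same tree from the top down.
offline_order = list(reversed(online_order))
```

In any valid online order the NIC comes up before the IP address, the disk group before the mount, and the application last.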
1.2. Resource types
VCS defines a resource type for each resource it manages. For example, the NIC resource type can be configured to manage network interface cards. Similarly, all IP addresses can be configured using the IP resource type.
VCS includes a set of predefined resources types. For each resource type, VCS has a corresponding agent, which provides the logic to control resources.
2. Service groups
A service group is a virtual container that contains all the hardware and software resources that are required to run the managed application. Service groups allow VCS to control all the hardware and software resources of the managed application as a single unit. When a failover occurs, resources do not fail over individually— the entire service group fails over. If there is more than one service group on a system, a group may fail over without affecting the others.
A single node may host any number of service groups, each providing a discrete service to networked clients. If the server crashes, all service groups on that node must be failed over elsewhere.
Service groups can be dependent on each other. For example a finance application may be dependent on a database application. Because the managed application consists of all components that are required to provide the service, service group dependencies create more complex managed applications. When you use service group dependencies, the managed application is the entire dependency tree.
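The "whole group moves as a unit" behaviour can be sketched in a few lines of Python. This is a simplified model under stated assumptions: placement and system_lists are hypothetical dictionaries, and the target is simply the first other node in the group's list, which is not necessarily the policy VCS applies.

```python
def fail_over_groups(placement, system_lists, failed_node):
    """Return a new placement after failed_node has crashed.

    placement maps service group -> node it is currently online on.
    system_lists maps service group -> ordered list of nodes it may run on.
    Every group on the failed node moves as a whole; other groups stay put.
    """
    new_placement = {}
    for group, node in placement.items():
        if node != failed_node:
            new_placement[group] = node  # unaffected groups are not touched
            continue
        candidates = [n for n in system_lists[group] if n != failed_node]
        # None means no surviving node is able to host the group.
        new_placement[group] = candidates[0] if candidates else None
    return new_placement
```

For example, with a database group and a web group both online on node1 and a third group on node2, a crash of node1 moves the first two groups and leaves the third where it is.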
2.1. Types of service groups
VCS service groups fall in three main categories: failover, parallel, and hybrid.
- Failover service groups
A failover service group runs on one system in the cluster at a time. Failover groups are used for most applications that do not support multiple systems to simultaneously access the application’s data.
- Parallel service groups
A parallel service group runs simultaneously on more than one system in the cluster. A parallel service group is more complex than a failover group. Parallel service groups are appropriate for applications that manage multiple application instances running simultaneously without data corruption.
- Hybrid service groups
A hybrid service group is for replicated data clusters and is a combination of the failover and parallel service groups. It behaves as a failover group within a system zone and a parallel group across system zones.
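One way to read the three categories is as rules about where a group may be online at the same time. The Python sketch below encodes that reading; the function name, the group_type strings and the zone_of mapping are assumptions made for the example rather than VCS attributes.

```python
def placement_is_valid(group_type, online_nodes, zone_of=None):
    """Check whether a set of online nodes is allowed for a group type.

    failover: at most one node in the whole cluster.
    parallel: any number of nodes.
    hybrid:   at most one node per system zone (failover inside a zone,
              parallel across zones). zone_of maps node -> zone name.
    """
    if group_type == "failover":
        return len(online_nodes) <= 1
    if group_type == "parallel":
        return True
    if group_type == "hybrid":
        zones = [zone_of[node] for node in online_nodes]
        return len(zones) == len(set(zones))
    raise ValueError(f"unknown group type: {group_type}")
```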
3. VCS Agents
Agents are multi-threaded processes that provide the logic to manage resources. VCS has one agent per resource type. The agent monitors all resources of that type; for example, a single IP agent manages all IP resources.
When the agent is started, it obtains the necessary configuration information from VCS. It then periodically monitors the resources, and updates VCS with the resource status.
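As a rough mental model of "one agent per resource type", consider the toy Python class below. It is not the VCS agent framework: the Agent class, the probe callable and the ONLINE/OFFLINE strings are illustrative assumptions. The agent receives its list of resources once and then reports the status of each on every monitor cycle.

```python
class Agent:
    """Toy agent: one instance handles every resource of a single type."""

    def __init__(self, resource_type, probe):
        self.resource_type = resource_type
        self.probe = probe  # hypothetical callable: resource name -> bool
        self.resources = []

    def configure(self, resources):
        """Receive the resources of this type from the engine's configuration."""
        self.resources = list(resources)

    def monitor_cycle(self):
        """Probe every resource once and return the status to report back."""
        return {
            name: "ONLINE" if self.probe(name) else "OFFLINE"
            for name in self.resources
        }
```

A single Agent("IP", probe) instance configured with two IP resources returns one status per resource from each call to monitor_cycle().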
4. Cluster Communications and VCS Daemons
Cluster communications ensure that VCS is continuously aware of the status of each system’s service groups and resources. They also enable VCS to recognize which systems are active members of the cluster, which have joined or left the cluster, and which have failed.
4.1. High availability daemon (HAD)
The VCS high availability daemon (HAD) runs on each system. Also known as the VCS engine, HAD is responsible for:
- building the running cluster configuration from the configuration files
- distributing the information when new nodes join the cluster
- responding to operator input
- taking corrective action when something fails.
The engine uses agents to monitor and manage resources. It collects information about resource states from the agents on the local system and forwards it to all cluster members. The local engine also receives information from the other cluster members to update its view of the cluster.
The hashadow process monitors HAD and restarts it when required.
4.2. HostMonitor daemon
VCS also starts the HostMonitor daemon when the VCS engine comes up. The VCS engine creates a VCS resource VCShm of type HostMonitor and a VCShmg service group. The VCS engine does not add these objects to the main.cf file. Do not modify or delete these components of VCS. VCS uses the HostMonitor daemon to monitor the resource utilization of CPU and swap. VCS reports to the engine log if the resources cross the threshold limits that are defined for them.
4.3. Group Membership Services/Atomic Broadcast (GAB)
The Group Membership Services/Atomic Broadcast protocol (GAB) is responsible for cluster membership and cluster communications.
- Cluster Membership
GAB maintains cluster membership by receiving input on the status of the heartbeat from each node by LLT. When a system no longer receives heartbeats from a peer, it marks the peer as DOWN and excludes the peer from the cluster. In VCS, memberships are sets of systems participating in the cluster.
- Cluster Communications
GAB’s second function is reliable cluster communications. GAB provides guaranteed delivery of point-to-point and broadcast messages to all nodes. The VCS engine uses a private IOCTL (provided by GAB) to tell GAB that it is alive.
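The membership rule described above, where a peer that stops sending heartbeats is marked DOWN and excluded, can be illustrated with a short Python function. The timestamps, the timeout value and the function name are assumptions for the sketch; GAB's real timers and protocol are more involved.

```python
def current_members(last_heartbeat, now, timeout):
    """Return the set of nodes still considered cluster members.

    last_heartbeat maps node name -> time the last heartbeat was received.
    A node whose last heartbeat is older than timeout is treated as DOWN
    and left out of the membership.
    """
    return {
        node
        for node, seen in last_heartbeat.items()
        if now - seen <= timeout
    }
```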
4.4. Low Latency Transport (LLT)
VCS uses private network communications between cluster nodes for cluster maintenance. Symantec recommends two independent networks between all cluster nodes. These networks provide the required redundancy in the communication path and enable VCS to discriminate between a network failure and a system failure. LLT has two major functions.
- Traffic Distribution
LLT distributes (load balances) internode communication across all available private network links. This distribution means that all cluster communications are evenly distributed across all private network links (maximum eight) for performance and fault resilience. If a link fails, traffic is redirected to the remaining links.
- Heartbeat
LLT is responsible for sending and receiving heartbeat traffic over network links. The Group Membership Services function of GAB uses this heartbeat to determine cluster membership.
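The traffic distribution idea can be sketched in Python as spreading packets over whichever private links are currently up. Round-robin is used here purely as an assumption to keep the example simple; the text above does not say which algorithm LLT uses, and the link names are hypothetical.

```python
from itertools import cycle


def distribute(packets, links_up):
    """Spread packets across the links that are up, in round-robin order.

    links_up maps link name -> True if the link is working.
    Returns a dict of link name -> list of packets sent over that link.
    """
    active = [link for link, up in links_up.items() if up]
    if not active:
        raise RuntimeError("no private network link available")
    assignment = {link: [] for link in active}
    for packet, link in zip(packets, cycle(active)):
        assignment[link].append(packet)
    return assignment
```

With two healthy links the packets alternate between them; if one link fails, the same call sends everything over the remaining link.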
4.5. I/O fencing module
The I/O fencing module implements a quorum-type functionality to ensure that only one cluster survives a split of the private network. I/O fencing also provides the ability to perform SCSI-3 persistent reservations on failover. The shared disk groups offer complete protection against data corruption by nodes that are assumed to be excluded from cluster membership.
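A quorum-style decision of this kind can be pictured as a vote. The Python sketch below assumes a set of arbitration devices (called disks here) that each end up claimed by exactly one side of the split; the side holding a strict majority survives. The function name, the disk names and the voting rule are simplifications for illustration, not the actual fencing implementation.

```python
from collections import Counter


def surviving_partition(disk_winners):
    """Return the partition that holds a strict majority of the disks.

    disk_winners maps arbitration disk -> partition that claimed it.
    Returns None when no partition has more than half of the disks.
    """
    if not disk_winners:
        return None
    counts = Counter(disk_winners.values())
    winner, votes = counts.most_common(1)[0]
    return winner if votes > len(disk_winners) / 2 else None
```

Using an odd number of arbitration devices means a two-way split cannot end in a tie.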
5. VCS Configuration files.
5.1. main.cf
/etc/VRTSvcs/conf/config/main.cf is the key file in terms of VCS configuration. The “main.cf” file basically provides the information below to the VCS agents and VCS daemons:
- Which nodes are available in the cluster?
- Which service groups are configured for each node?
- Which resources are available in each service group, and what are their types and attributes?
- Which dependencies does each resource have on other resources?
- Which dependencies does each service group have on other service groups?
5.2. types.cf
The file types.cf, which is listed in the include statement in the main.cf file, defines the VCS bundled types for VCS resources. The file types.cf is also located in the folder /etc/VRTSvcs/conf/config.
5.3. Other Important files
- /etc/llthosts—lists all the nodes in the cluster
- /etc/llttab—describes the local system’s private network links to the other nodes in the cluster
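As an illustration of what /etc/llthosts holds, the Python sketch below parses text in the layout this file commonly uses, one "node-id node-name" pair per line. The layout is stated here as an assumption, and the sample node names are invented; check the file on your own cluster before relying on it.

```python
def parse_llthosts(text):
    """Parse llthosts-style text into a dict of node id -> node name.

    Assumes each non-empty line has the form "<node id> <node name>".
    """
    nodes = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        node_id, name = line.split()
        nodes[int(node_id)] = name
    return nodes


# Hypothetical file content for a two-node cluster.
sample = "0 node1\n1 node2\n"
cluster_nodes = parse_llthosts(sample)
```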
Sample VCS Setup
The figure below shows a sample VCS setup configured for an application running with a database and shared storage.
Why do we need shared storage for clusters?
Normally, database servers are configured to store their databases on SAN storage, and this storage must be reachable from all the other nodes in the cluster in order to fail the database over from one node to another. That is why both nodes in the figure below are configured with common shared SAN storage. In this model, all the cluster nodes can see the storage devices from their local operating systems, but only one node (the active one) can write to the storage at a time.
Why does each server need two storage paths (connected to two HBAs)?
To provide redundancy for the server’s storage connection and to avoid a single point of failure in it. Whenever you notice multiple storage paths connected to a server, you can safely assume that some storage multipath software is running on the operating system, e.g. multipathd, EMC PowerPath, HDLM, MPIO, etc.
Why does each server need two network connections to the physical network?
This is, again, to provide redundancy for the server’s network connection and to avoid a single point of failure in its physical network connectivity. Whenever you see dual physical network connections, you can assume that the server is using some kind of IP multipath software to manage the dual path, e.g. IPMP in Solaris, NIC bonding in Linux, etc.
Why do we need a minimum of two heartbeat connections between the cluster nodes?
When VCS has lost all its heartbeat connections except the last one, the condition is called cluster jeopardy. When the cluster is in the jeopardy state, either of the things below could happen:
1) The loss of the last available interconnect link
In this case, the cluster cannot reliably tell whether the last interconnect link is lost or the system itself is lost, and hence the cluster will form a network partition, causing two or more mini clusters to be formed depending on the actual network partition. At this time, every service group that is not online on its own mini cluster, but may be online on the other mini cluster, will be marked as “autodisabled” for that mini cluster until the interconnect links start communicating normally.
2) The loss of an existing system which is currently in jeopardy state due to a problem
In this case, the situation is exactly the same as explained in case 1, forming two or more mini clusters.
In the case where both LLT interconnect links disconnect at the same time and no low-pri links are configured, the cluster cannot reliably identify whether it is the interconnects that have disconnected, and will assume that the other system is down and unavailable. Hence, in this scenario, the cluster treats this as a system fault and attempts to bring the service groups online on each mini cluster, depending on the system StartupList defined for each service group. This may lead to data corruption, because applications write to the same underlying data on storage from different systems at the same time. This scenario is well known as the “split brain condition”.
That is all for this introduction to VCS. Please stay tuned for the next posts, where I am going to discuss the actual administration of VCS.
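The difference between losing links one at a time and losing them all at once can be summarised in a small Python function. It is a simplified reading of the behaviour described above, with hypothetical names and result strings: if the cluster was already in jeopardy when the last link went away, service groups are autodisabled; if all links vanish at once, the peers look dead to each other and split brain becomes possible.

```python
def outcome(link_counts):
    """Classify what happens given heartbeat link counts observed over time.

    link_counts is a sequence of how many heartbeat links were working at
    each successive check.
    """
    in_jeopardy = False
    for count in link_counts:
        if count == 0:
            if in_jeopardy:
                # The cluster knew it was down to one link, so it plays safe.
                return "partition, service groups autodisabled"
            # All links vanished at once: peers assume each other is down.
            return "possible split brain"
        in_jeopardy = count == 1
    return "jeopardy" if in_jeopardy else "normal"
```

For example, outcome([2, 1, 0]) reports the autodisabled partition, while outcome([2, 0]) reports a possible split brain.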
Please don’t forget to drop your comments and inputs in the comment section.
Have Happy System Administration!!!!!!