Monday, August 22, 2011

Thin Clients Vs Zero Clients

At Citrix Synergy in San Francisco last week, Wyse unveiled a new product called Wyse Xenith, which is based on the newly announced Wyse Zero platform. Similar to a thin client, the Wyse Xenith is being marketed as a "zero client." Let's look at what exactly a so-called zero client is, how it works and why you'd choose one over a more traditional thin client.

The term thin client means different things to different people. We took a deeper look at the various types of thin clients earlier this year, but the one thing they have in common is that they offload most (if not all) of the heavy work to back-end servers, resulting in a client device that is small, light and, most importantly, stateless.

It's that "stateless" attribute that's the center of attention today. Even though thin clients don't do anything without a network and back-end servers, some software is often still installed and maintained on the thin client itself. This software usually takes the form of firmware installed in flash memory; thin-client devices typically run an operating system such as Windows CE, Windows XP Embedded or Linux.

So what's the big deal? Even though these thin clients run an OS, they're stateless, right? Doesn't that mean there's no management? Unfortunately, that's not always the case.

First of all, thin clients running any OS might need to be patched or updated from time to time. Even though this patching isn't as frequent as the "Patch Tuesdays" we're all accustomed to in the Windows world, there's often a need to update a thin client's software to deliver new features or capabilities. Second, thin clients often need to be managed to deploy new configuration options or settings.

So even though thin clients don't have any real data on them and do most of the work via central servers, they can still be a pain to manage. To solve this problem, a few vendors started selling what they called "zero clients" -- client devices with literally no configuration and nothing stored on them. From a functional standpoint, zero clients and thin clients are pretty similar; the difference is that zero clients have zero device-based management.

The technology has been around for years, but it wasn't until VMware partnered with Teradici to bring the PC-over-IP (PCoIP) remoting protocol to the VMware View product that zero clients really caught on. (Thin-client makers can create zero-client devices for VMware View, and VMware has had much success selling these against Citrix.)

Wyse's Xenith product is a zero client that's specifically built for Citrix XenDesktop environments and fully supports Citrix's ICA/HDX protocol stack. The Xenith has no OS and no firmware; instead, it connects to a XenDesktop configuration server to download its configuration and the latest HDX engine as soon as it's powered on. (This process only takes a few seconds.) This means that the entire client environment is managed and configured on the server, and there's nothing to update on the client, since the client updates itself automatically whenever it's turned on.

Even though zero clients have been around for a while, you can bet that now that Citrix has joined the zero-client fray -- Wyse was already in it, since it also makes VMware View-based PCoIP zero clients -- this space will heat up over the next few years.

Main Source - http://searchvirtualdesktop.techtarget.com/news/1512699/What-are-zero-clients-and-how-are-they-different-from-thin-clients

Thursday, August 11, 2011

Troubleshooting Virtual Machine snapshot problems

A nice guide on Troubleshooting VM snapshot problems --> http://is.gd/ckwDc

This troubleshooting guide explains basic concepts about Virtual Machine snapshots and different troubleshooting paths depending on the problem. The guide was designed for ESX 3.5, and extra considerations have to be taken if you are working with ESX 3.5i or ESX 4(i). The formulas and most of the procedures described in the document were created by Ruben as part of a continuous troubleshooting improvement process.

Ruben is also the creator of the SnapVMX utility.

When troubleshooting Virtual Machine (VM) snapshot problems, it is often necessary to gather a lot of information before deciding on the most appropriate course of action. Collecting and organizing that information can take a long time, especially if the VM has many snapshots.


SnapVMX was created to speed up the troubleshooting process by instantly gathering all the information you need to evaluate the situation and make the correct decision, reducing downtime to the bare minimum needed to solve the problem.

Source: Eric's blog

Also Try VMware KB TV - Consolidating snapshots (VMware KB 1007849) - video

Tuesday, August 9, 2011

Virtual Distributed Switch and vCenter Server failure

What happens to an ESX host's network traffic if the vDS fails or vCenter is down?

Don't miss the last paragraph of this post.

I’m currently working with my colleagues on an upgrade of our VI 3.5 infrastructure to vSphere Enterprise Plus. We have recently been mulling over some of the design elements we will have to consider and one of the ones that came up was virtual Distributed Switches (vDS). We like the look of it, it saves us having to configure multiple hosts with standard vSwitches and it also has some nice benefits such as enhanced network vMotion support, inbound and outbound traffic shaping and Private VLANs.
One of the questions that struck me was: what happens if your vCenter server fails? What happens to your networking configuration? Surely your vCenter server couldn't be a single point of failure for your virtual networking, could it?
Well, I did a bit of digging about and chatted to a few people on Twitter, and the answer is no, it would not result in a loss of virtual networking. In a vSphere vDS the switch is split into two distinct elements: the control plane and the data plane. Previously both elements were host-based and configured as such through a connection to the host, either directly using the VI client or through vCenter. In vSphere, because the control plane and data plane have been separated, the control plane is now managed using vCenter only and the data plane remains host-based. Hence, when your vCenter server fails, the data plane is still active because it's host-based, whereas the control plane is unavailable because it's vCenter-based.
Mike Laverick over at RTFM informed me that the central config for a vDS is stored on shared VMFS within a folder called .dvsData. I've since learnt that this location is chosen automatically by vCenter and you can use the net-dvs command to determine that location. It will generally be on shared storage that all ESX hosts participating in the vDS have access to. As a backup to this .dvsData folder, a local database copy is located in /etc/vmware/dvsData.db, which I imagine only comes into play if your vCenter server goes down or if your ESX host loses connectivity to the shared VMFS with the .dvsData folder. You can read more about this over at RTFM.

Troubleshooting: "Unable to read partition information from this disk" error when adding a LUN to ESX

Troubleshooting the Add Storage Wizard error:
Unable to read partition information from this disk

Extracted from the VMware Knowledge Base

Caution: Ensure that the selected disks or LUNs do not have production information as this is a destructive operation.
Symptoms

* Cannot use the Add Storage Wizard to format a disk with a new VMFS datastore
* Creating a VMFS volume in the VMware Infrastructure (VI) Client fails
* The wizard reports that it is unable to read the pre-existing partition table from the disk
* You receive the following error:

Unable to read partition information from this disk

* The available LUN listing shows a blank in the free space column
* You see a message indicating that the ESX host cannot read the partition table
* The hostd logs contain entries similar to:

[2009-01-26 12:56:20.647 'Partitionsvc' 21990320 info] InvokePartedUtil /sbin/partedUtil
[2009-01-26 12:56:20.706 'Partitionsvc' 21990320 warning] Unable to get partition information for /vmfs/devices/disks/vml.0200030000600508b30093fcf0a05b5b8cc739002f4d
5341313531
[2009-01-26 12:56:20.706 'Partitionsvc' 21990320 warning] Status : 255
Output:
Error : Warning: /vmfs/devices/disks/vml.0200030000600508b30093fcf0a05b5b8cc
739002f4d5341313531 contains GPT signatures, indicating that it has a GPT table. However, it does not have a valid fake msdos partition table, as it should. Perhaps it was corrupted - possibly by a program that doesn't understand GPT partition tables. Or perhaps you deleted the GPT table, and are now using an msdos partition table. Is this a GPT partition table?
Error: The primary GPT table is corrupt, but the backup appears ok, so that will be used.
A bug has been detected in GNU parted. Please email a bug report to bug-parted@gnu.org containing the version (1.6.3) and the following message: Assertion (last_usable <= disk->dev->length) at disk_gpt.c:480 in function _parse_header() failed.
Unable to read partition table for device /vmfs/devices/disks/vml.0200030000600508b30093fcf0a05b5b8cc
739002f4d5341313531

* Rescan and ESX boot operations experience long delays (40-120 seconds for each LUN with an EFI GPT partition)
* The Add Storage wizard times out while getting the list of available LUNs

Purpose
This article addresses the situation of an ESX host being unable to create a datastore because the volumes contain an existing non-msdos partition table.
Resolution
Cause
There are several different partitioning schemes that can be created. Each has a corresponding identifying disk label. Common labels include bsd, dvh, gpt, loop, mac, msdos, pc98 or sun. Of these, only the msdos label and partitioning scheme is used by ESX. Trying to create a volume using the Add Storage wizard fails unless the volume contains an msdos partition table or if there is no partition table at all. Any other kind of partition is left unchanged.
Note: Similar symptoms have been observed when a LUN which is greater than 2 terabytes is presented to an ESX host. For more information, see ESX does not support 2 terabyte LUN sizes (3371739).
Checking for non-msdos partitions

To check for non-msdos partitions:

1. Log into the ESX host console using an SSH client or the GUI. For more information, see Unable to connect to an ESX host using Secure Shell (SSH) (1003807).
2. Run the command:

fdisk -l

The output is similar to:

Disk /dev/sdb: 536 MB, 536870912 bytes
255 heads, 63 sectors/track, 65 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System
/dev/sdb1 1 66 524287+ ee EFI GPT
3. Make note of the Id and System values (ee and EFI GPT in this example). Depending on the value of these fields, ESX may not be able to perform operations on this disk. To allow ESX to make any modifications to this volume, the volume must have an msdos partition table or no partition table at all.
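Under the hood, a GPT label announces itself with the ASCII signature "EFI PART" at LBA 1 (byte offset 512 on a 512-byte-sector disk), which is how tools detect the partition scheme that fdisk reports above as EFI GPT. Here is a minimal sketch of that detection against an ordinary scratch file rather than a real LUN (the file name and layout are purely illustrative):

```shell
# Create a 1 MiB scratch "disk" (an illustrative stand-in for a real LUN)
dd if=/dev/zero of=/tmp/fakedisk.img bs=512 count=2048 2>/dev/null

# Write the GPT signature "EFI PART" at LBA 1 (offset 512), as a GPT label would
printf 'EFI PART' | dd of=/tmp/fakedisk.img bs=512 seek=1 conv=notrunc 2>/dev/null

# Read back LBA 1 -- a match means the disk carries a GPT label
dd if=/tmp/fakedisk.img bs=512 skip=1 count=1 2>/dev/null | head -c 8
# -> EFI PART
```

This is also why simply zeroing the partition table area (as described later in this article for ESXi) makes the disk usable again: without the signature, the disk is treated as unpartitioned.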

Changing the partition type in the ESX Service Console using the parted utility
The command line utility parted can be used in the Console operating system to change the label and partitioning scheme.
Warning: To support certain internal operations, ESX installations include a subset of standard Linux configuration commands (for example, network and storage configuration commands). Using these commands to perform configuration tasks can result in serious configuration conflicts and render some ESX functions unusable. Always work through vSphere Client when configuring ESX, unless otherwise instructed in vSphere documentation or by VMware Technical Support. The steps outlined here are potentially hazardous for your environment if they are not followed exactly. If you are not comfortable performing these steps, contact VMware Technical Support and work with them to resolve the issue.
To change the label and partitioning scheme:

1. Log into the ESX host console. For more information, see Tech Support Mode for Emergency Support (1003677).
2. Identify the LUN or disks which have pre-existing partition tables. For more information, see Identifying disks when working with VMware ESX (1014953).

Caution: Ensure that the selected disks or LUNs do not have production information as this is a destructive operation.
3. Run the command:

parted <device>

Where <device> is the disk or LUN identifier selected in step 2.

The following is an example output using the /dev/sdb identifier:

GNU Parted 1.8.1
Using /dev/sdb
Welcome to GNU Parted! Type 'help' to view a list of commands.
4. Within the (parted) prompt, run the command:

print

The output appears similar to:

Disk geometry for /dev/sdb: 0.000-512.000 megabytes
Disk label type: gpt
Number Start End Size File system Name Flags
1 17.4kB 134MB 134MB Microsoft reserved partition msftres
5. Review all of the information and ensure that this drive or LUN is not used for production information.
6. Within the (parted) prompt, run the following command to change the Disk label type to msdos:

mklabel msdos

Caution: The above operation deletes the pre-existing partitions.
7. Within the (parted) prompt, run the command:

print

The output appears similar to:

Disk geometry for /dev/sdb: 0.000-512.000 megabytes
Disk label type: msdos
Minor Start End Type Filesystem Flags
8. Within the (parted) prompt, run the following command to exit the parted utility:

quit

9. Retry the storage operation which was failing initially.

Clearing partitioning information in ESXi using the DD utility
Due to differences between ESX classic and ESXi, the parted utility is not available in ESXi. The following steps describe how to clear partitioning information for a LUN under ESXi.
Warning: The steps outlined here are potentially hazardous for your environment if they are not followed exactly. If you are not comfortable performing these steps, contact VMware Technical Support and work with them to resolve the issue.

1. Log into the ESX host console or via SSH. For more information, see Unable to connect to an ESX host using Secure Shell (SSH) (1003807).
2. Identify the LUN or disks which have pre-existing partition tables. For more information, see Identifying disks when working with VMware ESX (1014953).

Caution: Ensure that the selected Disks or LUNs do not have production information as this is a destructive operation.
3. Run the command:

fdisk -u <device>

Where <device> represents the disk selected in step 2.

The output appears similar to:

Disk /dev/disks/vmhba2:0:3:0: 429.4 GB, 429491220480 bytes
255 heads, 63 sectors/track, 52216 cylinders, total 838850040 sectors
Units = sectors of 1 * 512 = 512 bytes

Device Boot Start End Blocks Id System
/dev/disks/vmhba2:0:3:1 128 838850039 419424956 ee EFI GPT
4. Make note of the size in bytes (429491220480 in this example).
5. Run the command:

dd if=/dev/zero of=<device> bs=512 count=34

Where <device> is the LUN or disk selected in step 2.

Caution: This replaces the first 34 x 512 bytes of the disk with zeros. This is a destructive command.

Note: If you are using VMware ESXi, or the <device> you are specifying is located within the /vmfs/ directory, you may need to append the conv=notrunc parameter to the dd command. Not doing so may result in a Function not implemented error.
6. (GPT partitions only) Calculate the seek value by using the following equation:

(<size_in_bytes> / 512) - 34 = <seek_value>

For example:

(429491220480 / 512) - 34 = 838850006

Note: <size_in_bytes> is the value recorded in step 4.
7. (GPT partitions only) Run the command:

dd if=/dev/zero of=<device> bs=512 count=34 seek=<seek_value>

Where <seek_value> is the value calculated in step 6, and <device> is the device identified in step 2.

Caution: This replaces the last 34 x 512 bytes of the disk with zeros. This is a destructive command.
8. Retry the storage operation.
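Before running this dd sequence against a real LUN, you can rehearse it on an ordinary file. The sketch below (file name is ours; GNU dd and stat are assumed) zeroes the first 34 sectors and then, using the same (size / 512) - 34 formula as step 6, the last 34 sectors of a scratch disk image:

```shell
# Build a 1 MiB scratch "disk" filled with a non-zero pattern (2048 sectors)
dd if=/dev/urandom of=/tmp/scratchdisk.img bs=512 count=2048 2>/dev/null

# Size in bytes (stat -c assumes GNU coreutils)
size=$(stat -c %s /tmp/scratchdisk.img)

# Zero the first 34 sectors, as in step 5
dd if=/dev/zero of=/tmp/scratchdisk.img bs=512 count=34 conv=notrunc 2>/dev/null

# Step 6's formula gives the seek value for the last 34 sectors
seek=$(( size / 512 - 34 ))

# Zero the last 34 sectors, as in step 7; conv=notrunc keeps the file size
dd if=/dev/zero of=/tmp/scratchdisk.img bs=512 count=34 seek=$seek conv=notrunc 2>/dev/null

echo "seek=$seek size=$(stat -c %s /tmp/scratchdisk.img)"
```

On a real GPT disk, the start of the disk holds the protective MBR plus the primary GPT header and partition entries, and a backup copy sits at the very end of the disk, which is why both ends must be cleared for the GPT label to disappear.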

Service Console to VMNIC0 from CLI

--> During installation of ESX, the Service Console was not assigned to a physical NIC
--> or you cannot associate a network card with the Service Console
--> or, due to some problem, the Service Console is currently not assigned to a physical NIC

Details
When installing ESX Server 3.x, you are not given the option to select which network card should be associated with the service console. By default, it seems as though vmnic0 is selected.

Solution
Note: This article assumes that the correct network settings (IP Address, Subnet Mask, Gateway and DNS) were assigned during the initial installation.

To resolve this issue, use the esxcfg-vswitch command, as follows:
Attach to the physical switch only the network cable used for the service console.

List all of the network adapters from the ESX Server service console and locate the name of the vmnic# that has a link status of up.
# esxcfg-nics -l
If vmnic0 has a link status of up, stop now. Otherwise, remove vmnic0 from vSwitch0.
# esxcfg-vswitch -U vmnic0 vSwitch0

Associate the vmnic# that has a link status of up from step 2.
# esxcfg-vswitch -L vmnic# vSwitch0

Use the following commands to determine where the ESX Server service console portgroup is (if you are doing a repair after changing physical NICs or similar activity, for example):
# esxcfg-nics -l
Shows which NICs have a link status of up.

# esxcfg-vswif -l
Shows the Service Console port group and lets you verify that its settings are correct.

# esxcfg-vswitch -l
Shows which vSwitch the service console port group is in.

# esxcfg-vswitch -L vmnic# vSwitch0
Links the vmnic and the service console vSwitch together.

Prior to changing the IP address of Service Console on ESX hosts

Take the following into consideration prior to changing the IP address of the Service Console:
  • Changing the primary Service Console's IP address may result in network connectivity loss.
  • Connect to the ESX shell via a remote console or a KVM console; a PuTTY or SSH connection will terminate during the IP address change.
  • VMware HA and DRS, if enabled, must be disabled to eliminate failover.
  • Disconnect and remove the host from VirtualCenter.
  • Change the DNS database for the forward entries and the reverse entries if applicable.
  • For manual resolution in /etc/hosts, change the IP for each ESX host in the VMware HA cluster and, if needed, in any other host in your environment.
  • If there is more than one Service Console present, determine the primary one by checking /etc/sysconfig/network for default gateway device and IP address.

Changing the IP address on ESX 3.x and 4.0

To change the IP address on ESX 3.x and 4.0:
  1. Log in as root to the ESX host console using an SSH or KVM connection.
  2. Run the following command to stop the network service:
    service network stop
  3. Run the esxcfg-vswif command to change the IP of the hosts:
    esxcfg-vswif -i NEW_IP_ADDRESS -n MASK_ADDRESS vswif0
  4. Edit the /etc/sysconfig/network file and change the gateway IP if needed.
  5. Run the following command to restart the network:
    service network start
  6. Add the host back to VirtualCenter using a FQDN (preferably) or by its IP.
  7. Reconnect host to VirtualCenter (if applicable).

Changing the IP address on ESX 2.5

To change the IP address on ESX 2.5:
  1. Log in as root to the ESX host console using an SSH or KVM connection.
  2. Run the following command to stop the network service:
    service network stop
  3. Change the IP and net mask in the correct file. Edit the /etc/sysconfig/network-scripts/ifcfg-eth0 file and change the IP and the net mask.
  4. Change the gateway configuration. Edit the /etc/sysconfig/network file and change the gateway IP if needed.
  5. Run the following commands to restart the interface and the network:
    • ifdown eth0
    • ifup eth0
    • service network start

Sunday, August 7, 2011

Using ESXTOP end to end

Performance logging on ESX using esxtop:

esxtop is a powerful CLI tool on an ESX server for looking at the performance counters of an ESX system, including the resources that the virtual machines (VMs) consume. In this short walkthrough I will explain, in general terms, how to set up performance logging to a separate Windows share from an ESX server.


We can use esxtop to monitor CPU, memory, network, and disk I/O.

First of all I presume you are a little familiar with Windows, Linux, the ESX CLI and esxtop as a command. Do not worry, I will point out most of the command switches we are going to need, but at least you should know what esxtop is used for.

Create the Windows share

First we are going to create a share on a Windows workstation or server. This share will be used for storing the generated log files.
  • Connect to a Windows PC or server with sufficient free space (logging can easily take a few megabytes per minute per ESX server).
    • Create a new local user account

      • Provide a username; we will be using "esxlog" in this example
      • Provide a password and make sure the user does not have to change the password at the next login.
    • Create a folder and share it

      • Create a folder named "esxlog" on the host (a 2003 server in this example)
      • Share the folder by right-clicking it, picking "Properties" and then the "Sharing" tab

        • Select "Share this folder" and share the folder as "esxlog"
        • Click "Permissions"
        • Remove "Everyone"
        • Add the newly created user "esxlog", and give it Full Control.
    • Close all dialogs

Mount the share on ESX

Secondly we will mount the share we just created on the ESX server, and make sure we can place a file on it.
  • Log into the Virtual Center client and connect to the ESX host that logging will be configured on.
    • Go to the "Configuration" tab and select "Security Profile"
    • Click "Properties" and tick the checkbox in front of "SMB Client"
    • Close all dialogs
  • Log into the ESX server using SSH as root (if there is no root account, prefix all following commands with sudo).
  • Mounting and testing the share
    • Create a directory on the ESX server in the "mnt" directory
    • mkdir /mnt/esxlogfiles
    • Mount the Windows share on the ESX host using:
    • mount -t smbfs -o username=esxlog //<server>/esxlog /mnt/esxlogfiles

    • Provide the password when prompted
    • Create a test file on the share:
    • touch /mnt/esxlogfiles/test.log
      Check the Windows host to see if the file is created there.

Configure esxtop

This is where you determine what you will be logging. esxtop provides various options for logging; you can use it to log the following. The letter behind each item is the key used to display that data in esxtop. Start esxtop and press the letter to see the default data.
  • CPU data – c
  • Memory data – m
  • Network data – n
  • Disk – d
  • ESX disk devices – u
  • Virtual Machine disk devices – v

You can modify the columns shown on the various screens by pressing the letter f. A detailed explanation of all columns can be found in the esxtop help file.

Type the following to show the help:

man esxtop

You can change the logfile interval by pressing the letter s. Enter a number of seconds to change. The minimum is 2.

After you are done press a capital W to save your settings. You can use the default proposed filename, or provide a custom one.

Starting esxtop in batch mode

Before we execute or schedule the command, we test it. We start esxtop with some parameters. Use:
  • -b To start in batch mode (mandatory)
  • -d The number of seconds between the logging intervals
  • -n The number of iterations
  • > Redirect the output to a file (on the mounted Windows share)

Combining these options gives us the following esxtop batch mode example:

esxtop -b -d 3 -n 20 > /mnt/esxlogfiles/<filename>.csv

Running this should produce a log file on the Windows share with 1 minute of logging data (20 iterations at a 3-second interval).

Check the log file and verify it provides the data you want to collect! esxtop always provides generic data like processor, memory and swap-file log data.

Also make sure the time on the server is correct, preferably synchronized with an NTP server.
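esxtop's batch mode output is plain CSV (a header row of counter names, then one row per sampling interval), so the log can be inspected with ordinary text tools before you ever open it in Excel or Perfmon. A sketch on a fabricated, drastically shrunken sample (the column names below are ours; real esxtop files carry hundreds of columns):

```shell
# Fabricated miniature esxtop-style CSV (illustrative only)
cat > /tmp/esxlog-sample.csv <<'EOF'
"time","\\host\Memory\Free MBytes","\\host\Physical Cpu(_Total)\% Util Time"
"12:00:02",4096,11.5
"12:00:05",4090,13.2
"12:00:08",4061,42.7
EOF

# Print the third column (CPU utilisation here), skipping the header row
awk -F, 'NR > 1 { print $3 }' /tmp/esxlog-sample.csv
# -> 11.5, 13.2 and 42.7, one value per line
```

The same one-liner against a real log lets you quickly spot spikes in a single counter without importing the whole file.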

Starting esxtop in the background

When you want to start esxtop in the background, including the option to completely quit your session, prefix the command with nohup and add an "&" to the end of the command line. This looks like:

nohup esxtop -b -d 3 -n 20 > /mnt/esxlogfiles/<filename>.csv &

Do not forget to change the number of iterations!

Schedule esxtop to start logging

A third option for starting esxtop in batch mode is to schedule it via crontab.

Assuming you are still logged in to the CLI, do the following:

  • Open the default crontab for the current user (root!) by typing
  • crontab -e
  • Add a line to the crontab file. We use the esxtop line tested earlier and add the crontab data (the time, in this example)
  • 00 12 * * * esxtop -b -d 3 -n 20 > /mnt/esxlogfiles/<filename>.csv

This example schedules the command every day at 12:00 hours.

If you only want to run this command a limited number of times, do not forget to remove the entry afterwards. Again, set the number of iterations to however long you would like the logging to run.
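Since a fixed filename in the crontab line gets overwritten on every run, one option is to date-stamp the log name. The sketch below only prints the entry it would install (the path and schedule are illustrative; note that % has special meaning in crontab entries and must be escaped as \%):

```shell
# A cron entry that logs one hour of data (1200 iterations at a 3-second
# interval) every day at 12:00 into a date-stamped file.
entry='00 12 * * * esxtop -b -d 3 -n 1200 > /mnt/esxlogfiles/esxlog-$(date +\%F).csv'

# Print it; to install it you would append this line via crontab -e
echo "$entry"
```

With the date in the filename, each day's run lands in its own CSV and older logs stay available for comparison.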

Log file analysis

When logging has finished you can view the log file using Microsoft Excel or Perfmon. You can import the CSV file into either application.

Perfmon example
  • Start perfmon (Start, Run, perfmon)
  • Select "Performance Monitor" and right-click it.
  • Click "Properties"
  • Go to the "Source" tab
  • Click "Log files" and choose "Add"
  • Select the created log file and click "Open"
  • Close the dialog by clicking OK.
  • Go to the "Data" tab and click "Add"
  • Select the counters you would like to see, and add them to the view

  • Enjoy too much data in a graph. ;-)

Original Source: http://www.b3rg.nl/blog/blog-it/performance-logging-on-esx-using-esxtop.html

Troubleshooting vSphere Memory


Introduction

As memory prices continue to drop and the 64-bit x64 architecture is embraced and adopted more widely in the industry, we continue to see a rise in memory demands. Only a few years ago, 1-2 GB virtual machines were the norm, 95% of them running 32-bit operating systems. From my personal experience I have seen this trend change to 2-4 GB as the norm, with the higher-performing virtual machines consuming anywhere from 4-16 GB of memory. VMware has answered this demand with vSphere, now delivering up to 1 TB of addressable memory per physical host and up to 255 GB per virtual machine.

With processors now more powerful than ever, the general virtual machine limitation is shifting from compute to memory. This is reflected in our industry today as we see an increase in the memory footprint of traditional servers (Intel Nehalem), and vendors such as Cisco introducing extended memory technology which can more than double the standard memory configuration. I recently had the opportunity to sit in on a Cisco Unified Computing System architectural overview class, and was impressed with what I saw. The extended memory technology is quite unique because it not only allows you to scale out your memory configuration, it also uses a special ASIC to virtualize the memory so there is no reduction in bus speed. A financial advantage to having this many DIMM sockets is that you can use lower-capacity DIMMs (2 GB or 4 GB) to achieve the same memory configuration as a standard server where you would have to use 8 GB DIMMs.

Memory Technologies in VMware vSphere

There are some major benefits of virtualization when it comes to memory. VMware implements some sophisticated and unique ways of maximizing physical memory workloads within an ESX host. All of these features work out of the box with no advanced configuration necessary. To understand problems that might occur in your environment you need to be familiar with these basic memory concepts.

  • Transparent Page Sharing – The VMkernel compares physical memory pages to find duplicates, then frees up the redundant space and replaces it with a pointer. If multiple operating systems are running on one physical host, why should you load the same files multiple times? Think of this as the data de-duplication process we are seeing in a majority of backup solutions in the industry.
  • Memory Overcommitment – The act of assigning more memory to powered-on virtual machines than the physical server has available. This allows virtual machines that have heavier memory demands to utilize memory that is not actively being used by under-utilized machines.
  • Memory Overhead – Once a virtual machine is powered on, the ESX host reserves memory for the normal operations of the VMware infrastructure. This memory can't be used for swapping or ballooning, and is reserved for the system.
  • Memory Balloon Driver – When VMware Tools are installed on a virtual machine, they provide device drivers into the host virtualization layer from within the guest operating system. Part of this package is the balloon driver, or "vmmemctl", which can be observed inside the guest. The balloon driver communicates with the hypervisor to reclaim memory inside the guest when it's no longer valuable to the operating system. If the physical ESX server begins to run low on memory, it will grow the balloon driver to reclaim memory from the guest. This process reduces the chance that the physical ESX host will begin to swap, which would cause performance degradation. Here is an illustration of ballooning in ESX:


What to look for
  • Check ESX host swapping. If you are overcommitting memory on the physical ESX host, you can run into a situation where each virtual machine needs the full amount of memory granted to it. When the host is out of memory it will begin to page out. Keep an eye on the oversubscription rates of your physical hosts, or ensure you have enough memory resources across your DRS clusters so the load can be balanced more effectively. Swapping will occur when the following condition is met:

Total_active_memory > (Memory_Capacity – Memory_Overhead) + Total_balloonable_memory + Page_sharing_savings

  • Check for virtual machine swapping. Make sure your virtual machines have enough memory for the application workload they are supporting. If virtual machine swapping starts to occur, it can put a strain on the disk subsystem.
  • Check to ensure VMware tools are installed and updated. VMware tools not only provides drivers from the guest to the hypervisor, but the balloon driver also gets installed with VMware tools. For proper memory management the ESX host relies on the balloon driver to manage memory.
  • Check memory reservation settings. By default, VMware ESX dynamically tries to reclaim memory when it is not needed. There are situations when you might choose to use memory reservations. If you set memory reservations in your environment, be aware that this memory is permanently assigned to the virtual machine and cannot be reallocated when it's not being used. Don't sell the balloon driver short: many third-party application vendors over-spec their configurations to be safe, and ballooning can help counteract some of that wasted "fluff factor".
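The swapping condition above is simple arithmetic, so it can be sanity-checked by hand. A sketch with invented numbers (the variable names are ours and all values are in MB; these are not actual esxtop counters):

```shell
# All values in MB; the numbers are invented for illustration
total_active_memory=30000
memory_capacity=32768
memory_overhead=1500
total_balloonable_memory=4000
page_sharing_savings=2000

# Host swapping is expected when active memory exceeds what the host can
# cover after overhead, ballooning and page sharing are accounted for
threshold=$(( memory_capacity - memory_overhead + total_balloonable_memory + page_sharing_savings ))

if [ "$total_active_memory" -gt "$threshold" ]; then
  echo "swapping likely (active ${total_active_memory} MB > ${threshold} MB)"
else
  echo "no swapping expected (active ${total_active_memory} MB <= ${threshold} MB)"
fi
# -> no swapping expected (active 30000 MB <= 37268 MB)
```

Note how ballooning and page sharing raise the threshold: on this example host, 30000 MB of active memory fits comfortably, even though raw capacity minus overhead is only 31268 MB.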
Monitoring with Virtual Center

The first place I would start when checking memory configurations is Virtual Center. Virtual Center provides excellent reporting and gives you granular control over which metrics you would like to report against. VMware vSphere now includes a nice graphical summary in the performance tab of the physical host. This gives you a quick dashboard-type view of the overall health of the system over a 24-hour period. Here are some memory samples:

Check your overall % usage (lower is better)

Check your ballooning (lower is better)


Selecting the Advanced tab gives you a much more granular way of viewing performance data. At first glance this might look like overkill, but with a little bit of fine tuning you can make it report some great historical information. Here is a snapshot of memory utilization with many of the variables we just discussed above; a great snapshot of what's going on (it looks healthy below):

Check your various metrics, mainly for swapping activity

The Virtual Center performance statistics by default display the past hour of statistics, and show a more detailed analysis of what's currently happening on your host. Select "Chart Options" to change values such as the time/date range and which counters you would like to display.

Virtual Center alarms are an excellent tool that can sometimes be overlooked and forgotten. While this is more of a proactive tool than a reactive or troubleshooting tool, I thought it was worth mentioning. Set up memory alerts so you will be notified via e-mail if a problem starts to manifest itself. Here is an alarm configured to trigger if physical host memory usage is above 90% for 5 minutes or greater. A lot of these alerts are built into Virtual Center so you don't have to do a lot of pre-configuration work. You do need to make sure you set up the e-mail notifications under the "Actions" tab.


Monitoring with ESXTOP

esxtop is another excellent way to monitor performance metrics on an ESX host. Similar to the Unix/Linux "top" command, it is designed to give an administrator a snapshot of how the system is performing. SSH to one of your ESX servers and execute the command "esxtop". The default screen that you should see is the CPU screen; if you need to monitor memory, press the "m" key. esxtop gives you great real-time information and can even be set to log data over a longer time period; try "esxtop -a -b > performance.csv". Check your total physical memory here, and make sure you aren't overcommitting and causing swapping. Examine what your virtual machines are doing; if you want to display just the virtual machine worlds, hit the "V" key.


Monitor inside the Virtual Machine

A great feature VMware introduced for Windows virtual machines was integrating VMware performance counters right into the Performance Monitor, or "perfmon", tool. If you're running vSphere 4 Update 1, make sure you read this post first, as there is a bug with the VMware Tools that will prevent the counters from showing up. You can monitor the same metrics found in Virtual Center and esxtop here. It's just another way of getting at the data, especially if you have a background in Microsoft Windows and are familiar with perfmon.


Monitoring with PowerCLI

Another great place to go for finding potential memory problems and bottlenecks is PowerCLI. I have been using PowerGUI from Quest, accompanied by a PowerPack from Alan Renouf. If you're not a command-line guru, don't let this discourage you: PowerGUI is a Windows application that allows you to run pre-defined PowerCLI commands against your Virtual Center server or your physical ESX hosts. Want to find out what your ESX host Service Console memory is set to? How about which virtual machines have memory reservations, shares or limits configured? You can pull all of this information using Alan's PowerPack.


Conclusion

If you're using VMware vSphere, there are many different ways to monitor for memory problems. The Virtual Center database is the first place you should start. Check your physical host memory conditions, then work your way down the stack to the virtual machine(s) that might be indicating a problem. Take a look at esxtop and check some of the key metrics that we discussed above.

Look for the outliers in your environment. If something doesn't look right, it probably isn't. Scratch away at the surface and see if something pops up. Use all the tools available to you, like PowerCLI; approaching problems from a different perspective will sometimes shed light on a situation you weren't aware of. If all else fails, engage VMware support and open a service request. Support contracts exist for a reason, and I have opened many SRs for new technical problems that had never been seen before by VMware support.

Original Source: http://www.virtualinsanity.com/index.php/2010/02/19/performance-troubleshooting-vmware-vsphere-memory/