Sunday, May 29, 2011

New Generation Integrity Servers

The new Integrity servers have finally been unveiled.

This whole new line of Integrity machines is based on Tukwila, the latest iteration of the Itanium processor line, which Intel presented early this year, and with one exception all of them use the blade form factor. Let's take a quick look at the new servers.

  • Entry-level

In this area, and as the only rack server of the new line, we have the rx2800. At first look it seems no more than a remake of the rx2660, but if you dig deeper you will find a powerful machine with two quad-core or dual-core Itanium 9300 processors and a maximum of 192GB of RAM.

That's a considerable amount of power for a server of this kind. I personally like this server and will have to convince my manager to kindly donate one for my home lab ;-)

  • Mid-range

In the mid-range line there are three beautiful babies named BL860c_i2, BL870c_i2 and BL890c_i2.

The key to these new servers is modularity; the BL860c_i2 is the base for its bigger siblings. HP has developed a new piece of hardware known as the Integrity Blade Link Assembly, which makes it possible to combine blade modules. The 870 is composed of two blade modules and the 890 of four, while the 860 is no more than a single blade module with a single Blade Link Assembly on its front. This way of combining blades makes the 890 the only 8-socket blade currently available.

The 870 and the 890, with 16 and 32 cores respectively, are the logical replacements for the rx7640 and rx8640. But as many people have pointed out since they were publicly presented, there is the question of OLAR, or rather the apparent lack of OLAR, which was in fact one of the key features of the mid-range cell-based Integrity servers. We'll see how this issue is resolved.

  • High-End

The new rx2800 and the new blades are great, but the real shock for everybody came when HP announced the new Superdome 2. Ladies and gentlemen, the new mission-critical computing era is here: forget those fat, proprietary racks, forget everything you know about high-end servers and welcome to blade land.

This new version of the HP flagship is based on the blade concept. Instead of cells we have cell-blades inside a new 18U enclosure based on the HP C7000 blade enclosure. Just remember one word… commonality. The new Superdome 2 will share a lot of parts with the C7000 and can also be managed through the same tools, such as the Onboard Administrator.

The specs of this baby are astonishing. During the presentation at the HP Technology At Work event, four different configurations were outlined, ranging from 8 sockets/32 cores in four cell-blades to a maximum of 64 sockets/256 cores in 32 cell-blades distributed across four enclosures in two racks. Like I said, astonishing :-D

There have been a lot of rumors during the last year about the future of HP-UX and Itanium, mainly because of the delays of the Tukwila processor. The discussion has recently reached ITRC.

But if any of you had doubts about the future of HP-UX, I firmly believe that HP has sent a clear message in the opposite direction. HP-UX is probably the most robust and reliable Unix in the enterprise arena. And seriously, what are you going to use to replace it? Linux? Solaris? Please ;-)

HP resources for VMware

The purpose of this post is to serve as a single point of reference for HP-related VMware resources.

I created the list for my personal use a while ago, but in the hope that it can be useful to someone else I decided to review and share it. I will try to keep the list up to date and will also add it as a permanent page in the menu above.

General resources

VMware on ProLiant

HP StorageWorks

VDI

vCloud Director

HP ProLiant servers management with hpasmcli

hpasmcli, the HP Management Command Line Interface, is a scriptable command-line tool to manage and monitor HP ProLiant servers through the hpasmd and hpasmxld daemons. It is part of the hp-health package that comes with the HP ProLiant Support Pack, or PSP.

[root@rhel4 ~]# rpm -qa | grep hp-health
hp-health-8.1.1-14.rhel4
[root@rhel4 ~]#
[root@rhel4 ~]# rpm -qi hp-health-8.1.1-14.rhel4
Name : hp-health Relocations: (not relocatable)
Version : 8.1.1 Vendor: Hewlett-Packard Company
Release : 14.rhel4 Build Date: Fri 04 Jul 2008 07:04:51 PM CEST
Install Date: Thu 02 Apr 2009 05:10:48 PM CEST Build Host: rhel4ebuild.M73C253-lab.net
Group : System Environment Source RPM: hp-health-8.1.1-14.rhel4.src.rpm
Size : 1147219 License: 2008 Hewlett-Packard Development Company, L.P.
Signature : (none)
Packager : Hewlett-Packard Company
URL : http://www.hp.com/go/proliantlinux
Summary : hp System Health Application and Command line Utility Package
Description :
This package contains the System Health Monitor for all hp Proliant systems
with ASM, ILO, & ILO2 embedded management asics. Also contained are the
command line utilities.
[root@rhel4 ~]#
[root@rhel4 ~]# rpm -ql hp-health-8.1.1-14.rhel4
/etc/init.d/hp-health
/opt/hp/hp-health
/opt/hp/hp-health/bin
/opt/hp/hp-health/bin/IrqRouteTbl
/opt/hp/hp-health/bin/hpasmd
/opt/hp/hp-health/bin/hpasmlited
/opt/hp/hp-health/bin/hpasmpld
/opt/hp/hp-health/bin/hpasmxld
/opt/hp/hp-health/hprpm.xpm
/opt/hp/hp-health/sh
/opt/hp/hp-health/sh/hpasmxld_reset.sh
/sbin/hpasmcli
/sbin/hpbootcfg
/sbin/hplog
/sbin/hpuid
/usr/lib/libhpasmintrfc.so
/usr/lib/libhpasmintrfc.so.2
/usr/lib/libhpasmintrfc.so.2.0
/usr/lib/libhpev.so
/usr/lib/libhpev.so.1
/usr/lib/libhpev.so.1.0
/usr/lib64/libhpasmintrfc64.so
/usr/lib64/libhpasmintrfc64.so.2
/usr/lib64/libhpasmintrfc64.so.2.0
/usr/share/man/man4/hp-health.4.gz
/usr/share/man/man4/hpasmcli.4.gz
/usr/share/man/man7/hp_mgmt_install.7.gz
/usr/share/man/man8/hpbootcfg.8.gz
/usr/share/man/man8/hplog.8.gz
/usr/share/man/man8/hpuid.8.gz
[root@rhel4 ~]#

This handy tool can be used to view and modify several BIOS settings of the server and to monitor the status of the different hardware components like fans, memory modules, temperature, power supplies, etc.

It can be used in two ways:

  • Interactive shell
  • Within a script

The interactive shell supports TAB command completion and command recovery through a history buffer.

[root@rhel4 ~]# hpasmcli
HP management CLI for Linux (v1.0)
Copyright 2004 Hewlett-Packard Development Group, L.P.

--------------------------------------------------------------------------
NOTE: Some hpasmcli commands may not be supported on all Proliant servers.
Type 'help' to get a list of all top level commands.
--------------------------------------------------------------------------
hpasmcli> help
CLEAR DISABLE ENABLE EXIT HELP NOTE QUIT REPAIR SET SHOW
hpasmcli>

As can be seen in the above example, several main tasks can be performed. To get the usage of any command, simply use HELP followed by the command.

hpasmcli> help show
USAGE: SHOW [ ASR | BOOT | DIMM | F1 | FANS | HT | IML | IPL | NAME | PORTMAP | POWERSUPPLY | PXE | SERIAL | SERVER | TEMP | UID | WOL ]
hpasmcli>
hpasmcli> HELP SHOW BOOT
USAGE: SHOW BOOT: Shows boot devices.
hpasmcli>

In my experience SHOW is by far the most used command. Below are examples of some of the most common tasks.

- Display general information of the server

hpasmcli> SHOW SERVER
System : ProLiant DL380 G5
Serial No. : XXXXXXXXX
ROM version : P56 11/01/2008
iLo present : Yes
Embedded NICs : 2
NIC1 MAC: 00:1c:c4:62:42:a0
NIC2 MAC: 00:1c:c4:62:42:9e

Processor: 0
Name : Intel Xeon
Stepping : 11
Speed : 2666 MHz
Bus : 1333 MHz
Core : 4
Thread : 4
Socket : 1
Level2 Cache : 8192 KBytes
Status : Ok

Processor: 1
Name : Intel Xeon
Stepping : 11
Speed : 2666 MHz
Bus : 1333 MHz
Core : 4
Thread : 4
Socket : 2
Level2 Cache : 8192 KBytes
Status : Ok

Processor total : 2

Memory installed : 16384 MBytes
ECC supported : Yes
hpasmcli>

- Show current temperatures

hpasmcli> SHOW TEMP
Sensor Location Temp Threshold
------ -------- ---- ---------
#1 I/O_ZONE 49C/120F 70C/158F
#2 AMBIENT 23C/73F 39C/102F
#3 CPU#1 30C/86F 127C/260F
#4 CPU#1 30C/86F 127C/260F
#5 POWER_SUPPLY_BAY 52C/125F 77C/170F
#6 CPU#2 30C/86F 127C/260F
#7 CPU#2 30C/86F 127C/260F

hpasmcli>

- Get the status of the server fans

hpasmcli> SHOW FAN
Fan Location Present Speed of max Redundant Partner Hot-pluggable
--- -------- ------- ----- ------ --------- ------- -------------
#1 I/O_ZONE Yes NORMAL 45% Yes 0 Yes
#2 I/O_ZONE Yes NORMAL 45% Yes 0 Yes
#3 PROCESSOR_ZONE Yes NORMAL 41% Yes 0 Yes
#4 PROCESSOR_ZONE Yes NORMAL 36% Yes 0 Yes
#5 PROCESSOR_ZONE Yes NORMAL 36% Yes 0 Yes
#6 PROCESSOR_ZONE Yes NORMAL 36% Yes 0 Yes

hpasmcli>

- Show device boot order configuration

hpasmcli> SHOW BOOT
First boot device is: CDROM.
One time boot device is: Not set.
hpasmcli>

- Set USB key as first boot device

hpasmcli> SET BOOT FIRST USBKEY

- Show memory modules status

hpasmcli> SHOW DIMM
DIMM Configuration
------------------
Cartridge #: 0
Module #: 1
Present: Yes
Form Factor: fh
Memory Type: 14h
Size: 4096 MB
Speed: 667 MHz
Status: Ok

Cartridge #: 0
Module #: 2
Present: Yes
Form Factor: fh
Memory Type: 14h
Size: 4096 MB
Speed: 667 MHz
Status: Ok

Cartridge #: 0
Module #: 3
Present: Yes
Form Factor: fh
Memory Type: 14h
Size: 4096 MB
Speed: 667 MHz
Status: Ok
...

In scripting mode hpasmcli can be used directly from the shell prompt with the -s option and the command between quotation marks. This, of course, allows you to process the output of the commands, as in the example below.

[root@rhel4 ~]# hpasmcli -s "show dimm" | egrep "Module|Status"
Module #: 1
Status: Ok
Module #: 2
Status: Ok
Module #: 3
Status: Ok
Module #: 4
Status: Ok
Module #: 5
Status: Ok
Module #: 6
Status: Ok
Module #: 7
Status: Ok
Module #: 8
Status: Ok
[root@rhel4 ~]#

To execute more than one command sequentially separate them with a semicolon.

[root@rhel4 ~]# hpasmcli -s "show fan; show temp"

Fan Location Present Speed of max Redundant Partner Hot-pluggable
--- -------- ------- ----- ------ --------- ------- -------------
#1 I/O_ZONE Yes NORMAL 45% Yes 0 Yes
#2 I/O_ZONE Yes NORMAL 45% Yes 0 Yes
#3 PROCESSOR_ZONE Yes NORMAL 41% Yes 0 Yes
#4 PROCESSOR_ZONE Yes NORMAL 36% Yes 0 Yes
#5 PROCESSOR_ZONE Yes NORMAL 36% Yes 0 Yes
#6 PROCESSOR_ZONE Yes NORMAL 36% Yes 0 Yes

Sensor Location Temp Threshold
------ -------- ---- ---------
#1 I/O_ZONE 47C/116F 70C/158F
#2 AMBIENT 21C/69F 39C/102F
#3 CPU#1 30C/86F 127C/260F
#4 CPU#1 30C/86F 127C/260F
#5 POWER_SUPPLY_BAY 50C/122F 77C/170F
#6 CPU#2 30C/86F 127C/260F
#7 CPU#2 30C/86F 127C/260F

[root@rhel4 ~]#

If you want to play more with hpasmcli, refer to its man page and the ProLiant Support Pack documentation.
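
The scripting mode also lends itself to quick health checks run from cron. Below is a minimal sketch, not a finished tool: the script name, the set of subcommands and the failure keywords are my own assumptions, so adjust them to whatever your hpasmcli output actually reports.

#!/bin/bash
# check_hp_health.sh - rough health check built on hpasmcli scripting mode
for CMD in "show dimm" "show fan" "show powersupply"; do
    # Flag anything hpasmcli reports as failed, degraded or absent
    if hpasmcli -s "$CMD" | grep -Eqi "failed|degraded|absent"; then
        echo "WARNING: '$CMD' reports a problem on $(hostname)"
    fi
done

Run it by hand first to confirm the keywords match your hardware's output before trusting it in a cron job.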

Cisco UCS Platform Emulator, first impressions

Although I work on HP servers, a few days ago I decided to try the Cisco UCS Platform Emulator. I've been using HP blades and Virtual Connect for years, so I thought it would be great to get a small taste of their most direct competitor. And, of course, this is my blog and not an HP blog, so I can try and write about anything I want :-)

The UCS Platform Emulator can be freely downloaded from the Cisco Developer Network here; you just need to fill in a small form. It is also highly recommended to grab a copy of the emulator guide here.

The emulator is a virtual machine with CentOS Linux inside; you can run it with VMware Workstation, as I do, or with the free VMware Player. After firing it up, the startup process will go like any Red Hat-based Linux until you get to a Starting cisco_ucspe message. Then it will start to decompress and install the “real” emulator into the VM.

After that process is finished you will get to a login screen that asks you to use the user and password config in order to access the configuration menus of the UCSPE. From those menus you can set things like the number of UCS chassis to be used and the number of blades per chassis.

Once you have configured the emulator, point your web browser to its IP address to access the UCS Manager. The network the UCSPE virtual machine is connected to must be configured to provide IP addresses via DHCP; I used the default NAT network the UCSPE came with.

When you are in the web browser click the LAUNCH link.

And here it is, in all its magnificence… the Cisco Unified Computing System Manager.

From here I will get familiar with the manager and try things like UCS Service Profile creation or playing with the UCS API. I will try to write about it in a future post.

Just a final word. I am very pleased with the emulator; I have to admit that Cisco did a great job. The whole “thing” was up and running in a breeze and provides a near-real experience of the UCS platform.

I firmly believe that this is the direction every vendor should follow, yes, including HP. If they release simulators to the IT people out there, in the end it will be beneficial for them.

Juanma.

Saturday, May 28, 2011

VNX/VNXe Updates – Bigger, Faster, Stronger

VNX and VNXe started shipping in volume in March, and demand has been overwhelming – which rocks!

So – 2 months have passed since volume shipping, and 5 months since the announcement, which means it’s time for a TON of new stuff :-)

image

The VNXe has been a hit – customers, partners everyone loves the extreme simplicity, advanced functionality, multi-protocol block/NAS capabilities, VMware and application integration built in, tight packaging, and of course the low cost.

When the VNXe was being developed, we knew that the SMB market was something that EMC didn't really have a strong track record in (Iomega notwithstanding), so we needed to really nail it. But we also suspected that there were a ton of OEM use cases that we had never addressed. Since then I've been pulled into the coolest discussions about this. VNXes in tanks, on boats (even using VPLEX to have stretched vSphere clusters across larger military vehicles so they can withstand substantial fire and damage – wow). And that's just the beginning of the OEM use cases.

So – perhaps it’s not a surprise to see an OEM application – Themis has partnered with EMC to create a hardened, ruggedized VNXe for a variety of military uses – pretty darn cool.

imageOn the VNX front – the list is long.

We haven't been talking about it as much as we could, but the amount of bandwidth you can drive through a VNX is very impressive. 10GBps is possible on even the VNX5700. That's a lot. That's 80Gbps, or put another way, the bandwidth of eight 10Gbps interfaces, or about the bandwidth of a loaded UCS configuration.

For perspective, you can do about 15GBps on a fully populated VMAX (which of course is usually selected less for throughput and more for transactional IOps performance and scale), and of course, on a loaded Isilon cluster, you can do 50-80GBps.

But unlike the VMAX and Isilon, which are built for purpose (“tactical nukes”), the VNX is simple, efficient, powerful utility/Swiss-army-knife storage, used for all sorts of blended use cases. I think that 10GBps range is pretty impressive. One of the reasons was the jump to 4-lane 6Gb/s SAS on the backend. But for some customers even that isn't enough, and there's still headroom in the storage processors. So we're introducing the “High Bandwidth” VNX configuration, in which the already impressive backend bandwidth is jacked up by a factor of two.

image

Then there's the 25-spindle 2.5” drive tray, which has started shipping. These are cool but, if you ask me, have a surprisingly limited life – at least in the way people are thinking about them.

Most customers look at them and go “hey, SAS disks are physically smaller, can be less expensive, and use less power – so that's good”. Well, yeah.

I REALLY think that before you know it, we'll be in all-SSD configurations (SLC and MLC), with magnetic media being used for very large capacity/$/W/sqft scenarios (for which they will exist well through 2015 – again, IMO).

So with that in mind, it's better thought of as a “25-SSD IO-eating enclosure”.

But what these new enclosures represent is the tip of the iceberg for very dense SSD scenarios. And yes, of course we're looking at all sorts of stuff, including options where we potentially leverage form factors that eschew traditional flash packaging (think of how Apple used the mSATA form factor in the MacBook Air).

BUT – sometimes it's not about IO capacity; it's about how dense the platform can be when it comes to GB per array per floor tile.

So – we also continue to expand capabilities on that front – here you can see the new 60 spindle 4U enclosure. This enclosure was codenamed “voyager” for EMC trivia buffs. This lets you fit 60 3.5” disks in 4U, or put another way, 600 disks in a rack, or (coupled with the new 3TB spindles) 1.8PB raw per floor tile. That’s a lot.

image

image

A quick “heads up” – these require a special “dense rack”. They can be added to existing VNX systems, but not into existing VNX racks. BTW, the majority of these dense configs literally configure the entire array that way. Also, in case anyone is wondering about 2.5” disks: of course, we're working on the next-gen “dense” configs. For now, when you want dense GB configs, 2.5” disks are a bad choice; you get far more bang for your buck from 3.5” disks.

Also, there's a pile of software updates that were announced. Unisphere continues its rapid march of progress. BTW, we're finding many customers are selecting VNX after trying Unisphere. Customer feedback is overwhelmingly positive (it's not perfect, and we'll continue to improve).

image

Unisphere version 1.1 (remember from my VNX/VNXe launch posts about how the teams work together, driving innovation in a producer/consumer model to each other?) continues to get better. VNX customers get the VNXe goodness of application-integration right in the UI itself on top of the vCenter API integration that was already there. And yes, if you look closely, there is a VMAX tab in that Unisphere screenshot :-)

Unisphere is also getting key awesome sauce from EMC Replication Manager embedded directly in the platform. EMC and NetApp have led the way amongst the storage vendors with the idea of “application-integrated” replication – EMC with Replication Manager and NetApp with the SnapManager family.

While customers dig those tools, one thing we've really been focused on is improving “time to value” and overall solution simplicity. So rather than just creating a Unisphere “snap-in” for Replication Manager, this project (again for trivia buffs, it is code-named “project Archway”) will EMBED the majority of those functions directly in the platform itself, making management cake. This part is a bit of a preview of things to come.

image

BTW, in case you missed it: no matter HOW easy we make storage management UIs like Unisphere… the people who run the applications would prefer to NEVER LEAVE their application UIs.

Hence the EMC Virtual Storage Integrator (VSI), which plugs into vCenter, as an example.

The week before EMC World, we launched EMC Storage Integrator (ESI). ESI is the equivalent of the Virtual Storage Integrator, but instead of vCenter integration (“home” for VMware admins), it integrates with the Microsoft MMC tools for Sharepoint, SQL Server and other apps. We also announced our SCOM Pro pack.

image

I’d highly recommend that any customer using any Microsoft technology (..ahem.. that means you, because it means everybody) check it out. Like VSI, ESI is completely FREE. You can read more about it at my colleague Adrian Simays’ site here.

Phew… Is there more? YES!

We've launched the Cloud Tiering Appliance. This enables VNX customers to take files and, based on policy, automatically extend the idea of FAST VP right out of the platform – and right out to the cloud.

image

What else? Well, we've added API-level integration between the VNX filesystem and the Google Search Appliance. This is pretty cool: through this integration, finding your corporate info becomes much faster and easier.

image

The Google Search Appliance doesn't need to crawl the VNX filesystems; the VNX literally pushes the info to the appliance. Cool.

2011 has been an EPIC year so far for EMC’s midrange and SMB customers – and… we’re only just getting started :-)

Using iSCSI with VMware vSphere

One of the most popular posts here was the original “A 'Multivendor Post' to help our mutual iSCSI customers using VMware”, which focused on the operation of the software iSCSI initiator in ESX 3.5 with iSCSI targets from multiple vendors. There's been a lot of demand for a follow-up, so without further ado, here's a multivendor collaborative effort on an update, which leverages content extensively from the VMworld 2009 sessions TA2467 and TA3264. The post was authored by the following vendors and people: VMware (Andy Banta), EMC (Chad Sakac), NetApp (Vaughn Stewart), Dell/EqualLogic (Eric Schott), HP/LeftHand Networks (Adam Carter).

One important note – in this post (and going forward we'll be trying to do this consistently) all commands, configurations and features noted apply to vSphere ESX and ESXi equally. Command-line formats are those used with the VMware vMA, which works with both ESX and ESXi. Alternate command-line variants are possible when using the remote CLI or the service console, but we're standardizing on the vMA.

This post covers a broad spectrum of topics surrounding the main point (changes in, and configuration of the iSCSI software initiator in vSphere ESX/ESXi 4) including:

  • Multipathing using NMP and the Pluggable Storage Architecture
  • vmknic/vSwitch setup
  • Subnet configuration
  • Jumbo Frames
  • Delayed Ack
  • Other configuration recommendations

If this sounds interesting – or you’re a customer using (or considering using!) iSCSI and vSphere 4 – read on!

First topic: core changes in the iSCSI software initiator from ESX 3.x.

The ESX software iSCSI initiator was completely rewritten for vSphere 4. This was done primarily for performance reasons, but also because the vSphere 4 compatibility base for Linux drivers transitioned from the 2.4 kernel to the 2.6 kernel. Remember that while the vmkernel doesn’t “run” Linux, or “run on Linux” – the core driver stack has common elements with Linux. Along with the service console running a Linux variant, these are the two common sources of the “VMware runs on Linux” theory – which is decidedly incorrect.

As an aside, there is also interest in publishing an iSCSI HBA DDK, allowing HBA vendors to write and supply their own drivers, decoupled from ESX releases. The changes could also allow storage vendors to write and supply components to manage sessions to make better use of the pluggable multipathing capability delivered in ESX 4. (Neither the HBA DDK nor the session capability has been released yet; development, documentation and certification suites are still underway.)

Some of the goodness that was in ESXi 3.5, has also made it into all ESX versions:

  • The requirement for a Console OS port on your iSCSI network has been removed. All iSCSI control path operations are done through the same vmkernel port used for the data path. This compares with ESX 3.x where iSCSI control operations required a console port. This is a very good thing: no console port needed for ESX4 – all versions.
  • Enabling the iSCSI service also automatically configures all the firewall properties needed.

Performance is improved several ways:

  • Storage paths are more efficient and keep copying and potentially blocking operations to a minimum.
Systems using Intel Nehalem processors can offload digest calculation to the processors' built-in CRC calculation engine.
However, the biggest performance gain is allowing the storage system to scale to the number of NICs available on the system. The idea is that the storage multipath system can make better use of the multiple paths it has available to it than NIC teaming at the network layer.
  • If each physical NIC on the system looks like a port to a path to storage, the storage path selection policies can make better use of them.

Second topic: iSCSI Multipathing

This is the perhaps the most important change in the vSphere iSCSI stack.

iSCSI Multipathing is sometimes also referred to as “port binding”. However, this term is ambiguous enough (it often makes people incorrectly think of “link aggregation”) that we should come up with a better term…

By default, iSCSI multipathing is not enabled in vSphere 4. Out of the box, the ESX iSCSI initiator uses vmkernel networking much like ESX 3.5: the initiator presents a single endpoint and NIC teaming through the ESX vSwitch takes care of choosing the NIC. This allows easy upgrades from 3.5 and simple configuration of basic iSCSI setups.

Setting up iSCSI multipathing requires some extra effort because of the additional layer of virtualization provided by the vSwitch. The ESX vmkernel networking stack, used by the iSCSI initiator, communicates with virtual vmkernel NICs, or vmkNICs. The vmkNICs are attached to a virtual switch, or vswitch, that is then attached to physical NICs.

Once iSCSI multipathing is set up, each port on the ESX system has its own IP address, but they all share the same iSCSI initiator IQN.

So – setup in 4 easy steps:

Step 1 – configure multiple vmkNICs

OK, the first step is obvious (but we're not making any assumptions): you will need to configure multiple physical Ethernet interfaces and multiple vmkernel NIC (vmkNIC) ports, as shown in the screenshot below.

image

You do this by navigating to the Properties dialog for a vSwitch and selecting “Add”, or by simply clicking “Add Networking” and adding additional vmkNICs.

This can also be done via the command line:

esxcfg-vmknic --server <server> -a -i 10.11.246.51 -n 255.255.255.0 <portgroup>

Note: certain vmkNIC parameters (such as jumbo frame configuration) can only be done as the vmkNIC is being initially configured. Changing them subsequently requires removing and re-adding the vmkNIC. For the jumbo frame example, see that section later in this post.

Step 2 – configure explicit vmkNIC-to-vmNIC binding.

To make sure the vmkNICs used by the iSCSI initiator are actual paths to storage, ESX configuration requires that each vmkNIC is connected to a portgroup that has only one active uplink and no standby uplinks. This way, if the uplink is unavailable, the storage path is down and the storage multipathing code can choose a different path. Let's be REALLY clear about this: you shouldn't use link aggregation techniques with iSCSI – you should/will use MPIO (which defines end-to-end paths from initiator to target). This isn't to say that link aggregation techniques are bad (they are often needed in the NFS datastore use case), but remember that block storage models handle multipathing with MPIO in the storage stack, not in the networking stack.

Setting up the vmkNICs to use only a single uplink can be done through the UI, as shown below – just select the adapter in the “active” list and move it down to “unused adapters”, such that each vmkNIC used for iSCSI has only one active physical adapter.

image

Instructions for doing this are found in Chapter 3 of the iSCSI SAN Configuration Guide, currently page 32.

Step 3 – configuring the iSCSI initiator to use the multiple vmkNICs

Then the final step requires command-line configuration. This step is where you assign, or bind, the vmkNICs to the ESX iSCSI initiator. Once the vmkNICs are assigned, the iSCSI initiator uses these specific vmkNICs as outbound ports, rather than the vmkernel routing table. Get the list of the vmkNICs used for iSCSI (in the screenshot below, this was done using the vicfg-vmknic --server <server> -l command).

image

Then, explicitly tell the iSCSI software initiator to use all the appropriate iSCSI vmkNICs using the following command:

esxcli --server <server> swiscsi nic add -n <vmknic> -d <vmhba>

To identify the vmhba name, navigate to the “Configuration” tab in the vSphere client and select “Storage Adapters”. You'll see a screen like the one below. In the screenshot below the vmhba name is “vmhba38”. Note also that in the screenshot the two devices have four paths.

image
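
To confirm the binding took effect, you can ask the software initiator which vmkNICs it is bound to. A quick sketch, assuming the adapter name vmhba38 from the screenshot above:

esxcli --server <server> swiscsi nic list -d vmhba38

Each vmkNIC you added should appear in the list; if one is missing, repeat the nic add command for it.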

The end result of this configuration is that you end up with multiple paths to your storage. How many depends on your particular iSCSI target (storage vendor type). The iSCSI initiator will log in, from each iSCSI vmkNIC, to every iSCSI target reported by the “SendTargets” command issued to the target listed in the “Dynamic Discovery” dialog box.

Before we proceed we need to introduce the concept of a storage portal for those who may not be familiar with iSCSI. At a high level, an iSCSI portal is the IP address(es) and port number of an iSCSI storage target. Each storage vendor may implement storage portals in slightly different ways.

In storage nomenclature you will see a device's “runtime name” represented in the following format: vmhba#:C#:T#:L#. The C represents a channel, the T the SCSI target, and the L the LUN.

With single-portal storage, such as EqualLogic or LeftHand systems, you'll get as many paths to the storage as you have vmkNICs (Up to the ESX maximum of 8 per LUN/Volume) for iSCSI use. These storage systems only advertise a single storage port, even though connections are redirected to other ports, so ESX establishes one path from each server connection point (the vmkNICs) to the single storage port.

clip_image010

A single-portal variation is an EMC Celerra iSCSI target. In the EMC Celerra case, a large number of iSCSI targets can be configured, but a LUN exists behind a single target – and the Celerra doesn't redirect in the way EqualLogic or LeftHand do. In the EMC Celerra case, configure an iSCSI target network portal with multiple IP addresses. This is done by simply assigning multiple logical (or physical) interfaces to a single iSCSI target. ESX will establish one path from each server connection (the vmkNICs) to all the IP addresses of the network portal.

Yet other storage advertises multiple ports for the storage, either with a separate target IQN or with different target portal group tags (pieces of information returned to the server from the storage during initial discovery). These multi-portal storage systems, such as EMC CLARiiON, NetApp FAS, and IBM N-Series, allow paths to be established between each server NIC and each storage portal. So, if your server has three vmkNICs assigned for iSCSI and your storage has two portals, you'll end up with six paths.

clip_image012

These variations shouldn’t be viewed as intrinsically better/worse (at least for the purposes of this multivendor post – let’s leave positioning to the respective sales teams). Each array has a different model for how iSCSI works.

There are some limitations for multiple-portal storage that require particular consideration. For example, EMC CLARiiON currently only allows a single login to each portal from each initiator IQN. Since all of the initiator ports have the same IQN, this type of storage rejects the second login (you can find log messages about this with logins failing with reason 0x03 0x02, "Out of Resources"). You can work around this problem by using the subnet configuration described here. Details on the CLARiiON iSCSI target configuration and multipathing state can be seen in the EMC Storage Viewer vCenter plugin.

By default storage arrays from NetApp, including the IBM N-Series, provide an iSCSI portal for every IP address on the controller. This setting can be modified by implementing access lists and / or disabling iSCSI access on physical Ethernet ports. The NetApp Rapid Cloning Utility provides an automated means to configure these settings from within vCenter.

Note that iSCSI Multipathing is not currently supported with Distributed Virtual Switches, either the VMware offering or the Cisco Nexus 1000V. Changes are underway to fix this and allow any virtual switch to be supported.

Step 4 – Enabling Multipathing via the Pluggable Storage Architecture

Block storage multipathing is handled by the MPIO part of the storage stack, and selects paths (for both performance and availability purposes) based on an end-to-end path.

image

This is ABOVE the SCSI portion of the storage stack (which is above iSCSI, which in turn is above the networking stack). Visualize the “on-ramp” to a path as the SCSI initiator port. More specifically in the iSCSI case, this is based on the iSCSI session – and after step 3, you will have multiple iSCSI sessions. So, if you have multiple iSCSI sessions to a single target (and by implication all the LUNs behind that target), you have multiple ports, and MPIO can do its magic across those ports.

This next step is common across iSCSI, FC, & FCoE.

When it comes to path selection, bandwidth aggregation and link resiliency in vSphere, customers have the option to use one of VMware's Native Multipathing (NMP) Path Selection Policies (PSPs), 3rd-party PSPs, or 3rd-party Multipathing Plug-ins (MPPs) such as PowerPath/VE from EMC.

All vendors on this post support all of the NMP PSPs that ship with vSphere, so we’ll put aside the relative pros/cons of 3rd party PSPs and MPPs in this post, and assume use of NMP.

NMP is included in all vSphere releases at no additional cost. NMP is supported in turn by two “pluggable modules”. The Storage Array Type Plugin (SATP) identifies the storage array and assigns the appropriate Path Selection Plugin (PSP) based on the recommendations of the storage partner.

VMware ships with a set of native SATPs and 3 PSPs: Fixed, Most Recently Used (MRU), and Round Robin (RR). The Fixed and MRU options were available in VI3.x and should be familiar to readers. Round Robin was experimental in VI3.5 and is supported for production use in vSphere (all versions).

Configuring NMP to use a specific PSP (such as Round Robin) is simple. You can do it in the vSphere Client: under the Configuration tab, select Storage Adapters, select the device, and right-click for properties. That shows the dialog box below (note that Fixed or MRU is always the default, and with those policies, depending on your array type, you may have many paths shown as active or standby but only one of them will be shown as “Active (I/O)”):

fig6_69

You can change the Path Selection Plugin with the pull down in the dialog box. Note that this needs to be done manually for every device, on every vSphere server when using the GUI. It’s important to do this consistently across all the hosts in the cluster. Also notice that when you switch the setting in the pull-down, it takes effect immediately – and doesn’t wait for you to hit the “close” button.

You can also configure the PSP for any device using this command:

esxcli --server <server> nmp device setpolicy --device <device> --psp <psp>

Alternatively, vSphere ESX/ESXi 4 can be configured to automatically choose Round Robin for any device claimed by a given SATP. To make all new devices that use a given SATP automatically use Round Robin, configure ESX/ESXi to use it as the default path selection policy from the command line.

esxcli --server <server> corestorage claiming unclaim --type location

esxcli --server <server> nmp satp setdefaultpsp --satp <satp> --psp VMW_PSP_RR

esxcli --server <server> corestorage claimrule load

esxcli --server <server> corestorage claimrule run
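
To confirm which SATP and PSP each device ended up with, you can list the devices NMP has claimed. A sketch, using the same vMA --server form as above:

esxcli --server <server> nmp device list

Every device entry shows its Storage Array Type and Path Selection Policy, which makes it easy to spot a host or LUN that was missed.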

Three Additional Questions and Answers on the Round Robin PSP…

Question 1: “When shouldn’t I configure Round Robin?”

Answer: While configuring the interface you may note that Fixed and MRU are always the default PSPs associated with the native SATP options, across all arrays. This is a protective measure in case you have VMs running Microsoft Cluster Services (MSCS). Round Robin can interfere with applications that use SCSI reservations for sharing LUNs among VMs and thus is not supported for LUNs used with MSCS. Otherwise, there's no particular reason not to use NMP Round Robin, with the additional exception of the note below (your iSCSI array requires the use of ALUA and for one reason or another you cannot change that).

Question 2: “If I’m using an Active/Passive array – do I need to use ALUA”?

Answer: There is another important consideration if you are using an array that has an “Active/Passive” LUN ownership model when using iSCSI. With these arrays, Round-Robin can result in path thrashing (where a storage target bounces behind storage processors in a race condition with vSphere) if the storage target is not properly configured.

Of the vendors on this list, EMC CLARiiON and NetApp traditionally are associated with an “Active/Passive” LUN ownership model – but it's important to note that the NetApp iSCSI target operates in this regard more like the EMC Celerra iSCSI target and is “Active/Active” (it fails over with the whole “brain” when the cluster itself fails over, rather than the LUN transiting from one “brain” to another – a “brain” in Celerra-land is called a Data Mover, and in NetApp-land a cluster controller).

Conversely, a CLARiiON iSCSI target LUN operates the same as a CLARiiON FC target LUN and trespasses from one storage processor to another. So ALUA configuration is important for CLARiiON with iSCSI and Fibre Channel/FCoE connected hosts, and for NetApp when using Fibre Channel/FCoE connectivity (beyond the scope of this post). So, if you're not a CLARiiON iSCSI customer, or a CLARiiON or NetApp Fibre Channel/FCoE customer (since this multipathing section applies to FC/FCoE too), you can skip to the third Round Robin question.

An “Active/Passive” LUN ownership model in VMware lingo doesn't mean that one storage processor (or “brain” of the array) is idle – rather that a LUN is “owned” by (basically “behind”) one of the two storage processors at any given moment. If using an EMC CLARiiON CX4 (or a NetApp array with Fibre Channel) and vSphere, the LUNs should be configured for Asymmetric Logical Unit Access (ALUA). When ALUA is configured, instead of the ports on the “non-owning” storage processor showing up in the vSphere client as “standby”, they show up as “active”. They will not be used for I/O in a normal state, though, as the ports on the “non-owning” storage processor are “non-optimized” paths (I/O takes a slower, more convoluted path via those ports). This is shown in the diagram below.

image

On each platform – configuring ALUA entails something specific.

On an EMC CLARiiON array when coupled with vSphere (which implements ALUA support specifically with SCSI-3 commands, not SCSI-2), you need to be running the latest FLARE 28 version (specifically 04.28.000.5.704 or later). This in turn currently implies CX4 only, not CX3, and not AX. You then need to run the Failover Wizard and configure the hosts in the vSphere cluster to use failover mode 4 (ALUA mode). This is covered in the CLARiiON/VMware Applied Tech guide (the CLARiiON/vSphere bible) here, and is also discussed on this post here.

Question 3: “I’ve configured Round Robin – but the paths aren’t evenly used”

Answer: The Round Robin policy doesn't issue I/Os in a simple “round robin” between paths in the way many expect. By default the Round Robin PSP sends 1000 commands down each path before moving to the next path; this is called the IO Operation Limit. In some configurations this default doesn't demonstrate much path aggregation, because quite often some of the thousand commands will have completed before the last command is sent. That means the paths aren't full (even though the queue at the storage array might be). With 1Gbit iSCSI the physical path is often the limiting factor on throughput, and making use of multiple paths at the same time shows better throughput.

You can reduce the number of commands issued down a particular path before moving on to the next path all the way to 1, thus ensuring that each subsequent command is sent down a different path. In a Dell/EqualLogic configuration, Eric has recommended a value of 3.

You can make this change by using this command:

esxcli --server <server> nmp roundrobin setconfig --device <device> --iops <iops> --type iops
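
There is a matching getconfig subcommand to check what a device is currently using. A sketch, with the device name left as a placeholder:

esxcli --server <server> nmp roundrobin getconfig --device <device>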

Note that cutting down the number of iops does present some potential problems. With some storage arrays caching is done per path. By spreading the requests across multiple paths, you are defeating any caching optimization at the storage end and could end up hurting your performance. Luckily, most modern storage systems don't cache per port. There's still a minor path-switch penalty in ESX, so switching this often probably represents a little more CPU overhead on the host.

That’s it!

If you go through these steps, you will see a screen that looks like the one below. Notice that Round Robin is the Path Selection configuration, and the multiple paths to the LUN are all noted as “Active (I/O)”. With an ALUA-configured CLARiiON, the paths to the “non-owning” storage processor ports will show as “Active”, meaning they are active but not being used for I/O.

image

This means you’re driving traffic down multiple vmknics (and under the vSphere client performance tab, you will see multiple vmknics chugging away, and if you look at your array performance metrics, you will be driving traffic down multiple target ports).

Now, there are a couple of other important notes – so let's keep reading :-)

Third topic: Routing Setup

With iSCSI Multipathing via MPIO, the vmkernel routing table is bypassed when determining which outbound port to use from ESX. As a result, VMware officially says that routing is not possible in iSCSI SANs using iSCSI Multipathing. Further, routing iSCSI traffic via a gateway is generally a bad idea anyway: it introduces unnecessary latency, so this is being noted only academically. We all agree on this point – DO NOT ROUTE iSCSI TRAFFIC.

But, for academic thoroughness, you can provide minimal routing support in vSphere because a route lookup is done when selecting the vmkNIC for sending traffic. If your iSCSI storage network is on a different subnet AND your iSCSI Multipathing vmkNICs are on the same subnet as the gateway to that network, routing to the storage works. For example, look at this configuration:

  • on the vSphere ESX/ESXi server:
    • vmk0 10.0.0.3/24 General purpose vmkNIC
    • vmk1 10.1.0.14/24 iSCSI vmkNIC
    • vmk2 10.1.0.15/24 iSCSI vmkNIC
    • Default route: 10.0.0.1
  • on the iSCSI array:
    • iSCSI Storage port 1: 10.2.0.8/24
    • iSCSI Storage port 2: 10.2.0.9/24

In this situation, vmk1 and vmk2 are unable to communicate with the two storage ports because the only route to the storage is accessible through vmk0, which is not set up for iSCSI use. If you add the route:

Destination: 10.2.0.0/24 Gateway: 10.1.0.1 (and have a router at the gateway address)

then vmk1 and vmk2 are able to communicate with the storage without interfering with other vmkernel routing setup.
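
A minimal sketch of adding that static route, using the addresses from the example above and assuming your ESX version's esxcfg-route supports the -a option (run from the service console):

esxcfg-route -a 10.2.0.0/24 10.1.0.1

You can list the vmkernel routing table afterwards with esxcfg-route -l to confirm the entry.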

Fourth topci: vSwitch setup

There are no best practices for whether vmkNICs should be on the same or different vSwitches for iSCSI Multipathing. Provided the vmkNIC only has a single active uplink, it doesn't matter whether there are other iSCSI vmkNICs on the same vSwitch or not.

Configuration of the rest of your system should help you decide the best vswitch configuration. For example, if the system is a blade with only two NICs that share all iSCSI and general-purpose traffic, it makes best sense for both uplinks to be on the same vswitch (to handle teaming policy for the general, non-iSCSI traffic). Other configurations might be best configured with separate vswitches.

Either configuration works.

Fifth topic: Jumbo frames

Jumbo frames are supported for iSCSI in vSphere 4. There was confusion about whether or not they were supported with ESX 3.5 – the answer is no, they are not supported for vmkernel traffic (but are supported for virtual machine traffic).

Jumbo frames simply means that the largest Ethernet frame passed between one host and another on the Ethernet network is larger than the default. By default, the “Maximum Transmission Unit” (MTU) for Ethernet is 1500 bytes. Jumbo frames are often set to 9000 bytes, the maximum available on a variety of Ethernet equipment.

The idea is that larger frames represent less overhead on the wire and less processing on each end to segment and then reconstruct Ethernet frames into the TCP/IP packets used by iSCSI. Note that recent Ethernet enhancements TSO (TCP Segment Offload) and LRO (Large Receive Offload) lessen the need to save host CPU cycles, but jumbo frames are still often configured to extract any last benefit possible.

Note that jumbo frames must be configured end-to-end to be useful. This means the storage, Ethernet switches, routers and host NICs must all be capable of supporting jumbo frames, and jumbo frames must be correctly configured end-to-end on the network. If you miss a single Ethernet device, you will get a significant number of Ethernet-layer errors (essentially fragmented Ethernet frames that aren't correctly reassembled).
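
A quick way to sanity-check the end-to-end path from the ESX host is vmkping with a don't-fragment, jumbo-sized packet. This is a sketch: the target address is a placeholder for one of your array's iSCSI ports, 8972 bytes leaves room for the IP and ICMP headers within a 9000-byte MTU, and the -d flag may not be present on older vmkping builds:

vmkping -d -s 8972 10.11.246.10

If this fails while a normal-sized vmkping succeeds, some device in the path is not passing jumbo frames.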

Inside ESX, jumbo frames must be configured on the physical NICs, on the vSwitch and on the vmkNICs used by iSCSI. The physical uplinks and the vSwitch are set by configuring the MTU of the vSwitch (see the sketch just below). Once this is set, any physical NICs capable of passing jumbo frames are also configured. For iSCSI, the vmkNICs must also be configured to pass jumbo frames.
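
Setting the vSwitch MTU is a one-liner. A sketch, assuming the vSwitch name vSwitch1 (substitute your own):

esxcfg-vswitch --server <server> -m 9000 vSwitch1

esxcfg-vswitch --server <server> -l shows an MTU column, so you can confirm the change took.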

Unfortunately, the vSwitch and the vmkNICs must be added (or, if they already exist, removed and re-created) from the command line to provide jumbo frame support. Note that this will disconnect any active iSCSI connections, so it should be done as a maintenance operation while the VMs residing on the affected datastores/RDMs are running on other ESX hosts (I know this sounds like an “of course”, but it's a good warning).

Below is an example:

# esxcfg-vmknic --server <server> -l | cut -c 1-161

Interface Port Group/DVPort IP Family IP Address Netmask Broadcast MAC Address MTU TSO MSS Enabled Type

vmk1 iSCSI2 IPv4 10.11.246.51 255.255.255.0 10.11.246.255 00:50:56:7b:00:08 1500 65535 true STAT

vmk0 iSCSI1 IPv4 10.11.246.50 255.255.255.0 10.11.246.255 00:50:56:7c:11:fd 9000 65535 true STAT

# esxcfg-vmknic --server <server> -d iSCSI2

# esxcfg-vmknic --server <server> -a -i 10.11.246.51 -n 255.255.255.0 -m 9000 iSCSI2

# esxcfg-vmknic --server <server> -l | cut -c 1-161

Interface Port Group/DVPort IP Family IP Address Netmask Broadcast MAC Address MTU TSO MSS Enabled Type

vmk0 iSCSI1 IPv4 10.11.246.50 255.255.255.0 10.11.246.255 00:50:56:7c:11:fd 9000 65535 true STAT

vmk1 iSCSI2 IPv4 10.11.246.51 255.255.255.0 10.11.246.255 00:50:56:7b:00:08 9000 65535 true STAT

If the vmkNICs are already set up as iSCSI Multipath vmkNICs, you must remove them from the iSCSI configuration before deleting them and re-adding them with the changed MTU.

Sixth topic: Delayed ACK

Delayed ACK is a TCP/IP method of allowing segment acknowledgements to piggyback on each other or other data passed over a connection with the goal of reducing IO overhead.

clip_image021

clip_image023

If your storage system is capable of supporting delayed ACK, verify with your vendor whether delayed ACK should be enabled.

Seventh topic: other configuration recommendations

Most of the original multivendor iSCSI post's “general recommendations” are as true as ever. When setting up the Ethernet network for iSCSI (or NFS datastore) use, don't think of it as “it's just on my LAN”, but rather “this is the storage infrastructure that is supporting my entire critical VMware infrastructure”. IP-based storage needs the same sort of design thinking traditionally applied to FC infrastructure, and when you apply it, it can have the same availability envelope as traditional FC SANs. Here are some things to think about:

  • Are you separating your storage and network traffic on different ports? Could you use VLANs for this? Sure. But is that “bet the business” thinking? It's defensible if you have a blade and a limited number of high-bandwidth interfaces, but think it through… do you want a temporarily busy LAN to swamp your storage (and vice versa) for the sake of a few NICs and switch ports? So if you do use VLANs, make sure you are thorough and implement QoS mechanisms. If you're using 10GbE, using VLANs can make a lot of sense and cut down on your network interfaces, cables and ports, sure – but with GbE, not so much.
  • Think about Flow-Control (should be set to receive on switches and transmit on iSCSI targets)
  • Either disable spanning tree protocol (only on the most basic iSCSI networks) or enable it with RSTP or portfast enabled. Another way to accomplish this, if you share the network switches with the LAN, is to filter/restrict bridge protocol data units on the storage network ports.
  • If at all possible, use Cat6a cables rather than Cat5e (and don’t use Cat5). Yes, Cat5e can work – but remember – this is “bet the business”, right? Are you sure you don’t want to buy that $10 cable?
  • Things like cross-stack Etherchannel trunking can be handy in some configurations where iSCSI is used in conjunction with NFS (see the “Multivendor NFS post” here)
  • Each Ethernet switch also varies in its internal architecture. For mission-critical, network-intensive Ethernet purposes (like VMware datastores on iSCSI or NFS), the amount of port buffering and other internals matter, so it's a good idea to know what you are using.

In closing.....

We would suggest that anyone considering iSCSI with vSphere should feel very confident that their deployments can provide high performance and high availability. You would be joining many, many customers enjoying the benefits of VMware and advanced storage that leverages Ethernet.

With the new iSCSI initiator, the enablement of multiple TCP sessions per target, and the multipathing enhancements in vSphere ESX 4, it is possible to have highly available and high-performing storage using your existing Ethernet infrastructure. The need for some of the workarounds discussed here for ESX 3.5 can now be parked in the past.

To make your deployment a success, understand the topics discussed in this post, but most of all ensure that you follow the best practices of your storage vendor and VMware.

Restarting vmware-hostd

Today I had an issue where an ESX host became unresponsive in vCenter, yet the VMs running on the host were fine. The normal remedy for this issue is to restart the management agent on the ESX host via the Service Console:

/etc/init.d/mgmt-vmware restart

However, this did not work. The mgmt-vmware restart command hung while stopping the "VMware ESX Server Host Agent". Ten minutes after executing mgmt-vmware restart, I decided to break out of the process by pressing Ctrl+Z.

Clearly, there was a problem with the running instance of the management agent, vmware-hostd. The only way to get this working without a host reboot is to find the PID of vmware-hostd and kill it.

To locate the PID for the running vmware-hostd process execute:

ps -auxwww |grep vmware-hostd

You will see output similar to the following (the PID is the second field, 13089 in this case):

root 13089 1.3 2.6 179080 6988 ? S 2008 1695:23 /usr/lib/vmware/hostd/vmware-hostd /etc/vmware/hostd/config.xml -u

To kill the running process, execute:

kill -9 <PID> (I had to run "kill -9 13089")
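
If you prefer a single line, the two steps can be combined. A sketch only; the [v] in the grep pattern is a small trick that keeps the grep process itself out of the match:

kill -9 $(ps auxwww | grep [v]mware-hostd | awk '{print $2}')

Double-check the ps output first; killing the wrong PID on a production host is no fun.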

Once vmware-hostd is no longer running, you can restart the management agent by running:

/etc/init.d/mgmt-vmware restart

-or-

service mgmt-vmware restart

Using Custom Attributes to Manage VMs

This may not be the most technical post, but it should hopefully give VM administrators some ideas on managing their VMs.

Despite having tools like VirtualCenter, keeping track of your VMs can still be a mission. Today I look after thousands of virtual machines running on hundreds of ESX hosts in several data centres. Most of these VMs are production systems, some are clones of production systems, some are test and some are dev. Creating and managing machines for new services is not always an issue. We have processes in place to control VM sprawl. We know which VMs belong to which customers. We also know who to contact regarding each VM. This is all documented in change records and CMDBs. However, having to go back to CMDBs and change records every time you need to know who owns a VM is a bit of a slog. Sure, we've tried adding relevant information into the “Notes” attribute, but it gets messy and some administrators “forget” to add all the information we need into the notes.

To try and keep track of who owns what, I use a simple but very effective tool inside vCenter to manage VMs. It's the “Custom Attribute” function of vCenter, which allows administrators to specify custom attributes for all the VMs and hosts in vCenter. Custom attributes are by no means a new feature in Virtual Infrastructure or vSphere, yet a lot of administrators don't use them, as they simply don't realise that the custom attributes function exists or what custom attributes are for. I've seen many virtual infrastructures built on VMware VI 3 (from small environments to large enterprise environments) and I simply can't recall ever seeing custom attributes being used.

What are custom attributes?

Custom attributes are attributes that an administrator can define for all virtual machines and hosts in vCenter. The attributes are then displayed in the “Annotations” box for each VM. The custom attributes are also displayed alongside the VM Name, Status, State, CPU Usage, etc., when listing virtual machines in the datacenter, cluster, host, and resource pool views.

Why use them?

Daily, we receive lots of clone requests for VMs. Clones are very good for application development and application troubleshooting. When an application on a VM goes pear shaped, the application vendor may request a clone of the system. They can then work on the clone to try to fix the issue while the production system (if not completely broken) may in some cases still remain partially available to the end users. Once they are confident that they have resolved the issue, they can implement the fix in the production environment.

This is all well and good, but here's the problem: clones are a drain on your resources. As an administrator, you should be very careful about simply handing out clones as the requests come in. Virtual machine clones can create a nightmare for administrators as they use huge amounts of additional disk space; some of the clones we have today are in excess of 420GB. When creating clones, we need to make sure that the following information is recorded somewhere in vCenter:

  • Who requested the clone
  • The requestor's contact number/email address
  • The date the clone was made
  • The size of the clone on disk (all vDisk sizes combined + the amount of RAM assigned to the VM)
  • Whether it needs to be backed up (Yes/No)
  • The reason for the clone

Specifying your own Custom Attributes

Creating custom attributes is very simple. In the VI (or vSphere) client on the menu bar, click “Administration” >> “Custom Attributes...”

You’ll now be presented with the Custom Attributes dialog:

Image
Create your attributes by using the Add button. I created Virtual Machine attributes for this post.

Once you've created your attributes, you'll see them in each VM's annotations box on the summary page.

Image
To edit the custom attributes, click the "edit" label in the annotations box. You then get:

Image
Edit Attributes Dialog

Fill out the information for each attribute:

Image
Attributes filled out

When looking at the annotations box on the summary page of the VM, the custom attributes will be recorded and displayed for that VM:

Image
Annotations box displaying custom attribute information

The cluster view will also now display all the custom attributes. They can be filtered like any other VM attribute:

Image
Cluster View of custom attributes

Which ESX host is locking my files?

I've found myself asking this very annoying question again just last week: which one of the servers is holding a lock on a virtual machine log file that was last modified three months ago?

Last week I came across a problem where VCB failed a job while trying to perform a full backup of one of the VMs. This was because one of the log files for the virtual machine was locked on the SAN. VCB was unable to copy the log file to the backup server and therefore failed the entire job.

Normally, a simple VMotion of the virtual machine to another host will solve this issue, but I wasn't as lucky this time. So I thought powering off the VM would do it... Didn't work! No matter what I did, I just couldn't get the lock on that file released. One of the ESX hosts in the cluster was holding on to the log file, but how do I go about finding out which one of the 20 ESX hosts it was? To me, this sounded like a job for vmkfstools, and indeed it was. Well, sort of. Using vmkfstools, I was able to retrieve the MAC address of the ESX host in the cluster that was holding on to the three-month-old log file.

The command is:

vmkfstools -D /filename

In my case this was:

vmkfstools -D /vmfs/volumes/iscsi-002-vmfs/WKSTN01/vmware.log

The output is then written to /var/log/vmkernel.

To get the output, simply do:

tail /var/log/vmkernel

This returned:

Jun 20 15:35:33 esx1 vmkernel: 23:02:22:35.020 cpu0:4174)FS3: 142:
Jun 20 15:35:33 esx1 vmkernel: 23:02:22:35.020 cpu0:4174)Lock [type 10c00001 offset 29190144 v 7, hb offset 4083712
Jun 20 15:35:33 esx1 vmkernel: gen 1881, mode 1, owner 4a2128d2-86a81c3a-ce30-000e0cc41e98 mtime 893]
Jun 20 15:35:33 esx1 vmkernel: 23:02:22:35.020 cpu0:4174)Addr , gen 6, links 1, type reg, flags 0x0, uid 0, gid 0, mode 644
Jun 20 15:35:33 esx1 vmkernel: 23:02:22:35.021 cpu0:4174)len 312433, nb 1 tbz 0, cow 0, zla 1, bs 1048576
Jun 20 15:35:33 esx1 vmkernel: 23:02:22:35.021 cpu0:4174)FS3: 144:

The MAC address of the host locking the file is the last section of the owner UUID reported on line 3 of the output:

000e0cc41e98
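
If you don’t feel like eyeballing the whole tail output, grepping the log for the owner field narrows it down to the line you need. This is just a small convenience, not part of the original procedure; adjust the line count to taste:

# Pull only the lock-owner line(s) out of the recent VMkernel log entries
tail -n 50 /var/log/vmkernel | grep -i owner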

Now, this is the bit where I can’t make it any easier for you. Unless you write a script (and I don’t have that much time at the moment), the only way to find the host with that MAC is to log onto each host via SSH and run:

esxcfg-info | grep -i 'system uuid'

This will return the system UUID of the host you are logged on to. The UUID ends in a MAC address of that host; if it matches the MAC retrieved using vmkfstools, then you know the process that’s keeping the lock is on that server.
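
That said, if you can SSH from a management box to all of the service consoles, a quick-and-dirty loop like the one below can do the legwork for you. This is only a sketch under a couple of assumptions: the host names esx1/esx2/esx3 are placeholders, root SSH is allowed (or you adapt it to your sudo setup), and the MAC is the one pulled out of vmkfstools -D above:

# MAC taken from the lock owner field in /var/log/vmkernel
LOCK_MAC="000e0cc41e98"

# esx1, esx2, esx3 are placeholder host names; replace with your own
for HOST in esx1 esx2 esx3; do
  UUID=$(ssh root@$HOST "esxcfg-info | grep -i 'system uuid'")
  # The system UUID ends in one of the host's MAC addresses,
  # so a simple substring match is enough
  if echo "$UUID" | grep -qi "$LOCK_MAC"; then
    echo "$HOST appears to be holding the lock"
  fi
done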

So what process is locking the file? That I can’t tell you. I can only give you some tips as to how to find it.
1. Power off the VM in vCenter;
2. Log onto the service console of the host that’s locking the file;
3. Try to move or delete the locked file from the service console of the locking host. This worked for me. If it works for you, then good. If not, go to step 4;
4. Try and see if there's a process running with the name of the locked file:

ps -auxwww | grep <locked filename>

If it returns a line (other than the grep line itself), kill the process with "kill -9 <PID>";

5. If it doesn't return any processes under that filename, then try searching for a process running under the name of the VM that has the locked file:

ps -auxwww | grep <VM name>

If it returns a PID, kill the PID, as your VM was already powered off in step one and should therefore not have a PID on any host;

6. If it still doesn’t work, leave a comment and we'll have a look at it ;-)

Stuck task on VM: VI3/vSphere Virtual Machine Operations

This is by no means a new issue. However, I still get support calls regarding tasks that get stuck on VMs. What do I mean by "stuck tasks"? Well, I've seen cases where a snapshot task initiated by VCB got stuck in the state of "Creating Virtual Machine Snapshot". The VM then goes down and cannot be accessed via the console, does not respond to pings, and the status of VMware Tools turns to "Unknown". You cannot power the VM on either, as the "Creating Virtual Machine Snapshot" task is still showing as an active task. You can wait, but after 30 minutes, chances are that it won't sort itself out, so user intervention is required!

This is normally the approach I take to sort this out:
1. Make sure that the VM is inaccessible to everyone and that it really is down.

2. Browse the datastore where the VM is located (best to do this via the CLI on the service console with "ls -lh") and check the time stamps of the files to see how long the snapshots, if any, have been sitting there.
3. In VirtualCenter (or "vCenter"), the VM will probably still be showing as powered on. Check which of your ESX hosts it is running on.
4. Log onto the service console of the ESX host that is running the VM. Elevate your privileges to root.
5. Now, as the VM has an active task, you won't be able to send any other commands to it, and you won't be able to use vmware-cmd to change its state either; until the stuck task has completed, the ESX host will not accept any power commands for the VM. The only way to release the VM from its sorry state and get rid of the "Active task" is to kill the VM's running process from the service console. In order to do so, you need to find the PID of the "running" VM. To get the PID do:

The syntax is:
ps -auxwww | grep <VM name>

Example:
Suppose you have a VM called WKSTNL01. The command will be:
ps -auxwww | grep WKSTNL01

This should return something like this:

root 12322 0.0 0.4 3140 1320

The PID in this instance is 12322. This is what we need to kill.

6. Kill the process ID with kill -9:

kill -9 12322


7. Delete any snapshots that were created (see the vmware-cmd sketch below);

8. Power On the VM.
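
For step 7, if the Snapshot Manager in the client refuses to cooperate, the leftover snapshot can usually be dealt with from the service console as well. A minimal sketch, assuming the same WKSTNL01 VM; the .vmx path reuses the iscsi-002-vmfs datastore from the previous post purely as an example, so adjust it for your environment:

# List registered VMs to get the full path to the .vmx file
vmware-cmd -l

# Check whether the VM still has a snapshot and, if so, commit/remove it
vmware-cmd /vmfs/volumes/iscsi-002-vmfs/WKSTNL01/WKSTNL01.vmx hassnapshot
vmware-cmd /vmfs/volumes/iscsi-002-vmfs/WKSTNL01/WKSTNL01.vmx removesnapshots

# Confirm the VM is powered off, then power it back on (step 8)
vmware-cmd /vmfs/volumes/iscsi-002-vmfs/WKSTNL01/WKSTNL01.vmx getstate
vmware-cmd /vmfs/volumes/iscsi-002-vmfs/WKSTNL01/WKSTNL01.vmx start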

Friday, May 27, 2011

The top 25 ESX commands and ESXi commands

As every virtualization administrator knows, getting a handle on a VMware infrastructure requires greater automation of key virtualization management tasks. While VMware ESX hosts can be managed with the vSphere client graphical user interface, data center administrators often prefer to log into the VMware service console and use the ESX command line to troubleshoot problems, such as network configuration issues, or to re-configure a host. And there are several VMware commands to help automate such tasks, identify problems in your virtualization infrastructure, performance-tune your virtual machines (VMs) and more.

In this guide to VMware command lines, I outline the top 25 most useful ESX Server commands and ESXi commands. They include Linux as well as ESX-specific commands, and many can be used with the Remote Command Line Interface (RCLI), which in vSphere has been renamed the vSphere CLI and can be used with VMware ESX and ESXi.

The top Linux commands
The ESX service console is based on Red Hat Linux, and therefore many Linux commands can be used inside it. Here are common Linux commands and some VMware-specific versions of them; a few usage examples follow the list.

  • Find/cat/grep commands find, display and search for files. Find locates specific files, cat displays the contents of files and joins files together, and grep searches for specific text within a file. These commands help administrators find specific infrastructure elements such as snapshot files and also display log and config files. They can also search for information within files.
  • Tail displays the last part of a text-based file and can also monitor output to the file in real time. This command helps monitor log files in real time.
  • Service can start, stop and restart services (or programs) that run on the host server. Some common ESX services include mgmt-vmware, vmware-vpxa, firewall, vmware-hostd and vmware Web Access. This command is useful for restarting services that hang or for applying configuration changes.
  • Nano and vi edit text files. Nano is simpler and easier to use than vi, but vi is a more powerful and fully featured editor. Text editors help edit configuration files on an ESX host.
  • Su and sudo commands help control access and prevent the root account from being used. Su elevates the privileges of less privileged user accounts to that of a superuser (or root). Sudo runs commands as another user as specified in the sudoers configuration file.
  • ls lists file and directory information. By using certain switches (e.g., -ltr) you can display detailed file information, including the owner, size, permissions and the last modified date and time.
  • Df and vdf display file system (partition) information, including free space. The df command will not display Virtual Machine File System (VMFS) volumes because it cannot read them. Vdf is the VMware version of this command, which will also display VMFS volume information. Both commands can use the -h switch, which displays the output in human-readable form (e.g., 2G rather than 2,016,044).
  • Ps and kill commands can forcibly terminate stuck VMs that will not power on or off. Ps displays the status of processes running on the host. It can use many switches, but the most common is -ef, which displays full information about every process running. The kill command is often used with the ps command to terminate specific running processes.
  • Ping and vmkping are the most basic network troubleshooting commands. Ping tests network connectivity with other hosts and network devices by sending an Internet Control Message Protocol packet to them and seeing the response. Vmkping is the VMware-specific version of the Ping command. It uses the IP stack of the VMkernel to ping another ESX host's VMkernel port. This command helps troubleshoot VMotion and network storage issues.
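
To make these concrete, here are a few of the above commands the way they are typically run on an ESX service console. The snapshot file pattern, service name and log file are just illustrative examples, so substitute your own:

# Find snapshot delta files across all VMFS volumes
find /vmfs/volumes/ -name "*-delta.vmdk"

# Follow the VMkernel log in real time
tail -f /var/log/vmkernel

# Restart the management agent after a configuration change
service mgmt-vmware restart

# Show datastore free space, including VMFS volumes, in human-readable form
vdf -h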

The top VMware ESX commands and ESXi commands
These VMware ESX and ESXi commands can be run from the ESX service console (locally or remotely using Secure Shell) or with the RCLI (in VMware Infrastructure 3) and vSphere CLI (in vSphere). With the RCLI and vSphere CLI, note that many of the commands have been renamed to vicfg- instead of esxcfg- (e.g., esxcfg-nics.pl vs. vicfg-nics.pl). Both commands perform the same function, but VMware is trying to migrate from esxcfg- to vicfg-. A few examples follow the list.

  • The versatile vmkfstools command is the Swiss army knife of virtual disks and can be used to copy, convert, rename, import, export and resize virtual disk files.
  • Esxtop troubleshoots performance problems. It provides real-time and historical performance statistics for CPU, memory, disk and network usage.
  • Esxcfg-nics views and configures physical network interface cards (NICs). It displays NIC status and can configure speed and duplex of the NICs.
  • Esxcfg-vswitch views and configures virtual switches. It's useful for configuring networking when the vSphere Client cannot be used. The command configures port groups and links physical NICs to them (known as uplinks) and configures virtual LAN IDs, Cisco Discovery Protocol (CDP) and the MTU of vswitches.
  • Esxcfg-vswif and esxcfg-vmknic allow you to view and configure special port groups on vSwitches. Esxcfg-vswif configures the ESX service console network interfaces, which are also known as vswif ports. Esxcfg-vmknic configures VMkernel network interfaces, which are necessary for VMotion and connecting to iSCSI and Network File System network storage devices.
  • Vmware-cmd is a versatile command to manage and retrieve information from virtual machines. It can change VM power states, manage snapshots, register and unregister VMs, and retrieve and set various VM information.
  • Vimsh and vmware-vim-cmd are complex commands that you should fully understand before using. Vimsh is a powerful interactive shell that allows execution of commands and the ability to display and configure many things. VMware-vim-cmd is a front end of sorts for vimsh that simplifies command usage without having to know the many switches that vimsh requires.
  • Vihostupdate and esxupdate update and patch ESX and ESXi hosts. Esxupdate is used on the ESX service console and vihostupdate is used by the RCLI/vSphere CLI. In addition, vihostupdate35 is used to patch ESX and ESXi version 3.5 hosts.
  • Svmotion is an RCLI/vSphere CLI command used to initiate Storage VMotion sessions to relocate a VM's virtual disk to another datastore while it is running. In ESX 3.5 this command was the only method to initiate a SVMotion; in vSphere the ability to do this was added to the vSphere Client GUI.
  • Esxcfg-mpath displays and sets all paths from a host to its storage devices.
  • Esxcfg-rescan lets a host re-scan a particular storage adapter to discover new storage devices. This tool is useful when storage devices have been added to, removed from or changed on a storage network.
  • Esxcfg-scsidevs and esxcfg-vmhbadevs display information on the storage devices connected to a host. Esxcfg-vmhbadevs was used in ESX 3.5 and was replaced by esxcfg-scsidevs in vSphere.
  • Esxcfg-firewall displays information and configures the built-in firewall that protects the ESX service console. It allows and blocks specific TCP/IP ports between the service console and other network devices.
  • The esxcfg-info command provides a wealth of information about the host that it is run on. It can be re-directed to a text file to document host configuration.
  • Esxcfg-auth configures Service Console authentication on an ESX host. It can configure authentication to a third-party LDAP or Active Directory server and set various local security options.
  • Vm-support is a powerful information gathering tool commonly used in troubleshooting. The command gathers up a large amount of configuration info, log files and the output from many commands into a single .tgz archive file. It can also be used to display VM information as well as kill VMs that are not responding.
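
And as with the Linux commands above, here are a few of the ESX-specific ones as typically run from the service console. The adapter name (vmhba32) and the .vmx path are placeholders carried over from the earlier posts, so adapt them to your own environment:

# List physical NICs with their link state, speed and duplex
esxcfg-nics -l

# List virtual switches, their port groups and uplinks
esxcfg-vswitch -l

# Show storage paths, then rescan an adapter after presenting new LUNs
esxcfg-mpath -l
esxcfg-rescan vmhba32

# Check the power state of the VM from the stuck-task example
vmware-cmd /vmfs/volumes/iscsi-002-vmfs/WKSTNL01/WKSTNL01.vmx getstate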

Most of the above commands have various syntaxes, options and switches that can be used with them; in most cases you can run a command without any options to see its basic usage. For more information on these 25 ESX and ESXi commands, check out the following documentation: