vStuff: HA Best practice guide

http://www.vmware.com/files/pdf/VMwareHA_twp.pdf

VMware HA protects applications in two ways:

Host failure detection: when a host fails, its VMs are restarted on other hosts in the cluster
VM monitoring (heartbeats and I/O statistics): the VM is reset when a guest OS failure is detected

Host Selection considerations:

Build the cluster out of identical server hardware
The ideal cluster size for the majority of environments is 6 to 10 nodes: simple to manage, while still leaving spare capacity

Host placement considerations:

The first 5 hosts to join the cluster become primary hosts, and all subsequent hosts are secondary. At least one primary host must be functional for VMware HA to operate correctly.
One of the primary hosts is also designated as the active primary host, which is responsible for:

Deciding where to restart virtual machines
Keeping track of failed restart attempts
Determining when to keep trying to restart a VM
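The three responsibilities above can be pictured as simple bookkeeping. The sketch below is a toy model, not HA's real logic: the "most free memory" placement rule, the `MAX_RESTART_ATTEMPTS` limit of 5, and all names (`choose_host`, `RestartTracker`) are hypothetical assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical retry limit: this guide does not state how often HA retries.
MAX_RESTART_ATTEMPTS = 5


def choose_host(vm_mem_mb: int, free_mem_by_host: Dict[str, int]) -> Optional[str]:
    """Deciding where to restart: pick the host with the most free memory that fits.

    The "most free memory" heuristic is an illustrative assumption.
    """
    candidates = {h: free for h, free in free_mem_by_host.items() if free >= vm_mem_mb}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


@dataclass
class RestartTracker:
    """Toy model of the active primary's bookkeeping of failed restarts."""
    max_attempts: int = MAX_RESTART_ATTEMPTS
    failed_attempts: Dict[str, int] = field(default_factory=dict)

    def record_failure(self, vm: str) -> None:
        # Keeping track of failed restart attempts, per VM.
        self.failed_attempts[vm] = self.failed_attempts.get(vm, 0) + 1

    def record_success(self, vm: str) -> None:
        # Once the VM is running again, its failure history is discarded.
        self.failed_attempts.pop(vm, None)

    def should_retry(self, vm: str) -> bool:
        # Determining when to keep trying: only while under the retry limit.
        return self.failed_attempts.get(vm, 0) < self.max_attempts
```

For example, `choose_host(2048, {"esx1": 1024, "esx2": 4096})` picks `"esx2"`, and a `RestartTracker(max_attempts=2)` stops recommending retries after two recorded failures.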

When a primary host becomes unavailable:

Entering maintenance mode: a secondary host is promoted to primary
Powering off or host failure: no secondary host is promoted
Host isolation (?)
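The primary/secondary role rules can be modeled in a few lines. This is a hypothetical sketch: it assumes that only maintenance mode triggers a promotion (a failed primary is simply lost), and the choices of which secondary gets promoted and which primary is "active" are illustrative assumptions, not documented HA behavior.

```python
from typing import List, Optional, Set


class HACluster:
    """Toy model of primary/secondary host roles."""

    PRIMARY_SLOTS = 5  # the first 5 hosts to join become primaries

    def __init__(self) -> None:
        self.join_order: List[str] = []
        self.primaries: List[str] = []
        self.in_maintenance: Set[str] = set()
        self.failed: Set[str] = set()

    def join(self, host: str) -> None:
        self.join_order.append(host)
        if len(self.primaries) < self.PRIMARY_SLOTS:
            self.primaries.append(host)

    def secondaries(self) -> List[str]:
        # Available hosts that are not primaries.
        return [h for h in self.join_order
                if h not in self.primaries
                and h not in self.in_maintenance
                and h not in self.failed]

    def enter_maintenance(self, host: str) -> None:
        self.in_maintenance.add(host)
        if host in self.primaries:
            self.primaries.remove(host)
            candidates = self.secondaries()
            if candidates:
                # Assumption: promote the longest-standing secondary.
                self.primaries.append(candidates[0])

    def fail(self, host: str) -> None:
        # Assumption: a failed primary is not replaced by a secondary.
        self.failed.add(host)
        if host in self.primaries:
            self.primaries.remove(host)

    def ha_operational(self) -> bool:
        # HA needs at least one functional primary.
        return bool(self.primaries)

    def active_primary(self) -> Optional[str]:
        # Assumption: the earliest-joined remaining primary is the active one.
        return self.primaries[0] if self.primaries else None
```

Joining seven hosts yields five primaries and two secondaries; putting a primary into maintenance mode promotes a secondary, while failing every primary leaves `ha_operational()` false even though a secondary is still up.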

If all the primary hosts fail, the cluster loses VMware HA protection. To avoid a single point of failure, VMware recommends placing no more than 4 hosts of a given cluster in the same rack or blade chassis, because the primary role can shift between hosts over time. With 5 primary hosts and at most 4 cluster hosts per chassis, a single chassis failure can never take out every primary.
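The placement rule is easy to check against an inventory. A minimal sketch, assuming a hypothetical `placement` dict mapping host name to rack or chassis name; it treats the worst case (the primaries have drifted onto the same chassis) as possible whenever a chassis holds at least as many cluster hosts as there are primaries.

```python
from collections import Counter
from typing import Dict

PRIMARY_COUNT = 5                          # first 5 hosts become primaries
MAX_HOSTS_PER_CHASSIS = PRIMARY_COUNT - 1  # the "no more than 4" rule


def overloaded_chassis(placement: Dict[str, str]) -> Dict[str, int]:
    """Return chassis holding more cluster hosts than recommended."""
    counts = Counter(placement.values())
    return {chassis: n for chassis, n in counts.items() if n > MAX_HOSTS_PER_CHASSIS}


def one_chassis_can_hold_all_primaries(placement: Dict[str, str]) -> bool:
    # Since the primary role can drift, any hosts could end up as primaries.
    # In clusters smaller than 5 hosts, every host is a primary.
    primaries = min(PRIMARY_COUNT, len(placement))
    counts = Counter(placement.values())
    return any(n >= primaries for n in counts.values())
```

Eight hosts split 4/4 across two chassis pass both checks, while a 5/3 split flags the first chassis. Note that a 3-host cluster in a single chassis passes `overloaded_chassis` yet still fails the second check, since all three hosts are primaries.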

Network design considerations:

Increasing the resiliency of client-side networking, so that external systems can reach workloads running in vSphere
Increasing the resiliency of the communications used by VMware HA itself

General networking guidelines

If the physical switches connected to the servers support PortFast (or an equivalent), enable it.
Disable host monitoring when performing any network maintenance, to avoid triggering a false host isolation.
Use a documented, consistent naming scheme for port groups on VLANs across all hosts in the cluster.
VMware HA opens firewall ports on the hosts for both incoming and outgoing traffic.
Use NIC teaming for HA networking.
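A consistent port group naming scheme matters because a VM restarted on another host expects to find a port group with the same name there. The sketch below assumes a hypothetical inventory dict (host name to set of port group names) collected by some other means, and compares names exactly, so a casing typo shows up as a mismatch.

```python
from typing import Dict, Set


def missing_port_groups(host_port_groups: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Per host, list port group names used elsewhere in the cluster but absent here."""
    all_names: Set[str] = set().union(*host_port_groups.values())
    report: Dict[str, Set[str]] = {}
    for host, names in host_port_groups.items():
        missing = all_names - names
        if missing:
            report[host] = missing
    return report
```

With `esx3` defining `"VM-Vlan20"` instead of `"VM-VLAN20"`, every host is reported as missing one of the two spellings, which points straight at the inconsistent name.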

Setting up redundancy for VMware HA networking

Goal: enable reliable host failure detection and prevent host isolation conditions from occurring.
NIC teaming (active/standby, with failback disabled) is sufficient; connecting the two NICs to separate physical switches is recommended.
A redundant management network adds another level of redundancy at the host.
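The effect of "no failback" in an active/standby team can be shown with a tiny state machine. This is a toy model (the class name and the `vmnic0`/`vmnic1` NIC names are illustrative): after the active NIC fails and later recovers, traffic stays on the standby NIC unless failback is enabled. A commonly cited reason is to avoid moving heartbeat traffic back onto a switch port that reports link up before it is actually forwarding.

```python
class ActiveStandbyTeam:
    """Toy active/standby NIC team with optional failback."""

    def __init__(self, active: str, standby: str, failback: bool = False) -> None:
        self.preferred = active
        self.standby = standby
        self.failback = failback
        self.link_up = {active: True, standby: True}
        self.current = active  # NIC currently carrying traffic

    def set_link(self, nic: str, up: bool) -> None:
        self.link_up[nic] = up
        if not self.link_up[self.current]:
            # Current NIC lost link: fail over to the other NIC if it is up.
            other = self.standby if self.current == self.preferred else self.preferred
            if self.link_up[other]:
                self.current = other
        elif self.failback and self.current != self.preferred and self.link_up[self.preferred]:
            # Failback enabled: return to the preferred NIC once it is back.
            self.current = self.preferred
```

With the default `failback=False`, taking `vmnic0` down moves traffic to `vmnic1`, and bringing `vmnic0` back up leaves it on `vmnic1`.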

Storage Design Considerations

Use multipathing for both failover and load balancing.
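How one mechanism gives both failover and load balancing can be sketched with round-robin path selection, similar in spirit to ESX's round-robin path policy but not its implementation. The class and path names below are hypothetical.

```python
from typing import List, Set


class MultipathDevice:
    """Toy LUN reachable over several storage paths."""

    def __init__(self, paths: List[str]) -> None:
        self.paths = list(paths)
        self.healthy: Set[str] = set(self.paths)
        self._next = 0

    def mark_dead(self, path: str) -> None:
        self.healthy.discard(path)

    def mark_alive(self, path: str) -> None:
        if path in self.paths:
            self.healthy.add(path)

    def select_path(self) -> str:
        # Round-robin over healthy paths: spreads I/O when all paths are up
        # (load balancing) and skips dead ones (failover).
        for _ in range(len(self.paths)):
            path = self.paths[self._next]
            self._next = (self._next + 1) % len(self.paths)
            if path in self.healthy:
                return path
        raise RuntimeError("all paths to the LUN are down")
```

With two healthy paths, I/O alternates between them; once one is marked dead, every request goes to the survivor.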

Host Isolation

Detection: configure at least one additional isolation address, such as a router address.
Response: choose shut down, power off, or leave powered on; a redundant management network is still preferred.
VMware HA automatically powers off a VM that has lost access to its VMFS files, to avoid split-brain scenarios.
Host isolation responses are not performed on VMs with Fault Tolerance enabled.
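Detection and response can be sketched together. The model below assumes a host that stops receiving heartbeats considers itself isolated only if none of its isolation addresses respond; `ping` is a placeholder callable (no real network check), the addresses in the example are made up, and the function names are hypothetical. FT-enabled VMs are skipped, as stated above.

```python
from enum import Enum
from typing import Callable, Dict, Iterable


class IsolationResponse(Enum):
    SHUT_DOWN = "shut down"
    POWER_OFF = "power off"
    LEAVE_POWERED_ON = "leave powered on"


def is_isolated(heartbeats_received: bool,
                isolation_addresses: Iterable[str],
                ping: Callable[[str], bool]) -> bool:
    if heartbeats_received:
        return False
    # No heartbeats: isolated only if no isolation address answers either.
    return not any(ping(addr) for addr in isolation_addresses)


def plan_isolation_actions(vms: Dict[str, bool],
                           response: IsolationResponse) -> Dict[str, str]:
    """vms maps VM name -> whether Fault Tolerance is enabled."""
    actions: Dict[str, str] = {}
    for vm, ft_enabled in vms.items():
        # Isolation responses are not performed on FT-enabled VMs.
        actions[vm] = "skip (FT enabled)" if ft_enabled else response.value
    return actions
```

A second isolation address (for instance a router) means one unreachable gateway alone no longer triggers the isolation response.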

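Admission Control best practices

Select the "percentage of cluster resources reserved" admission control policy; to tolerate one host failure in an N-host cluster, reserve 1/N of the resources (e.g. 25% for 4 hosts).
Ensure all cluster hosts are sized equally.
Avoid resource fragmentation (enough spare capacity in total, but no single host with room for a given VM) through sensible VM sizing and with help from DRS.
Limit the use of VM-Host affinity rules.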
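The 1/N rule and the fragmentation problem can be worked through numerically. A simplified sketch: it models a single resource (memory in MB) with equally sized hosts, whereas the real policy evaluates CPU and memory; all function names and numbers are illustrative assumptions.

```python
from typing import Dict


def reserved_percentage(num_hosts: int, host_failures_to_tolerate: int = 1) -> float:
    """Share of cluster resources to reserve with equally sized hosts (F/N)."""
    if not 0 < host_failures_to_tolerate < num_hosts:
        raise ValueError("need 0 < host_failures_to_tolerate < num_hosts")
    return 100.0 * host_failures_to_tolerate / num_hosts


def admits(vm_reservation_mb: float, reserved_by_running_vms_mb: float,
           cluster_capacity_mb: float, reserved_pct: float) -> bool:
    """Power-on check: reservations must fit in the unreserved share of the cluster."""
    usable = cluster_capacity_mb * (1 - reserved_pct / 100.0)
    return reserved_by_running_vms_mb + vm_reservation_mb <= usable


def fits_on_single_survivor(vm_reservation_mb: float,
                            free_mb_by_host: Dict[str, float],
                            failed_host: str) -> bool:
    """Fragmentation check: can any one surviving host hold the VM by itself?"""
    return any(free >= vm_reservation_mb
               for host, free in free_mb_by_host.items() if host != failed_host)
```

For a 4-host, 64000 MB cluster, `reserved_percentage(4)` is 25%, leaving 48000 MB for reservations. If `esx4` fails and the three survivors each have 6000 MB free, they hold 18000 MB combined, yet an 8192 MB VM fits on none of them: that is resource fragmentation.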

Thursday, April 5, 2012