Thursday, April 5, 2012

HA Best Practice Guide

http://www.vmware.com/files/pdf/VMwareHA_twp.pdf

VMware HA protects applications in two ways:
  1. Host failure detection: VMs from a failed host are restarted on another host in the cluster
  2. VM monitoring (heartbeats and I/O stats): a VM is reset when a guest OS failure is detected
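The VM monitoring rule in item 2 can be sketched as a small decision function. This is a minimal sketch, not VMware's implementation. The `VmMonitorState` type, the `should_reset_vm` function and the 30-second failure interval are all hypothetical. What it shows is that the I/O statistics act as a second check: a VM is reset only when its heartbeats have stopped and it also shows no recent I/O activity.

```python
from dataclasses import dataclass


@dataclass
class VmMonitorState:
    # Seconds since the last heartbeat from the guest (hypothetical input).
    seconds_since_heartbeat: float
    # True if any disk or network I/O was seen in the recent window (hypothetical input).
    recent_io_activity: bool


def should_reset_vm(state: VmMonitorState, failure_interval_s: float = 30.0) -> bool:
    """Reset only when heartbeats have stopped AND the VM shows no I/O.

    The I/O check keeps a VM with a healthy guest OS but a stalled
    heartbeat from being reset. The 30 s interval is an assumed value.
    """
    heartbeat_lost = state.seconds_since_heartbeat > failure_interval_s
    return heartbeat_lost and not state.recent_io_activity


# Heartbeat lost but the VM is still doing I/O: leave it alone.
print(should_reset_vm(VmMonitorState(45.0, recent_io_activity=True)))   # False
# Heartbeat lost and no I/O: treat it as a guest OS failure and reset.
print(should_reset_vm(VmMonitorState(45.0, recent_io_activity=False)))  # True
```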
Host selection considerations:
  1. Build the cluster out of identical server hardware
  2. The ideal cluster size for the majority of environments is 6 to 10 nodes: simple to manage, with spare capacity
Host placement considerations:
  1. The first 5 hosts to join the cluster become primary hosts, and all subsequent hosts are secondary. At least one primary host must be functional for VMware HA to operate correctly.
  2. One of the primary hosts is also designated as the active primary host, which is responsible for:
  • Deciding where to restart virtual machines
  • Keeping track of failed restart attempts
  • Determining when to keep trying to restart a VM
  3. A primary host can become unavailable through:
  • entering maintenance mode: a secondary host is promoted to primary
  • powering off or failure
  • host isolation ?
If all the primary hosts fail, the cluster loses VMware HA protection. To prevent a single point of failure, VMware recommends placing no more than 4 hosts of a given cluster in the same rack or blade chassis: because the primary role can shift between hosts over time, this ensures that a single rack or chassis failure cannot take out all 5 primary hosts.
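The rack/chassis rule comes down to simple counting: since any five hosts may hold the primary role at some point, any chassis containing five or more cluster hosts could end up holding every primary. The sketch below assumes a hypothetical inventory that maps each host to its rack or chassis; the `chassis_at_risk` helper and all host and chassis names are made up for illustration.

```python
from collections import Counter

NUM_PRIMARIES = 5  # the first five hosts to join the cluster become primaries


def chassis_at_risk(host_to_chassis: dict[str, str], num_primaries: int = NUM_PRIMARIES) -> list[str]:
    """Return the racks/chassis that could end up holding every primary host.

    Because the primary role can move between hosts, any chassis holding
    at least `num_primaries` cluster hosts could, in the worst case, hold
    all primaries and become a single point of failure for HA.
    """
    counts = Counter(host_to_chassis.values())
    return sorted(chassis for chassis, n in counts.items() if n >= num_primaries)


# Hypothetical 8-host clusters spread over two blade chassis.
balanced = {f"esx{i:02d}": "chassisA" if i < 4 else "chassisB" for i in range(8)}
lopsided = {f"esx{i:02d}": "chassisA" if i < 6 else "chassisB" for i in range(8)}
print(chassis_at_risk(balanced))  # []
print(chassis_at_risk(lopsided))  # ['chassisA']
```

With 4 hosts per chassis no chassis can hold all 5 primaries, which matches the "no more than 4" recommendation.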

Network design considerations:
  1. Increasing the resiliency of client-side networking to ensure access from external systems to workloads running in vSphere
  2. Increasing the resiliency of the communications used by VMware HA itself
General networking guidelines:
  • If the physical switches connected to the servers support PortFast or an equivalent, enable it.
  • Disable host monitoring when performing any networking maintenance, to avoid triggering false host isolation.
  • Use a documented scheme for port group naming on VLANs.
  • HA opens both incoming and outgoing ports on the hosts.
  • Use NIC teaming for HA networking.
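One practical reason for a consistent, documented port group naming scheme is that, on standard vSwitches, a restarted VM reconnects to its network by port group name, so a name that is missing or spelled differently on the failover host can leave the VM disconnected. A hypothetical consistency check is sketched below; the `missing_port_groups` helper and the inventory are made-up data, which in practice would have to be collected from vCenter or the hosts.

```python
def missing_port_groups(host_port_groups: dict[str, set[str]]) -> dict[str, set[str]]:
    """For each host, report port group names defined elsewhere but not on it."""
    # Every port group name that appears on at least one host.
    all_groups: set[str] = set().union(*host_port_groups.values())
    report: dict[str, set[str]] = {}
    for host, groups in host_port_groups.items():
        missing = all_groups - groups
        if missing:
            report[host] = missing
    return report


# Hypothetical inventory: esx03 has a differently capitalised port group name.
inventory = {
    "esx01": {"VM-Prod-VLAN10", "VM-Dev-VLAN20", "Management Network"},
    "esx02": {"VM-Prod-VLAN10", "VM-Dev-VLAN20", "Management Network"},
    "esx03": {"VM-Prod-Vlan10", "VM-Dev-VLAN20", "Management Network"},
}
print(missing_port_groups(inventory))
```

Any non-empty report points to a host where a restarted VM might not find its network.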
Setting up redundancy for VMware HA networking:
  • Redundancy enables reliable host failure detection and prevents host isolation conditions from occurring.
  • NIC teaming (active/standby, with failback disabled) is sufficient; connecting the team to two separate physical switches is recommended.
  • A redundant management network adds another level of redundancy at the host level.
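To illustrate the active/standby teaming behaviour with failback disabled, here is a toy model of the failover logic. It is a hypothetical sketch, not ESX code, and the NIC names are only examples. One commonly cited reason for disabling failback is that a recovering port may report link up before the physical switch is actually forwarding traffic, so moving traffic back immediately could briefly interrupt the HA heartbeat network.

```python
class ActiveStandbyTeam:
    """Toy model of an active/standby NIC team (hypothetical, not ESX code).

    With failback disabled, traffic stays on the standby NIC after the
    active NIC recovers; with failback enabled, it moves back.
    """

    def __init__(self, active: str, standby: str, failback: bool = False):
        self.active = active
        self.standby = standby
        self.failback = failback
        self.link_up = {active: True, standby: True}
        self.current = active  # uplink currently carrying traffic

    def set_link(self, nic: str, up: bool) -> None:
        """Record a link state change and fail over or fail back as needed."""
        self.link_up[nic] = up
        if not self.link_up[self.current]:
            # Current uplink lost its link: switch to the other NIC if it is up.
            other = self.standby if self.current == self.active else self.active
            if self.link_up[other]:
                self.current = other
        elif up and nic == self.active and self.failback:
            # Failback enabled: return traffic to the recovered active NIC.
            self.current = self.active


team = ActiveStandbyTeam("vmnic0", "vmnic1")  # failback disabled
team.set_link("vmnic0", False)
print(team.current)  # vmnic1
team.set_link("vmnic0", True)
print(team.current)  # vmnic1 (no failback)
```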
Storage design considerations:
  • Use multipathing for path failover and load balancing.
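Multipathing provides both failover and load balancing. A generic round-robin path selector that skips dead paths shows both ideas in a few lines. This is a sketch of the concept under stated assumptions, not VMware's Native Multipathing Plugin; `RoundRobinPaths` is a hypothetical class and the path names are only examples.

```python
class RoundRobinPaths:
    """Generic round-robin path selection with failover (illustrative only)."""

    def __init__(self, paths: list[str]):
        if not paths:
            raise ValueError("at least one path is required")
        self.paths = list(paths)
        self.dead: set[str] = set()
        self._next = 0  # index of the next path to try

    def mark_dead(self, path: str) -> None:
        self.dead.add(path)

    def mark_alive(self, path: str) -> None:
        self.dead.discard(path)

    def next_path(self) -> str:
        """Return the next live path, rotating through all paths for load balancing."""
        for _ in range(len(self.paths)):
            path = self.paths[self._next]
            self._next = (self._next + 1) % len(self.paths)
            if path not in self.dead:
                return path
        raise RuntimeError("all paths are dead: the LUN is inaccessible")


rr = RoundRobinPaths(["vmhba1:C0:T0:L1", "vmhba2:C0:T0:L1"])
print([rr.next_path() for _ in range(4)])  # alternates between both HBAs
rr.mark_dead("vmhba1:C0:T0:L1")            # e.g. a cable or HBA failure
print([rr.next_path() for _ in range(2)])  # only vmhba2 remains in use
```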
Host isolation:
  • Detection: configure at least one additional isolation address, such as a router address.
  • Response: choose shut down, power off, or leave powered on; even so, a redundant management network is still preferred.
  • VMware HA automatically powers off a VM that has lost access to its VMFS files, to avoid split-brain scenarios.
  • Host isolation responses are not performed (they are skipped) on VMs with Fault Tolerance enabled.
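The detection and response rules above can be combined into a small sketch. It assumes a host treats itself as isolated only when it has stopped receiving HA heartbeats and none of its isolation addresses answer a ping, which is why an additional isolation address helps. It then applies the configured response to every VM except those with Fault Tolerance enabled. The function names and the 192.0.2.x addresses (a documentation range) are hypothetical.

```python
from enum import Enum


class IsolationResponse(Enum):
    SHUT_DOWN = "shut down"
    POWER_OFF = "power off"
    LEAVE_POWERED_ON = "leave powered on"


def host_is_isolated(heartbeats_received: bool, isolation_address_replies: dict[str, bool]) -> bool:
    """Isolated only if heartbeats stopped AND no isolation address answered."""
    if heartbeats_received:
        return False
    return not any(isolation_address_replies.values())


def actions_for_isolated_host(vms: dict[str, bool], response: IsolationResponse) -> dict[str, str]:
    """Map each VM name to the action taken; `vms` maps VM name -> FT enabled."""
    return {
        vm: "skipped (Fault Tolerance)" if ft_enabled else response.value
        for vm, ft_enabled in vms.items()
    }


# The default gateway is down, but an extra isolation address still answers.
print(host_is_isolated(False, {"192.0.2.1": False, "192.0.2.254": True}))  # False
print(actions_for_isolated_host({"web01": False, "db01": True}, IsolationResponse.SHUT_DOWN))
```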
Admission control best practices:
  • Select the "percentage of cluster resources reserved" admission control policy, reserving 1/N of the cluster's resources (N = number of hosts) to tolerate one host failure.
  • Ensure all cluster hosts are sized equally.
  • Avoid resource fragmentation through sensible VM sizing, with help from DRS.
  • Limit the use of VM-Host affinity rules.
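The 1/N rule translates directly into a percentage. The sketch below assumes equally sized hosts, as recommended above, and rounds up to a whole percentage so that the reservation never falls below 1/N. It prints the values for the 6 to 10 node clusters suggested earlier. `reserved_percentage` is a hypothetical helper, not a vSphere API.

```python
def reserved_percentage(num_hosts: int, host_failures_to_tolerate: int = 1) -> int:
    """Whole-number percentage of cluster CPU and memory to reserve so the
    surviving hosts can absorb the load of the failed ones (equal-sized hosts)."""
    if not 0 < host_failures_to_tolerate < num_hosts:
        raise ValueError("tolerate at least one host failure and fewer than all hosts")
    # Integer ceiling of 100 * failures / hosts, so the reservation is never below 1/N.
    return (100 * host_failures_to_tolerate + num_hosts - 1) // num_hosts


for n in range(6, 11):
    print(n, reserved_percentage(n))  # 6 17, 7 15, 8 13, 9 12, 10 10
```

For example, a 6-host cluster needs about 17% reserved, while a 10-host cluster needs 10%, which is one reason larger clusters waste less spare capacity.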
