VMware HA protects applications in two ways:
- Host failure detection: restarting the VMs on another host in the cluster
- VM monitoring (heartbeats and I/O stats): resetting the VM when a guest OS failure is detected
- Build the cluster out of identical server hardware
- The ideal cluster size for the majority of environments is 6 to 10 nodes: simple, with spare capacity
- The first 5 hosts joining the cluster become primary hosts, and all subsequent hosts are secondary. At least one primary host must be functional for VMware HA to operate correctly.
- One of the primary hosts is also designated as the active primary host:
- Deciding where to restart virtual machines
- Keeping track of failed restart attempts
- Determining when to keep trying to restart a VM
- Entering maintenance mode: a secondary host is promoted to primary
- powering off and failure
- host isolation ?
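
The primary/secondary rule noted above (first 5 hosts to join are primary, at least one primary must be functional) can be illustrated with a small model. This is only a sketch for reasoning about cluster layouts, not VMware code: the function names and the host names are hypothetical, and it deliberately ignores promotion of secondaries.

```python
# Illustrative model only; names are hypothetical and not part of any VMware API.
MAX_PRIMARIES = 5  # from the notes: the first 5 hosts joining the cluster are primary


def assign_roles(hosts_in_join_order):
    """Map each host to 'primary' or 'secondary' based on join order."""
    roles = {}
    for index, host in enumerate(hosts_in_join_order):
        roles[host] = "primary" if index < MAX_PRIMARIES else "secondary"
    return roles


def ha_can_operate(roles, functional_hosts):
    """HA needs at least one functional primary host."""
    return any(roles.get(host) == "primary" for host in functional_hosts)


if __name__ == "__main__":
    cluster = [f"esx{i:02d}" for i in range(1, 9)]  # 8 hypothetical hosts
    roles = assign_roles(cluster)
    print(roles)
    # Only secondaries survive: HA cannot operate in this model.
    print(ha_can_operate(roles, {"esx06", "esx07", "esx08"}))
```

One practical consequence of this model: if all primary hosts sit in the same chassis or rack, a single failure there can take out every primary at once.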
Network design considerations:
- Increasing resiliency of client-side networking to ensure access from external systems to workloads running in vSphere
- Increasing resiliency of communications used by VMware HA itself.
- If the physical switches connecting the servers support PortFast or an equivalent, enable it.
- Disable host monitoring when performing any networking maintenance to avoid host isolation.
- Use a documented scheme for port group naming on VLANs
- HA opens both incoming and outgoing ports on the hosts
- NIC teaming for HA
- Enable reliable host failure detection and prevent host isolation conditions from occurring
- NIC teaming (Active/Passive with no failback) is sufficient (connecting to two physical switches is recommended)
- A redundant management network adds another level of redundancy at the host.
- Multipathing for failover and load-balancing
- Detection: configure at least one additional isolation address, such as a router address
- Response (shut down, power off, or remain running), though a redundant management network is still preferred
- VMware HA will automatically power off a VM that has lost access to its VMFS files, to avoid split-brain scenarios
- Host isolation responses are not performed (they are skipped) on VMs with Fault Tolerance enabled.
- Select the "percentage of cluster resources reserved" policy for admission control, reserving 1/N of the resources in an N-host cluster
- Ensure all cluster hosts are sized equally.
- Avoid resource fragmentation through careful VM sizing, with help from DRS
- Limit the use of VM-Host affinity rules
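
The 1/N guideline for the percentage-based admission control policy can be worked out as below. This is a rough sketch assuming all hosts are sized equally, as recommended above. The function name is hypothetical, and rounding up to a whole percent is an assumption made here so that the reservation is never smaller than one host's share.

```python
# Hypothetical helper; assumes equally sized hosts.
def reserved_percentage(num_hosts, host_failures_to_tolerate=1):
    """Whole-number percentage of cluster resources to reserve (rounded up)."""
    if num_hosts < 2:
        raise ValueError("need at least 2 hosts for failover capacity")
    if not 1 <= host_failures_to_tolerate < num_hosts:
        raise ValueError("failures to tolerate must be between 1 and num_hosts - 1")
    # Integer ceiling of 100 * failures / hosts, avoiding float rounding.
    return (100 * host_failures_to_tolerate + num_hosts - 1) // num_hosts


if __name__ == "__main__":
    for n in (6, 8, 10):
        print(n, "hosts ->", reserved_percentage(n), "% reserved")
```

For the 6 to 10 node cluster sizes mentioned earlier, this gives a reservation of between 10% and 17% to tolerate a single host failure.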