VMGuru

Today I attended session VSP1682 ‘VMware vSphere clustering Q&A’ hosted by Frank Denneman, Duncan Epping and Chris Colotti.

After a short introduction the Q&A started. Below you will find my top 10 questions.

Q1. Do the old vSphere 4 constraints still apply in vSphere 5?

Up to vSphere 5 the best practice was a maximum of 8 hosts per cluster, because of linked clones in VMware View and the primary/secondary ESX(i) host setup in an HA cluster. In vSphere 5, VMware changed this to a master/slave setup: when the master ESXi host goes offline, a new master is elected within 15 seconds. So the cluster size limits VMware vSphere had in the past are gone, which is a huge advantage of vSphere 5.0.
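To make the master/slave idea a bit more concrete, here is a minimal Python sketch of electing a new master once the current one goes offline. The election criteria used (most accessible datastores wins, ties broken by the lexically highest host ID) are an assumption based on how the vSphere 5 HA agent is commonly described, not something stated in the session; `Host`, `elect_master` and the host IDs are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Host:
    moid: str          # hypothetical managed object ID, e.g. "host-31"
    datastores: int    # number of datastores this host can access
    alive: bool = True


def elect_master(hosts):
    """Pick a new master among the hosts that are still alive.

    Assumed criteria: the host with the most datastores wins; a tie is
    broken by the highest host ID using plain string comparison.
    """
    candidates = [h for h in hosts if h.alive]
    if not candidates:
        return None
    return max(candidates, key=lambda h: (h.datastores, h.moid))


cluster = [Host("host-10", 4), Host("host-22", 6), Host("host-31", 6)]
master = elect_master(cluster)
print(master.moid)                 # host-31
master.alive = False               # the master goes offline
print(elect_master(cluster).moid)  # host-22
```

Because every surviving host can run the same deterministic rule, there is no fixed set of "primary" hosts to worry about, which is the point made above.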

Q2. With the HA improvements in vSphere 5, are all the advanced HA settings gone?

No, all the advanced HA settings still exist, with the exception of the failure detection time setting, which was used in vSphere 4.1. In vSphere 4.1, when a single host became network isolated, by default it would start shutting down its virtual machines after 15 seconds. With vSphere 5.0 that concept no longer exists. There is still a failure detection time, but it is built in and you can no longer change it.

After this question, Duncan wanted to clear up some misinformation regarding datastore heartbeating.

The datastore heartbeat is only used to confirm that an ESXi host is isolated, so it is not used while VMware vSphere is in its normal mode of operation. It works by placing a file on a datastore which is touched every 5 seconds. When one of the hosts detects an isolated host, HA tries to validate the isolation either by checking whether the file is still being touched (NFS) or by checking whether the host still has the file open (VMFS).
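The validation step can be sketched in a few lines of Python. This is a simplified model, not the HA agent's code: the names are hypothetical, and the tolerance of two missed 5-second intervals before an NFS heartbeat file counts as stale is an assumption.

```python
from dataclasses import dataclass
from enum import Enum

HEARTBEAT_INTERVAL_S = 5  # the heartbeat file is touched every 5 seconds


class FsType(Enum):
    NFS = "nfs"
    VMFS = "vmfs"


class HostState(Enum):
    ISOLATED = "isolated"  # host still alive, only cut off from the network
    FAILED = "failed"      # no sign of life on the network or on the datastore


@dataclass
class HeartbeatFile:
    fs_type: FsType
    last_touched: float  # timestamp of the last update (checked for NFS)
    held_open: bool      # whether the host still has the file open (checked for VMFS)


def confirm_isolation(hb: HeartbeatFile, now: float) -> HostState:
    """Only called after the network heartbeats from a host have stopped."""
    if hb.fs_type is FsType.NFS:
        # Assumption: allow up to two intervals before the file counts as stale.
        alive = now - hb.last_touched <= 2 * HEARTBEAT_INTERVAL_S
    else:
        alive = hb.held_open
    return HostState.ISOLATED if alive else HostState.FAILED


print(confirm_isolation(HeartbeatFile(FsType.NFS, 100.0, False), now=104.0).value)  # isolated
```

Note that the function is only meaningful once network heartbeats are already missing, which mirrors the point that datastore heartbeating plays no role during normal operation.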

Q3. How does HA work in a mixed 3.5/4.x/5.0 cluster?

HA works by installing an agent on each host, and this is no different for VMware VI3.5, vSphere 4.x or vSphere 5.0: enabling HA installs an agent on hosts of all versions and HA will work. There is one important note, though: ESX(i) 3.5 hosts need to be at a certain patch level to support this.

Q4. What conditions prevent DPM from powering down a host?

First, DPM looks at the number of hosts configured as HA failover capacity; it is not allowed to violate that setting. Second, DPM looks at the amount of reserved resources: the reservations of all running virtual machines have to be honored, so DPM cannot reduce the cluster's capacity below the amount of reserved resources.
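Both conditions can be expressed as a simple check. The Python sketch below is a simplification under stated assumptions: HA failover capacity is modelled as a number of hosts, reservations are compared against the raw capacity of the hosts that remain, and `Host` and `can_power_down` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    cpu_mhz: int
    mem_mb: int


def can_power_down(hosts, candidate, ha_failover_hosts, reserved_cpu_mhz, reserved_mem_mb):
    """Return True if DPM may power down `candidate` without breaking either rule."""
    remaining = [h for h in hosts if h is not candidate]
    # Rule 1: HA failover capacity. To tolerate N host failures the cluster
    # must keep at least N + 1 powered-on hosts.
    if len(remaining) < ha_failover_hosts + 1:
        return False
    # Rule 2: the reservations of all running VMs must still be honored.
    cpu = sum(h.cpu_mhz for h in remaining)
    mem = sum(h.mem_mb for h in remaining)
    return cpu >= reserved_cpu_mhz and mem >= reserved_mem_mb


hosts = [Host("esx01", 20000, 65536), Host("esx02", 20000, 65536), Host("esx03", 20000, 65536)]
print(can_power_down(hosts, hosts[2], 1, 30000, 100000))  # True
print(can_power_down(hosts, hosts[2], 2, 30000, 100000))  # False: HA capacity would be violated
```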

Q5. What is the formula/mechanism DRS uses to determine which VMs to move to other hosts?

When DRS wants to balance the cluster, it inspects the reservations and resource entitlements. DRS then picks the virtual machine with ‘the biggest bang for the buck’: the virtual machine it can move with the least amount of effort to restore the balance in the cluster. It doesn't have to be the biggest virtual machine, because DRS uses a cost/benefit/risk analysis. What is the cost of the vMotion, what is the risk, and what is the benefit: how many resources will it free up, and how will it influence the resource utilization on the destination host?

DRS does not perform a combined vMotion; it only decides which virtual machine needs to go where to resolve the resource contention.
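A toy version of this cost/benefit/risk trade-off can help build intuition. In the Python sketch below the benefit is the reduction in CPU imbalance between two hosts, the cost grows with the memory a vMotion has to copy, and the only risk considered is overloading the destination host. These scoring rules and weights are illustrative assumptions, not DRS internals, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    cpu_demand: float  # MHz
    mem_gb: float      # memory the vMotion has to copy


def pick_vm_to_migrate(vms, src_load, dst_load, dst_capacity, cost_per_gb=100.0):
    """Choose one VM to move from src to dst using a toy cost/benefit/risk score."""
    best, best_score = None, 0.0
    imbalance_before = abs(src_load - dst_load)
    for vm in vms:
        new_src = src_load - vm.cpu_demand
        new_dst = dst_load + vm.cpu_demand
        # Risk: skip any move that would overload the destination host.
        if new_dst > dst_capacity:
            continue
        benefit = imbalance_before - abs(new_src - new_dst)
        cost = vm.mem_gb * cost_per_gb  # larger memory footprint = costlier vMotion
        score = benefit - cost
        if score > best_score:
            best, best_score = vm, score
    return best  # None if no move improves the balance enough


vms = [VM("big", 3000, 32), VM("medium", 2500, 4), VM("small", 500, 2), VM("huge", 7500, 8)]
print(pick_vm_to_migrate(vms, src_load=9000, dst_load=3000, dst_capacity=10000).name)  # medium
```

In this example the "big" VM would balance the hosts perfectly, but its large memory footprint makes the move expensive. The "medium" VM gets most of the benefit at a fraction of the cost, so it is selected, which matches the point that the chosen VM does not have to be the biggest one.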

Q6. How does DRS weigh CPU and memory?

It depends on which resource is used most in the cluster, i.e. which one is the critical resource.
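One way to picture "the critical resource decides" is to weight CPU and memory by their cluster-wide utilization. The session did not describe the actual weighting, so the proportional scheme and the `resource_weights` helper below are assumptions made purely for illustration.

```python
def resource_weights(cpu_used_mhz, cpu_total_mhz, mem_used_mb, mem_total_mb):
    """Hypothetical weighting: each resource counts in proportion to its utilization."""
    cpu_util = cpu_used_mhz / cpu_total_mhz
    mem_util = mem_used_mb / mem_total_mb
    total = cpu_util + mem_util
    if total == 0:
        return {"cpu": 0.5, "memory": 0.5}  # idle cluster: no critical resource
    return {"cpu": cpu_util / total, "memory": mem_util / total}


weights = resource_weights(30000, 60000, 192000, 256000)  # 50% CPU, 75% memory used
critical = max(weights, key=weights.get)
print(critical)  # memory
```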

Q7. A customer rebooted their vCenter server and this caused a host isolation response. What could have caused that (ESX 3.5)?

The network heartbeat works as follows: the primary nodes ping all the secondary nodes, all secondary nodes ping the primary nodes, and the primary nodes also ping each other. A vCenter server restart should not interfere with that process. Even if pinging your isolation address fails, nothing will happen unless the heartbeat itself fails, because that is the only time HA will ping the isolation address. Only when the heartbeat fails and the isolation address cannot be reached will this result in an isolation response.

So restarting a vCenter server should not trigger an isolation response; there is probably another issue causing this.
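The decision logic described above fits in a few lines. This Python sketch ignores timers such as the failure detection time and uses a hypothetical function name. It passes the ping as a callable to show that the isolation address is only contacted once heartbeats are lost.

```python
from typing import Callable


def isolation_response_triggered(heartbeats_received: bool,
                                 ping_isolation_address: Callable[[], bool]) -> bool:
    """Simplified ESX 3.5 HA decision: ping the isolation address only when heartbeats stop."""
    if heartbeats_received:
        # Normal operation, e.g. during a vCenter server restart: no ping, no response.
        return False
    # Heartbeats lost: if the isolation address is unreachable too, the host is isolated.
    return not ping_isolation_address()


# A vCenter restart does not stop the heartbeats, so even a failing ping is never reached.
print(isolation_response_triggered(True, lambda: False))  # False
```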

Q8. How do storage vMotion/DRS impact deduplication, and what are the recommendations?

When storage DRS/vMotion moves a virtual machine to another LUN, the moved data is basically new data on the destination, so it has a negative impact on deduplication. That's why VMware recommends setting storage DRS to manual and determining whether it works for your environment and what its impact on the deduplication ratio is.

The same goes for storage DRS/vMotion combined with storage replication: set storage DRS to manual. Only use storage vMotion for virtual machines that can tolerate being unprotected for a short time, because right after the storage vMotion the data has to be replicated again, and the virtual machine is unprotected until that replication has finished.

One of the coolest effects of storage DRS is that it reduces the management effort needed for thin-provisioned disks. Storage DRS uses the data growth rate to predict how a vmdk file will grow over a period of time, usually 8 hours. If it detects that a disk will grow beyond the threshold that has been set, it will try to move the vmdk file to another datastore.
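The prediction can be sketched as a linear extrapolation over the 8-hour window. In this Python sketch, treating growth as linear, applying the threshold to datastore utilization, and the 80% default threshold are all assumptions for illustration. The function names are hypothetical.

```python
def predicted_utilization(used_gb, growth_gb_per_hour, capacity_gb, horizon_hours=8.0):
    """Linear extrapolation of datastore usage over the prediction window (assumption)."""
    return (used_gb + growth_gb_per_hour * horizon_hours) / capacity_gb


def should_move_vmdk(used_gb, growth_gb_per_hour, capacity_gb, threshold=0.80, horizon_hours=8.0):
    """Recommend a move when predicted utilization would exceed the threshold."""
    return predicted_utilization(used_gb, growth_gb_per_hour, capacity_gb, horizon_hours) > threshold


# 1000 GB datastore, 700 GB used, thin disks growing 15 GB/hour:
# 700 + 15 * 8 = 820 GB, i.e. 82% in 8 hours, above an 80% threshold.
print(should_move_vmdk(700, 15, 1000))  # True
print(should_move_vmdk(700, 5, 1000))   # False (74% predicted)
```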

Q9. What is the impact of using resource pools as folders?

The problem with using resource pools as folders is that it affects the resources given to a virtual machine, even if you don't change the share settings. Resource management uses two terms, resource providers and resource consumers, and a resource pool is both. It consumes resources from its parent, usually the cluster, and it provides resources to its consumers, the virtual machines. If you create resource pools as folders, you will affect the resources available to a virtual machine during contention.

A resource pool gets resources from its parent and divides them among its resource consumers (virtual machines or other resource pools) according to their share settings. When VMs are battling for resources and there is contention, the resources are first divided based on the resource pools' share settings and then, inside each resource pool, based on the VMs' share settings. Within a resource pool, VMware can only distribute the resources the pool gets from its parent. It is therefore recommended to set up shares on resource pools based on the number of VMs in each resource pool.
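A small worked example shows why the number of VMs per resource pool matters. The Python sketch below assumes full contention, equal shares for all VMs inside a pool, and hypothetical pool names and share values. It splits the cluster's CPU capacity (in MHz) across pools by shares, then evenly across each pool's VMs.

```python
def per_vm_entitlement(capacity, pools):
    """Split `capacity` across pools by shares, then evenly across each pool's VMs.

    `pools` maps pool name -> (pool_shares, number_of_vms). Assumes every VM
    is contending and all VMs inside a pool have equal shares.
    """
    total_shares = sum(shares for shares, _ in pools.values())
    result = {}
    for name, (shares, vm_count) in pools.items():
        pool_capacity = capacity * shares / total_shares
        result[name] = pool_capacity / vm_count
    return result


# Equal pool shares, unequal VM counts: each VM in the busy pool gets far less.
print(per_vm_entitlement(12000, {"Prod": (4000, 10), "Test": (4000, 2)}))
# {'Prod': 600.0, 'Test': 3000.0}

# Pool shares scaled by VM count (1000 shares per VM) restore per-VM fairness.
print(per_vm_entitlement(12000, {"Prod": (10000, 10), "Test": (2000, 2)}))
# {'Prod': 1000.0, 'Test': 1000.0}
```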

See the example below to get an idea of how this works and how unnecessary folders/resource pools impact performance.

Q10. Should you use storage DRS and SRM in a single environment?

The answer is fairly simple: NO. Storage DRS is currently not supported with SRM, because you can end up in a scenario where virtual machines become unprotected after being moved to another datastore that might not be protected by SRM. VMware is working on supporting this combination in the future, probably with the next major release of vSphere.