vSphere 5 memory management explained (part 2)
As I said earlier this week, VMware memory management is still a topic which a lot of VMware administrators don’t understand.
Tuesday I discussed the virtual machine memory allocation graphs. Today we will look at how VMware vSphere uses transparent page sharing (TPS), ballooning, memory compression and host swapping.
VMware ESXi, a crucial component of VMware vSphere 5.0, is a hypervisor designed to efficiently manage hardware resources including CPU, memory, storage, and network among multiple, concurrent virtual machines. In this article I will describe the basic memory management concepts in VMware ESXi and the performance impact of these options.
ESXi uses several innovative techniques to reclaim virtual machine memory, which are:
- Transparent page sharing (TPS)—reclaims memory by removing redundant pages with identical content;
- Ballooning—reclaims memory by artificially increasing the memory pressure inside the guest;
- Hypervisor swapping—reclaims memory by having ESXi directly swap out the virtual machine’s memory;
- Memory compression—reclaims memory by compressing the pages that need to be swapped out.
So how does it work?
Transparent Page Sharing (TPS)
Running multiple virtual machines on a single piece of hardware results in identical sets of memory pages. The number of identical pages is influenced by the number of virtual machines and the (lack of) variation of operating systems. The identical memory pages enable VMware to implement memory sharing across virtual machines. Page sharing enables the hypervisor to reclaim redundant page copies and keep only one copy, which is shared by multiple virtual machines in the host physical memory. This results in much lower host memory consumption and allows a higher level of memory overcommitment.
TPS is a default ESXi feature which runs regardless of the amount of used physical memory. It is turned on by default and you can only disable it by modifying the ESXi advanced settings, but I would strongly advise you not to do that. TPS can save you up to 70% of memory (in VDI environments with many identical operating systems), space which you can use to increase your consolidation ratio.
TPS is a memory management technique which is transparent to the virtual machine and comes with virtually no performance penalty.
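To make the idea concrete, here is a minimal Python sketch of content-based page sharing. It is a toy model and not how ESXi is implemented: the class name `SharedPageStore`, the 4 KB `PAGE_SIZE` and the use of SHA-256 are my own assumptions for illustration. Pages are looked up by a hash of their content, and a full comparison is done before a page is actually shared.

```python
import hashlib

PAGE_SIZE = 4096  # assumed small-page size in bytes (illustrative)


class SharedPageStore:
    """Toy content-based page sharing; not ESXi's real data structures."""

    def __init__(self):
        # digest -> list of [content, reference_count] entries
        self._buckets = {}

    def add(self, content: bytes) -> None:
        """Map one guest page, sharing an existing copy when the content matches."""
        if len(content) != PAGE_SIZE:
            raise ValueError("page must be exactly PAGE_SIZE bytes")
        digest = hashlib.sha256(content).digest()
        bucket = self._buckets.setdefault(digest, [])
        for entry in bucket:
            # A hash match is only a hint: compare the full content before
            # sharing, so a collision never merges two different pages.
            if entry[0] == content:
                entry[1] += 1
                return
        bucket.append([content, 1])

    def mapped_pages(self) -> int:
        """Number of guest pages the virtual machines believe they have."""
        return sum(entry[1] for bucket in self._buckets.values() for entry in bucket)

    def physical_pages(self) -> int:
        """Number of distinct copies actually kept in host memory."""
        return sum(len(bucket) for bucket in self._buckets.values())


store = SharedPageStore()
for vm in range(3):
    store.add(bytes(PAGE_SIZE))              # zeroed page, same in every VM
    store.add(b"K" * PAGE_SIZE)              # stand-in for identical OS code
    store.add(bytes([vm + 1]) * PAGE_SIZE)   # page unique to this VM

print(store.mapped_pages(), store.physical_pages())  # 9 5
```

Three virtual machines map nine pages in total, but only five copies have to be kept: one zero page, one shared "OS" page and three unique pages. The more alike the guests are, the bigger the gap between the two numbers.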
Ballooning
Due to the virtual machine’s isolation, the guest operating system is not aware that it is running inside a virtual machine and is not aware of the states of other virtual machines on the same host. When the hypervisor runs multiple virtual machines and the total amount of the free host memory becomes low, none of the virtual machines will free guest physical memory because the guest operating system cannot detect the host’s memory shortage. Ballooning makes the guest operating system aware of the low memory status of the host.
VMware ESXi uses the ballooning driver, which is included in the VMware Tools, to enable ballooning. This driver has no external interfaces to the guest operating system and only communicates with the hypervisor through a private channel through which it polls the hypervisor to obtain a target balloon size to reclaim memory. As a result, the hypervisor offloads some of its memory overload to the guest operating system while slightly loading the virtual machine. That is, the hypervisor transfers the memory pressure from the host to the virtual machine. Ballooning induces guest memory pressure. In response, the balloon driver allocates and pins guest physical memory. The guest operating system determines if it needs to page out guest physical memory to satisfy the balloon driver’s allocation requests. If the virtual machine has plenty of free guest physical memory, inflating the balloon will induce no paging and will not impact guest performance.
So if an ESXi host runs into a memory shortage, it asks the virtual machines to free up guest memory, which in turn results in reclaimed host physical memory. That memory can then be used by virtual machines requesting additional memory.
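The effect of inflating the balloon on the guest can be sketched with a few lines of Python. This is a deliberately simplified model under my own assumptions: the function name `inflate_balloon` is hypothetical, and the guest is reduced to a single "free memory" number, whereas in reality the guest operating system decides for itself which pages to page out.

```python
def inflate_balloon(guest_free_mb: int, balloon_target_mb: int) -> tuple:
    """Return (taken_from_free_mb, paged_out_mb) for one balloon inflation.

    Simplified model: the balloon driver first pins free guest memory.
    Only when the target exceeds the free memory does the guest have to
    page out the remainder to satisfy the driver's allocation requests.
    """
    if guest_free_mb < 0 or balloon_target_mb < 0:
        raise ValueError("sizes must be non-negative")
    taken_from_free = min(guest_free_mb, balloon_target_mb)
    paged_out = balloon_target_mb - taken_from_free
    return taken_from_free, paged_out


print(inflate_balloon(1024, 512))  # (512, 0): plenty of free memory, no paging
print(inflate_balloon(256, 512))   # (256, 256): guest must page out 256 MB
```

The first call matches the case described above where the virtual machine has plenty of free guest physical memory and ballooning does not impact guest performance. The second call shows where the cost comes from: guest-level paging.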
Memory compression
When memory reclamation/ballooning does not have the desired effect, ESXi uses the next memory management technique in the chain, memory compression. Memory compression moves memory pages to a separate cache which is located in the host’s main memory. ESXi determines if a page can be compressed by checking the compression ratio for the page. Memory compression occurs when the page’s compression ratio is greater than 50%. Otherwise, memory compression has no added value and the page is swapped out. Only pages that would otherwise be swapped out to disk are chosen as candidates for memory compression.
Memory compression only occurs when there’s a host memory shortage and ballooning has not achieved the desired effect. ESXi will not proactively compress memory pages when host memory is undercommitted.
Memory compression is somewhat comparable to swapping, but instead of moving memory pages to disk, memory pages are moved to a reserved memory location. Because memory access times are much faster than disk access times, memory compression outperforms host swapping.
Memory compression is turned on by default. You can only disable it by modifying the ESXi advanced settings, but I would strongly advise you not to do that.
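The compress-or-swap decision can be illustrated with a short Python sketch. Treat it as an illustration only: `zlib` is a stand-in for whatever compression algorithm ESXi uses internally, the two lists are hypothetical stand-ins for the compression cache and the swap file, and I read "compression ratio greater than 50%" as "the compressed page is smaller than half a page".

```python
import os
import zlib

PAGE_SIZE = 4096  # assumed page size in bytes (illustrative)


def reclaim_page(page: bytes, compression_cache: list, swap_file: list) -> str:
    """Compress a page that is about to be swapped out, if it is worth it.

    The page goes to the in-memory cache when it shrinks to less than half
    its size; otherwise compression has no added value and it is swapped.
    """
    compressed = zlib.compress(page)  # stand-in algorithm, not ESXi's
    if len(compressed) < PAGE_SIZE // 2:
        compression_cache.append(compressed)
        return "compressed"
    swap_file.append(page)
    return "swapped"


cache, swap = [], []
print(reclaim_page(bytes(PAGE_SIZE), cache, swap))       # zeroed page compresses well
print(reclaim_page(os.urandom(PAGE_SIZE), cache, swap))  # random data does not
```

A zeroed page ends up in the cache, while a page of random data is effectively incompressible and goes to swap.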
Hypervisor swapping
When transparent page sharing, ballooning and memory compression do not have the desired effect, ESXi uses its last resort: hypervisor swapping. Hypervisor swapping moves a guest’s memory pages to a per-virtual-machine swap file (.vswp), which frees host physical memory for other virtual machines.
Both page sharing and ballooning take time to reclaim memory. The page-sharing speed depends on the page scan rate and the sharing opportunity. Ballooning speed relies on the guest operating system’s response time for memory allocation. Hypervisor swapping is a guaranteed technique to reclaim a specific amount of memory within a specific amount of time. However, hypervisor swapping is used as a last resort to reclaim memory from the virtual machine because it has a huge performance impact.
Host free memory states
ESXi maintains four host free memory states: high, soft, hard, and low, which are reflected by four thresholds. The threshold values are calculated based on host memory size. The figure below shows how the host free memory state is reported in ESXTOP. The ‘minfree‘ value represents the threshold for the high state. By default, ESXi enables page sharing since it opportunistically reclaims host memory with little overhead. When to use ballooning or swapping (which activates memory compression) to reclaim host memory is largely determined by the current host free memory state.
In the high state, the aggregate virtual machine guest memory usage is smaller than the host memory size. Whether or not host memory is overcommitted, the hypervisor will not reclaim memory through ballooning or swapping unless the virtual machine memory limit is set.
If host free memory drops towards the soft threshold, the hypervisor starts to reclaim memory using ballooning. Ballooning happens before free memory actually reaches the soft threshold because it takes time for the balloon driver to allocate and pin guest physical memory. Usually, the balloon driver is able to reclaim memory in a timely fashion so that the host free memory stays above the soft threshold.
If ballooning is not sufficient to reclaim memory or the host free memory drops towards the hard threshold, the hypervisor starts to use swapping in addition to using ballooning. During swapping, memory compression is activated as well. With host swapping and memory compression, the hypervisor should be able to quickly reclaim memory and bring the host memory state back to the soft state.
In a rare case where host free memory drops below the low threshold, the hypervisor continues to reclaim memory through swapping and memory compression, and additionally blocks the execution of all virtual machines that consume more memory than their target memory allocations.
In certain scenarios, host memory reclamation happens regardless of the current host free memory state. For example, even if host free memory is in the high state, memory reclamation is still mandatory when a virtual machine’s memory usage exceeds its specified memory limit. If this happens, the hypervisor will employ ballooning and, if necessary, swapping and memory compression to reclaim memory from the virtual machine until the virtual machine’s host memory usage falls back to its specified limit.
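The relationship between the host free memory state and the reclamation techniques described above can be summarized as a small lookup in Python. The names `HostState` and `reclamation_actions` are hypothetical and the sketch only encodes the behaviour as described in this article. It takes the state as input and does not try to reproduce how ESXi calculates the thresholds.

```python
from enum import Enum


class HostState(Enum):
    HIGH = "high"
    SOFT = "soft"
    HARD = "hard"
    LOW = "low"


def reclamation_actions(state: HostState, vm_over_limit: bool = False) -> list:
    """List the techniques in play for a host state, as described in the text."""
    per_state = {
        HostState.HIGH: [],
        HostState.SOFT: ["ballooning"],
        HostState.HARD: ["ballooning", "swapping", "compression"],
        HostState.LOW: ["swapping", "compression", "block VMs above target"],
    }
    # Page sharing is enabled by default and runs in every state.
    actions = ["page sharing"] + per_state[state]
    if vm_over_limit:
        # A VM above its memory limit is reclaimed from in any state:
        # ballooning first, then swapping and compression if necessary.
        for extra in ("ballooning", "swapping", "compression"):
            if extra not in actions:
                actions.append(extra)
    return actions


print(reclamation_actions(HostState.HIGH))
print(reclamation_actions(HostState.HIGH, vm_over_limit=True))
print(reclamation_actions(HostState.LOW))
```

The second call shows why a badly chosen memory limit is so confusing in practice: the host is in the high state, yet the virtual machine is ballooning or even swapping.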
If you want an in-depth explanation of ESXTOP and its counters, read this great article from Duncan Epping.
So let’s recap: Transparent page sharing is a default ESXi feature which deduplicates identical memory pages to reclaim physical memory and runs regardless of the amount of physical memory used. When an ESXi host faces a memory shortage, it has a few tricks up its sleeve to cope with this situation. First, ESXi will ask virtual machines to free up guest memory by using ballooning, which in turn results in reclaimed physical memory. If that does not work, ESXi falls back on memory compression. Memory compression compresses memory pages and moves them to a separate cache which is located in the host’s main memory. When all this does not have the desired effect, ESXi is left with one last resort: hypervisor swapping, which moves guest memory pages to disk.
Although ESXi uses several innovative techniques to manage memory usage and reclaim memory, there are still VMware admins who think they know better and start disabling ballooning and compression without knowing why and what the effect is. True, a few years ago there was a best practice which stated that you should disable or uninstall the balloon driver on, for example, virtualized Citrix servers. But that is history now.
Based on the memory management concepts and performance tests, VMware has the following best practices for host and guest memory usage:
- Do not disable page sharing or the balloon driver. Page sharing is a lightweight technique which opportunistically reclaims redundant host memory with trivial performance impact. In the cases where hosts are heavily overcommitted, using ballooning is generally more efficient and safer than using hypervisor swapping, based on the results presented in “Ballooning vs. Host Swapping” on page 19. These two techniques are enabled by default and should not be disabled unless application testing shows that the benefits of doing so clearly outweigh the costs;
- Carefully specify memory limits and reservations. The virtual machine memory allocation target is subject to the virtual machine’s memory limit and reservation. If these two parameters are misconfigured, users may observe ballooning or swapping even when the host has plenty of free memory. For example, a virtual machine’s memory may be reclaimed when the specified limit is too small or when other virtual machines reserve too much host memory, even though they may only use a small portion of the reserved memory. If a performance-critical virtual machine needs a guaranteed memory allocation, the reservation needs to be specified carefully because it may impact other virtual machines;
- Host memory size should be larger than guest memory usage. For example, it is unwise to run a virtual machine with a 2GB working set size in a host with only 1GB of host memory. If this is the case, the hypervisor has to reclaim the virtual machine’s active memory through ballooning or hypervisor swapping, which will lead to potentially serious virtual machine performance degradation. Although it is difficult to tell whether the host memory is large enough to hold all of the virtual machines’ working sets, the bottom line is that the host memory should not be excessively overcommitted because this state makes the guests continuously page out guest physical memory;
- Use shares to adjust relative priorities when memory is overcommitted. If the host’s memory is overcommitted and the virtual machine’s allocated host memory is too small to achieve a reasonable performance, adjust the virtual machine’s shares to escalate the relative priority of the virtual machine so that the hypervisor will allocate more host memory for that virtual machine;
- Set an appropriate virtual machine memory size. The virtual machine memory size should be slightly larger than the average guest memory usage. The extra memory will accommodate workload spikes in the virtual machine. Note that the guest operating system only recognizes the specified virtual machine memory size. If the virtual machine memory size is too small, guest-level paging is inevitable, even though the host might have plenty of free memory. If the virtual machine memory size is set to a very large value, virtual machine performance will be fine, but more virtual machine memory means that more overhead memory needs to be reserved for the virtual machine.
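A few of these best practices lend themselves to a quick sanity check. The Python sketch below is purely illustrative: the `VmConfig` fields and the `review` function are hypothetical names, the numbers are made up, and average guest memory usage is used as a rough proxy for the working set. It is not tied to any VMware API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VmConfig:
    name: str
    configured_mb: int               # virtual machine memory size
    avg_usage_mb: int                # average guest memory usage (working-set proxy)
    limit_mb: Optional[int] = None   # memory limit, None when unset


def review(host_mb: int, vms: List[VmConfig]) -> List[str]:
    """Return warnings for configurations the best practices advise against."""
    warnings = []
    for vm in vms:
        if vm.limit_mb is not None and vm.limit_mb < vm.avg_usage_mb:
            warnings.append(f"{vm.name}: limit below average usage, expect ballooning or swapping")
        if vm.configured_mb < vm.avg_usage_mb:
            warnings.append(f"{vm.name}: memory size below average usage, expect guest paging")
    if sum(vm.avg_usage_mb for vm in vms) > host_mb:
        warnings.append("host: guest memory usage exceeds host memory")
    return warnings


vms = [
    VmConfig("web01", configured_mb=4096, avg_usage_mb=3000),
    VmConfig("db01", configured_mb=8192, avg_usage_mb=6000, limit_mb=4096),
]
for line in review(host_mb=8192, vms=vms):
    print(line)
```

Here `db01` is flagged because its limit is lower than what it normally uses, and the host is flagged because the two virtual machines together use more memory than the host has.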
VMware also released a great ‘VMware vSphere 5 Memory Management and Monitoring diagram‘ which provides a comprehensive look into the ESXi memory management mechanisms and reclamation methods. This diagram also provides the relevant monitoring components in vCenter Server and the troubleshooting tools like ESXTOP.