Why is my VM slow?

(This post is part of a series on debugging virtual machines on ESX)

While there is no easy answer, I've found that esxtop is one of the most useful starting points. esxtop is the VMware equivalent of the UNIX top tool. You can invoke it via:

~ # esxtop

If, after running the command, you see only a stream of indecipherable text, press Ctrl-C and make sure your TERM variable is set correctly:

~ # export TERM=xterm
~ # esxtop

Things should now look better.

This tool contains a lot of information, and VMware has published several documents and knowledge base articles describing how to use it (unfortunately, many of them are now out of date).

I'm not going to provide a full description of all fields, but rather highlight a few interesting views and statistics. For reference you can always press "h" to get to the help screen.

esxtop can be used to gather information on CPU, memory, networking, and storage. When the program starts, it shows CPU details by default:

vm screenshot 1

For people who have used top before, this should look somewhat familiar. The big difference is that esxtop provides additional statistics so that you can compare the utilization of the VM with the utilization of the underlying physical host. Press "U" or "R" to sort the entries by USED (roughly, the percentage of physical CPU time accounted to the world) or RDY (the percentage of time the world was ready to run but was not actually scheduled on a CPU).

Things to look for:

A high USED time indicates that the guest is using a lot of CPU. If USED is close to 100%, the VM may be CPU bound.

A high RDY time can indicate that a VM isn't getting sufficient CPU resources. Looking further at the MLMTD statistic, which counts the ready time caused by a CPU limit, can differentiate between a CPU limit that is artificially restricting the VM and a host that is simply overloaded.

A high SWPWT time indicates that the VM is spending long periods waiting for the hypervisor to swap its memory back in. This can indicate that the memory reservation is too low or that the host is oversubscribed.
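The checks above can be sketched as a small triage function. This is a minimal sketch that assumes you have copied USED, RDY, MLMTD, and SWPWT for one VM off the CPU screen by hand; the thresholds are hypothetical rules of thumb, not VMware-defined values. Because a VM's group row sums percentages across its worlds, the sketch divides by the vCPU count, which is only an approximation since the group also contains non-vCPU worlds.

```python
from dataclasses import dataclass

# Hypothetical rule-of-thumb thresholds, in percent per vCPU.
USED_HIGH = 90.0
RDY_HIGH = 10.0
SWPWT_HIGH = 5.0


@dataclass
class CpuSample:
    """Values copied by hand from one VM's row in esxtop's CPU screen."""
    name: str
    vcpus: int    # group rows sum percentages across worlds, so normalize by this
    used: float   # USED
    rdy: float    # RDY
    mlmtd: float  # MLMTD: ready time caused by a CPU limit
    swpwt: float  # SWPWT


def triage(s: CpuSample) -> list[str]:
    """Label the CPU symptoms described in the text."""
    used = s.used / s.vcpus
    rdy = s.rdy / s.vcpus
    mlmtd = s.mlmtd / s.vcpus
    swpwt = s.swpwt / s.vcpus
    findings = []
    if used >= USED_HIGH:
        findings.append("cpu-bound")
    if rdy >= RDY_HIGH:
        # Ready time mostly explained by MLMTD points at a configured limit;
        # otherwise the host itself is short on CPU.
        findings.append("cpu-limit" if mlmtd >= rdy / 2 else "host-overloaded")
    if swpwt >= SWPWT_HIGH:
        findings.append("swap-wait")
    return findings


print(triage(CpuSample("humpty", vcpus=2, used=40.0, rdy=30.0, mlmtd=25.0, swpwt=0.0)))
# ['cpu-limit']
```

In this example most of the ready time is explained by MLMTD, which points at a CPU limit on the VM rather than an overloaded host.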

When looking at a specific VM, it can be helpful to limit the output to just that VM's worlds. This can be done using "l" and "e":

vm screenshot 2

This view shows only the worlds associated with the VM humpty. It's a useful way to examine the WAIT vs. VMWAIT percentages. VMWAIT applies only to the vCPU worlds of a VM. The big difference between WAIT and VMWAIT is that VMWAIT does not include IDLE time, so it indicates the percentage of time that the VM is blocked waiting for the hypervisor to do work.

Memory details:

To see memory details press "m":

vm screenshot 3

The memory view of esxtop allows you to see the total amount of physical memory allocated to each virtual machine. MEMSZ shows the amount of memory configured for the VM, while GRANT shows the amount of physical memory currently mapped to the VM.

A few things to look for:

SWCUR shows the total amount of memory currently swapped out for the VM, and SWTGT shows the target amount of memory the hypervisor is trying to swap. High values here indicate that the VM is swapping, which can degrade performance.

SWR/s and SWW/s give the rate of swap reads and writes to disk, while LLSWR/s and LLSWW/s give the same breakdown for swap on SSD.
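One rough way to read these memory columns together is sketched below. It assumes the values are copied from the memory screen by hand (sizes in MB) and the field names are hypothetical. Treating SWTGT above SWCUR as "more swapping to come" follows from SWTGT being the amount the hypervisor is trying to swap.

```python
from dataclasses import dataclass


@dataclass
class MemSample:
    """Values copied from one VM's row in esxtop's memory screen."""
    name: str
    memsz: float    # MEMSZ, MB
    swcur: float    # SWCUR, MB
    swtgt: float    # SWTGT, MB
    swr_s: float    # SWR/s: reads from swap on disk
    llswr_s: float  # LLSWR/s: reads from swap on SSD


def swap_report(m: MemSample) -> dict[str, object]:
    return {
        # Share of the configured memory that currently lives in swap.
        "swapped_fraction": m.swcur / m.memsz if m.memsz else 0.0,
        # A target above the current value means the hypervisor still wants to swap more out.
        "more_swapping_pending": m.swtgt > m.swcur,
        # Nonzero read rates mean swapped pages are being brought back in.
        "actively_swapping_in": m.swr_s > 0 or m.llswr_s > 0,
    }


print(swap_report(MemSample("humpty", memsz=4096, swcur=512, swtgt=1024, swr_s=0.0, llswr_s=0.0)))
# {'swapped_fraction': 0.125, 'more_swapping_pending': True, 'actively_swapping_in': False}
```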

Network details:

To see network details press "n":

vm screenshot 4

This screen is useful for determining the network throughput for virtual switches, VMkernel NICs, and VMs. It is a useful tool for determining if the networking infrastructure is overloaded. A few things to look for:

A high DRPTX indicates packets are being dropped on transmit. Packets may be dropped on transmit due to congestion, queue depth, etc.

A high DRPRX indicates that packets are being dropped on receive. This may indicate that the guest doesn't have enough CPU to process the incoming network traffic, that the ring for the virtual adapter is too small, or that the VMkernel NIC is oversubscribed.
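The causes listed above can be turned into a checklist. The sketch below is hypothetical: it takes DRPTX and DRPRX values copied from the network screen, uses a made-up threshold, and maps each port with notable drops to the causes worth checking.

```python
from dataclasses import dataclass

# Hypothetical threshold in percent; even small sustained drop rates are worth a look.
DROP_PCT = 0.5

# Possible causes, as listed in the text.
CAUSES = {
    "tx": ["congestion", "queue depth"],
    "rx": ["guest short on CPU", "virtual adapter ring too small", "vmknic oversubscribed"],
}


@dataclass
class NetSample:
    port: str     # row label for the port (hypothetical naming)
    drptx: float  # DRPTX
    drprx: float  # DRPRX


def drop_hints(samples: list[NetSample], threshold: float = DROP_PCT) -> dict[str, list[str]]:
    """Map each port with notable drops to the causes worth checking."""
    hints: dict[str, list[str]] = {}
    for s in samples:
        causes: list[str] = []
        if s.drptx >= threshold:
            causes += CAUSES["tx"]
        if s.drprx >= threshold:
            causes += CAUSES["rx"]
        if causes:
            hints[s.port] = causes
    return hints


print(drop_hints([NetSample("humpty.eth0", 0.0, 2.0), NetSample("vmk0", 0.0, 0.0)]))
```

For receive drops, the CPU screen from earlier is a natural next stop, since a starved guest is one of the listed causes.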

Disk device details:

To see disk device details press "u":

vm screenshot 5

This view shows the utilization of the physical host's disk devices and NFS shares. The following statistics are useful:

QUED/USED/LOAD can show if a device is overloaded.

DAVG/cmd, KAVG/cmd, GAVG/cmd, and QAVG/cmd are useful statistics for seeing the average latency at various levels: DAVG is the latency at the device, KAVG is the latency added in the ESX kernel, GAVG is the sum DAVG + KAVG, and QAVG is the time spent in queues in the storage stack. Together they can help indicate whether there is a bottleneck at a particular layer.
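The layered breakdown can be sketched as a function that guesses where latency is accumulating. This assumes GAVG = DAVG + KAVG as described above, and additionally assumes that QAVG is a component of KAVG (which is how it is commonly described). The 20 ms and 2 ms thresholds are hypothetical rules of thumb, not VMware-defined limits, and the device name is made up.

```python
from dataclasses import dataclass

# Hypothetical rule-of-thumb thresholds, in milliseconds per command.
DAVG_HIGH_MS = 20.0
KAVG_HIGH_MS = 2.0


@dataclass
class LatencySample:
    """Latency columns copied from one row of esxtop's disk device screen."""
    device: str
    davg: float  # DAVG/cmd
    kavg: float  # KAVG/cmd
    qavg: float  # QAVG/cmd, assumed here to be part of KAVG

    @property
    def gavg(self) -> float:
        # GAVG is the sum of device and kernel latency.
        return self.davg + self.kavg


def bottleneck(s: LatencySample) -> str:
    """Name the layer where most of the latency seems to accumulate."""
    if s.davg >= DAVG_HIGH_MS:
        return "device"  # the storage device (or the path to it) is slow
    if s.kavg >= KAVG_HIGH_MS:
        # Kernel latency that is mostly queueing means commands wait before being issued.
        return "kernel-queue" if s.qavg >= s.kavg / 2 else "kernel"
    return "none"


sample = LatencySample("disk0", davg=5.0, kavg=10.0, qavg=9.0)
print(sample.gavg, bottleneck(sample))  # 15.0 kernel-queue
```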

Storage adapter details:

To see storage adapter details press "d":

vm screenshot 6

This view is very similar to the device view. It can be useful to determine if there is a bottleneck on an adapter as opposed to a specific device.

Virtual disk details:

To see virtual disk details press "v":

vm screenshot 7

This view shows the performance of the virtual disks of a VM. You can unroll a specific VM using "e" to see each disk individually, which allows you to see the read/write latency per virtual disk. Comparing the latency of the virtual disk to the physical device can help narrow down bottlenecks.
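That comparison can be sketched as follows. This assumes you have already looked up which physical device backs each virtual disk (the `backing` dictionary is hypothetical input you would build yourself), that the device latency is whichever device-level figure you chose to compare against, and that the 5 ms gap is an arbitrary threshold. The disk labels are made up.

```python
# Hypothetical gap (ms) above which the extra latency is worth investigating.
GAP_MS = 5.0


def localize(
    vdisk_ms: dict[str, float],
    device_ms: dict[str, float],
    backing: dict[str, str],
    gap_ms: float = GAP_MS,
) -> dict[str, str]:
    """Compare each virtual disk's latency with its backing device's latency."""
    result = {}
    for vdisk, latency in vdisk_ms.items():
        device_latency = device_ms[backing[vdisk]]
        if latency - device_latency >= gap_ms:
            # The guest sees much more latency than the device reports,
            # so look above the physical device.
            result[vdisk] = "above-device"
        else:
            # The virtual disk roughly tracks its device, so the device view applies.
            result[vdisk] = "tracks-device"
    return result


print(localize(
    {"scsi0:0": 25.0, "scsi0:1": 6.0},
    {"disk0": 5.0},
    {"scsi0:0": "disk0", "scsi0:1": "disk0"},
))
# {'scsi0:0': 'above-device', 'scsi0:1': 'tracks-device'}
```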

Each esxtop screen has a number of options that allow you to further configure the data presented. I'd encourage you to poke around and see what you can find!