Posts Tagged ‘esxtop’

esxtop advanced features

No rocket science here. esxtop has always been there. Yet a lot of people miss out on some of its great features. Hopefully this blogpost will get you interested in looking at esxtop (again?) in detail!

Yesterday I attended a very interesting breakout session about esxtop and its advanced features in vSphere. Old news you might say, but there is SO much you can do with esxtop. For example, you can export data from esxtop and import them in Windows perfmon. And if you did know that, then for example, did you know you can now actually see which physic NIC is being used by a certain VM?

Other neat little features were shown. The best one being that the “swcur” field is actually NOT about the current swapping activity of a VM, but swapping that occured in the past (yes, I too would have called it differently…). How many of you knew that one? Finally, a very interesting field in the storage screen (yes for those who did not know that one, esxtop is not just about CPU, but also memory, storage, and new in vSphere… Interrupts) ). This field is called “DAVG” and this shows the actual latency seen by ESX to your storage (and also KAVG for kernel latency and GAVG for the total latency the guest sees).

There were also a few examples of misbehaving VMs which was very interesting to see. Numbers which seemed not possible, yet explained perfectly. I would like to vote this very last presentation at VMworld 2009 the best technical presentation I witnessed there!

I hope I got you (re)interested in esxtop. I am more of a graphical guy, so I like the performance monitor embedded within the VI client. But some things just aren’t there. So esxtop is definitely worth a(nother) look. If you’re using ESXi, make sure to download the vMA appliance (here) which has resxtop included (which looks a lot like esxtop on ESX).

Performance tab "Numbers" demystified – by Erik Zandboer

 “CPU ready time? Always use esxtop! The performance tab stinks!” is what I hear all the time. But in reality, they don’t stink, they’re just misunderstood. This edition of my blog will try to clarify this using the famous CPU ready times as an example.

A lot of people have questions about items like CPU ready times. Advice is usually to run esxtop. I have always been a fan of the performance tabs instead. However, the numbers presented there are very often unclear to people. The same goes for disk IOps. In fact any measured value with a “number” as the unit appears to suffer from this. No need – The numbers are valid, you just should know how to interpret them. Added bonus is off course, you get an insight through time on your precious CPU ready times, instead of a quick look in esxtop. If you tune the VirtualCenter settings right, you can even see CPU ready times on your VM yesterday or the day before!


Comparing esxtop to performance in VI client

When you compare the values in both esxtop and the performance monitor, you could notice a clear difference in these values. While a VM has a %ready in esxtop of about 1.0%, in the performance graph (real-time view) it is all of a sudden a number around 200 ms?!? As strange as this sounds – the number is correct. The secret lies in the percentage versus the number of milliseconds. It is the same thing, but shown differently.


The magic word: sampletime

When you start to think of it: The number presented in the performance tab is in the unit of milliseconds. So basically it is the number of milliseconds the CPU has been “ready” as opposed to the percentage ready in esxtop. But how many milliseconds out of how much time you might wonder? That is where the magic word comes in – sampletime!

Sampletime is basically the time between two samples. As you might know, the sampletime in the real-time view of the performance tab is 20 seconds (check upper left of the window):

Sampletime - yes! it is 20 seconds

Sampletime - yes! it is 20 seconds

So VMware could have taken the percentage ready from esxtop, and display these instant values to form this graph. In reality however it is even nicer: VMware is able to measure the number of milliseconds ready-time of the CPU inbetween these two samples! Once you grasp this, it all becomes clear – So if you see in real-time view a value of 200 ms, in esxtop-like representation you would have 200/20 = 10 milliseconds per second, or 0.01 seconds per second, which is dead-on 1% (how DO I manage 😉 )


Changing the statistics level in VirtualCenter

So now we have proven that the numbers can actually be matched – time for the nice bonus. If you login to VirtualCenter via the VI client, you can edit the settings of VirtualCenter by clicking on the “administration” drop-down menu, and selecting “VirtualCenter Management Server Configuration” (whoever came up with that monstrous name!). In this screen you’ll see an item called “statistics”. There you can tune a see more statistics over more time:

Altering statistics levels in VirtualCenter

Altering statistics levels in VirtualCenter

In this example I have increased the level of statistics, and I have also changed the first interval duration to two minutes. In this setup, I can see CPU ready times of all my VMs for one entire week now:

CPU ready times for an entire week

CPU ready times for an entire week

Take care, your database will grow much larger, and VMware does not encourage you to keep this level of statistics on for an extended period of time – although I have been running these levels for months now without issues on a small (2 ESX server) environment. My database size is now around 1.4GB – Not alarming although the size grows exponentially with the number of ESX hosts you have.

When you look close into the graph (an almost idle webserver which runs virusscan at 5AM), you’ll notice that CPU ready times boost up to around 50.000 milliseconds. Sounds alarming, right? But do not forget, in this case I am looking at a weekly graph, which is sampling at a 30 minute sample rate. So I should do my calculation once again: 30 minutes = 30*60 = 1800 seconds (=sampletime). So 50000 / 1800 = 27.8 milliseconds per seconds, or 0.0278 seconds per second. So CPU ready is peaking up to just below 3%, which is acceptable (usually below 10% is considered ok).  Not bad at all! But it would have been hard to find using esxtop though (I hate getting up early).


Any number

The calculations I have done are valid for all measured values that have a “numbers” unit within the Performance tab. So it also works for disk operations (like “Disk read requests”) and networking (like “Network packets received”).


So if you looking for esxtop output through time in a graph – Take a second look at your Performance tabs!

Soon to come