VAAI has been around for quite some time, but I still get a lot of questions on the subject. Most people seem to think VAAI is solely about speeding up processes, when in reality you should not see a significant speed-up if your infrastructure has enough headroom. VAAI is meant to offload storage-related work so it is executed where it belongs: inside the storage array.
EDIT: My title was ~~stolen~~ borrowed by my dear colleague Bas Raayman in a similar post, but focusing on the file side of things: My VAAI is Better Than Yours – The File-side of Things. Nice addition Bas!
My VAAI is better than yours
I recently had an interesting conversation with someone who had been testing VAAI on both EMC and “Brand-X”. This is an excerpt from the conversation:
Him: “I have tested VAAI on Brand-X, and when I enabled VAAI my storage vMotion speed tripled over a non-VAAI storage vMotion. But when I run the same test on an EMC VNX, there is almost no difference in speed. This means that your VAAI implementation is bad…”
Me: “Let me ask you this: let’s do a thought experiment where you lower the speed of your iSCSI infrastructure to 10Mbit, and then you repeat the test. What do you think would happen?”
Him: “That’s simple. Because VAAI offloads the actual copying of data to the array, the VAAI-enabled copy will run almost as fast. The non-VAAI copy will slow down to a crawl.”
Me: “I think that is a correct analysis. But consider this: you can push over a million IOPS through a vSphere server. Wouldn’t that rather mean that our VAAI implementation is fine, and that the regular performance of your Brand-X storage leaves something to be desired?”
Him: “Ehmm… Yeah, you actually have a point there…”
The above clearly shows how people think about VAAI and how difficult it is to really grasp the concept. That is why I set up a test that uses vCenter performance data to clearly show what VAAI does for you (and what it does not do).
For a change I will not show you the “block copy” VAAI primitive, but rather the “write same” primitive, which is used for example when creating eager-zeroed thick virtual disks. Such a vDisk has its entire range zeroed out up front. Without VAAI, the vSphere node in charge fills a buffer with zeros and sends it out, block after block, until the whole disk has been zeroed.
Using the VAAI “write same” primitive, the vSphere node no longer bothers with creating a bunch of zeros and sending them down to the array. Instead, it sends out T10 SCSI WRITE SAME commands that tell the array to zero out ranges of blocks, without actually sending any data. Speedup? We’ll see. Less stress on the storage network? For sure.
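To make the difference concrete, here is a toy model (not real SCSI, and the per-command transfer and range sizes are my own illustrative assumptions) of what travels over the storage network in both cases when zeroing a 10GB disk:

```python
# Toy model of the traffic needed to zero a 10 GB eager-zeroed thick disk.
# Transfer size and WRITE SAME range size are assumptions for illustration.

DISK_BYTES = 10 * 1024**3

def zero_without_vaai(disk_bytes, transfer_bytes=512 * 1024):
    """Host sends a zero-filled buffer for every transfer."""
    commands = disk_bytes // transfer_bytes
    payload = commands * transfer_bytes      # every command carries real zeros
    return commands, payload

def zero_with_write_same(disk_bytes, range_bytes=16 * 1024**2):
    """Host sends WRITE SAME: one small zero pattern per command,
    each command covering a large range the array zeros by itself."""
    commands = disk_bytes // range_bytes
    payload = commands * 512                 # only a 512-byte pattern travels
    return commands, payload

cmds, wire = zero_without_vaai(DISK_BYTES)
print(f"non-VAAI:   {cmds:>6} commands, {wire / 1024**2:>8.2f} MB on the wire")
cmds, wire = zero_with_write_same(DISK_BYTES)
print(f"WRITE SAME: {cmds:>6} commands, {wire / 1024**2:>8.2f} MB on the wire")
```

Whatever the exact sizes in a real setup, the shape of the result is the same: the zero payload all but disappears from the wire, and far fewer commands are needed.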
We start off with a single vSphere 5 node connected to a VNX via iSCSI. A 26GB LUN is presented from the VNX up to the vSphere node. Next, we will be creating and destroying eager-zero thick disks on this LUN, with VAAI enabled and disabled. Hopefully we’ll spot the difference 🙂
To the Lab!
I decided to create both the VM and the vDisk on the LUN we will be measuring. First, we turn off VAAI in the vSphere node’s “Configuration” -> “Advanced Settings” (the DataMover.HardwareAcceleratedMove and DataMover.HardwareAcceleratedInit options):
Then, we build ourselves a new VM and select an “eager zeroed thick” disk of 10GB:
To make it a fair fight, I repeat this process (deleting the VM from disk and recreating it) to make sure no caching on any layer skews the result. Basically the first run can “preheat” any caches, so the second run is a fairer comparison for the tests that follow.
Now we take a look at the vSphere node’s performance graphs. I select the advanced view, look at Disk… Realtime and select the counters “Write rate” and “Write requests”:
- Write rate – The bandwidth consumed by the data encapsulated in write commands to the disk in question;
- Average write requests per second – The average number of write commands issued by the vSphere node to the array.
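To get a feel for what these counters should show during a non-VAAI run, here is a rough back-of-the-envelope calculation. The per-command transfer size and the run length are assumptions for illustration, not measured values:

```python
# Back-of-the-envelope estimate of the two counters while a vSphere node
# zeroes an eager-zeroed disk without VAAI. All inputs are assumptions.

DISK_BYTES = 10 * 1024**3        # the 10 GB eager-zeroed vDisk
TRANSFER_BYTES = 512 * 1024      # assumed payload per write command
RUN_SECONDS = 11 * 60            # assumed ~11 minutes for one zeroing pass

write_commands = DISK_BYTES // TRANSFER_BYTES
write_rate_mb_s = (DISK_BYTES / 1024**2) / RUN_SECONDS
write_requests_s = write_commands / RUN_SECONDS

print(f"{write_commands} write commands in total")
print(f"~{write_rate_mb_s:.1f} MB/s average write rate")
print(f"~{write_requests_s:.0f} write requests/sec")
```

The point is not the exact figures but the ratio: every write request drags half a megabyte of zeros across the network with it.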
I would expect that without VAAI, we’d see a large number of write commands (which tell the array to write to certain blocks). Also, I expect to see a lot of bandwidth used for these writes, as they contain boatloads of zeros. This is the initial graph I got out of vCenter after the non-VAAI deployment of the eager-zeroed disk:
As you can see, there are indeed two consecutive actions, where I create the very same VM with a 10GB eager-zeroed thick disk, both times without VAAI enabled. Funnily enough, the underlying storage array somehow managed to optimize the reused blocks: on the second run more bandwidth was used, and as a result the creation of the vDisk went a little faster. Some kind of caching is definitely going on!
Now for the VAAI test. After the VM has been created with its eager-zeroed thick disk, I delete the VM from disk again. Next, I turn VAAI back on by setting the variables from figure 1 both to “1”, and we repeat the very same action of creating a VM with a 10GB eager-zeroed disk. Now we do indeed get a very different graph. When I put both non-VAAI and VAAI next to each other in the same graph, it looks like this:
As you can see, the difference is amazing. On the right side of the graph, not only is the write rate flat and near-zero, but the number of write commands issued to the storage array is much lower as well. When we compare the runtimes, we see that the non-VAAI workload ran for a total of 22 minutes, while the VAAI workload ran for just over 17 minutes.
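Putting numbers on that runtime comparison (approximating “just over 17 minutes” as 17):

```python
# Wall-clock times measured for the two workloads above.
non_vaai_s = 22 * 60
vaai_s = 17 * 60          # "just over 17 minutes", approximated as 17

speedup = non_vaai_s / vaai_s           # how many times faster with VAAI
time_saved = 1 - vaai_s / non_vaai_s    # fraction of wall-clock time saved

print(f"{speedup:.2f}x faster, {time_saved:.0%} less wall-clock time")
```

A respectable but modest time saving, which is exactly the point: the dramatic win is in the bandwidth and command counts, not the stopwatch.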
Looking at the last graph, there is a stunning difference between the non-VAAI and VAAI-enabled workloads. The VAAI-enabled workload executes just a little faster, but much more importantly: the vSphere node sends only a fraction of the write commands to the storage array, and the bandwidth required drops from “massive” to “almost zero” as soon as we enable VAAI.
This clearly indicates that VAAI is able to save a lot of resources, both in the vSphere nodes and the storage network. This will promote scaling of virtualization solutions for sure.