I have seen different implementations of read caching in arrays and even inside hosts, just to be able to cope with boot storms of VDI workloads. When using linked clones caching really helps; all the VDIs being booted perform massive reads from a very small portion of the infrastructure: the replica(s). VMware came up with a nice software solution to this: Why not sacrifice some memory inside the vSphere nodes to accommodate read caching there? This is what CBRC (vSphere 5) or Host Caching (View 5.1) is all about. And… It really works!
What happens during a boot storm
First of all, we need to figure out what happens during a boot storm. Ever wondered just how much data a Windows desktop reads as it boots? The answer is: somewhere between 0,8 and 1,5 GB. With VMware View in Linked Clone mode things get even better: Most of this data is common to all VDIs within a pool, and they will all be reading from the VERY SAME replica disk! It is a very clean and natural form of deduplication (just go figure: all common data sits in the replica, all unique data sits in the linked clones!)
Storage vendors have been using this “feature” with great success: If I look at NetApp, they have been positioning their FlashCache as an excellent solution. EMC has been promoting their FAST-cache with great success. What about 1.000 light VDI users on 15x 15K 300GB spindles with just two FAST-cache SSD drives 100GB each? Wow.
VMware’s response to read caching: CBRC
If you look inside the vSphere 5.0 advanced settings, you can find this “CBRC” (or Content Based Read Caching) there:
So what is this? As far as I know, it was introduced in vSphere 5.0 and was meant to be accompanied by VMware View 5.0. However, it got pulled from the GA version of View 5.0. Now in View 5.1, it is back (and ready to kick b*tt!). So what does it do?
The idea was really simple: If a VDI image only reads 0,8 – 1,5 GB of footprint from the storage during boot, and this footprint is shared (“dedupe like”) between a lot of VDIs, then why not cache this relatively small amount of data inside the vSphere nodes? And that is exactly what CBRC does: It eats between 100MB and 2GB of memory from each vSphere node, and it uses this memory to cache the most common reads from a certain virtual disk (View will obviously configure the replica disks for this).
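To put rough numbers on this idea, here is a back-of-the-envelope sketch in Python. The 100 desktops per host is a made-up example value and the function name is mine; only the 0,8 to 1,5 GB boot footprint comes from the figures above. It also assumes the ideal case where the whole boot footprint is common to all desktops and fits in the cache.

```python
def boot_storm_reads_gb(desktops, footprint_gb, cached):
    """Data read from the array while `desktops` linked clones boot on one host.

    Idealised assumption: the whole boot footprint is common to all
    desktops and sits in the shared replica disk.
    """
    if cached:
        # The common footprint is fetched once; later boots hit host memory.
        return footprint_gb
    # Without a host cache every desktop pulls its own copy from the array.
    return desktops * footprint_gb


DESKTOPS_PER_HOST = 100  # hypothetical example value

for footprint in (0.8, 1.5):
    without = boot_storm_reads_gb(DESKTOPS_PER_HOST, footprint, cached=False)
    with_cache = boot_storm_reads_gb(DESKTOPS_PER_HOST, footprint, cached=True)
    print(f"{footprint} GB footprint: {without:.0f} GB uncached, {with_cache} GB cached")
```

Real desktops also read some unique data from their own linked clone, so the real-world gain will be smaller than this ideal case suggests.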
In View 5.1 it is really easy to configure:
After enabling this feature, vSphere will start to fill the cache, and update the cache only when it is scheduled to do so:
Cache regeneration?!?! – Yes. This is not an ordinary cache as you know them from CPUs or DRAM cache in storage arrays…
How the CBR-cache works inside
The Content Based Read Cache (CBRC) works very differently from most of the standard caches you know (like storage array DRAM caches). For starters, this is a read-only cache. A read cache normally caches the reads that are being done in some smart manner, and flushes them out again when the blocks aren’t used for a longer period of time. How long the data remains in cache depends on the cache size and the amount of new data coming in. Writes are generally cached in read caches as well, just to make sure the cache remains up to date.
CBRC is different in several ways. First off, the cache is populated outside the “blackout times” which you can configure. So the cache is semi-static in this respect: During working hours the cache contents are fixed. Writes to blocks that are cached do not update the cache; instead those blocks are simply invalidated, effectively expelling them from cache. I am not sure why VMware chose this setup, but I figure it has the least overhead and is the simplest to build. Plus… It is not really necessary to cope with writes to the cached blocks: As CBRC caches replica disks, the cached data would in normal circumstances not change anyway, as the replicas are read-only to the desktops!
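The behaviour described above can be sketched as a toy model in Python. To be clear: this is my own illustration of a semi-static, invalidate-on-write read cache, not VMware's implementation. The class name, method names and the dict-based "disk" are all made up for the example.

```python
class SemiStaticReadCache:
    """Toy model of the behaviour described above; not VMware's code."""

    def __init__(self):
        self._blocks = {}  # block number -> cached data

    def regenerate(self, disk, hot_blocks):
        """Rebuild the cache contents; meant to run outside working hours."""
        self._blocks = {n: disk[n] for n in hot_blocks}

    def read(self, disk, block):
        """Return (data, hit). Serve from memory when possible."""
        if block in self._blocks:
            return self._blocks[block], True
        # A miss does NOT populate the cache: the contents stay fixed
        # until the next regeneration.
        return disk[block], False

    def write(self, disk, block, data):
        """Write to disk and invalidate (not update) any cached copy."""
        disk[block] = data
        self._blocks.pop(block, None)


disk = {0: b"boot", 1: b"kernel", 2: b"user"}
cache = SemiStaticReadCache()
cache.regenerate(disk, hot_blocks=[0, 1])

print(cache.read(disk, 0))        # served from cache
cache.write(disk, 0, b"patched")  # invalidates block 0 in the cache
print(cache.read(disk, 0))        # now served from disk
```

Note that the invalidated block stays out of the cache until the next regeneration, which is exactly why this design only makes sense for disks that are (almost) never written to, like replicas.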
When looking at the results VMware showed at VMworld 2012 in San Francisco this year, the effect of CBRC is really significant:
Wow. Peak IOPS from 16.000 to 2.000… This really shows how effective even a small amount of caching can be if applied right!
Some things to consider
When I first saw this feature, it immediately got my attention. How cool is this, and it eats only 1-2GB of memory! My initial thought was: “why limit it at 2GB”? I know a lot of people that would LOVE to have a GENERIC read-cache in vSphere, even if it ate 32GB or even more. The reason VMware gives is that above 2GB there was little gain. I can understand this, as this type of caching is very limited in its use case. It appears to have been tuned specifically for VMware View linked clone use cases. It is not a generic read cache (yet?).
Still, I wonder what happens in a large View environment, where I have 10 replicas from 10 pools, and all pools use the same vSphere nodes. Whenever I boot all desktops on a single node, chances are that these desktops will need to use TEN replicas, and not ONE. So the CBRC would then have to cache TEN replica disks instead of ONE. As CBRC does not dedupe the blocks across replicas (as far as I have been able to determine), 2GB will probably not help that much in this case. I would love to see CBRC being able to use X GB per replica, and Y GB in total just to max it to some limit. If only I had the hardware in my homelab to spin up 100 VMs on a node from 10 pools to test this…
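Since I cannot test it, here is at least the arithmetic behind my worry, as a small Python sketch. It assumes the cache is split evenly between replicas and that nothing is deduplicated across them; both are my assumptions, as are the function names. The 2GB cache limit and the 0,8 to 1,5 GB boot footprint are the figures mentioned earlier.

```python
def cache_share_per_replica_gb(cache_gb, replicas):
    """Cache available per replica, assuming an even split and no
    deduplication across replicas (both are assumptions)."""
    return cache_gb / replicas


def coverage(cache_gb, replicas, footprint_gb):
    """Fraction of each replica's boot footprint that fits in its share."""
    share = cache_share_per_replica_gb(cache_gb, replicas)
    return min(1.0, share / footprint_gb)


CACHE_GB = 2.0  # the maximum CBRC size mentioned above

for replicas in (1, 10):
    for footprint in (0.8, 1.5):
        pct = coverage(CACHE_GB, replicas, footprint) * 100
        print(f"{replicas:>2} replica(s), {footprint} GB footprint: {pct:.0f}% fits")
```

With one replica the whole boot footprint fits; with ten replicas each one gets only 0,2 GB, which covers a quarter of the footprint at best.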
The WRITE side of things
This CBRC feature is very cool. But remember, it is a READ cache. Will it help during bootstorms? Definitely. Will it help in login storms? Probably. Will it help in steady-state workload? Hardly. As we all know by now, VDI is a very write-intensive workload (see Sizing VDI: Steady-state workload or Monday Morning Login Storm?). A very cool tool to look at the R/W ratios and much more is vscsiStats (see vscsiStats into the third dimension: Surface charts!)
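Why "definitely" for boot storms and "hardly" for steady state? A quick Python sketch shows it. All numbers in it (10,000 IOPS, the read percentages, the 90% hit rate) are illustrative assumptions of mine, not measurements, and the function name is made up.

```python
def backend_iops(total_iops, read_fraction, read_hit_rate):
    """IOPS that still reach the array behind a read-only cache.

    Writes always pass through; only read misses reach the array.
    """
    reads = total_iops * read_fraction
    writes = total_iops - reads
    return writes + reads * (1.0 - read_hit_rate)


# All numbers below are illustrative assumptions, not measurements.
phases = {
    "boot storm (90% reads)": 0.9,
    "steady state (20% reads)": 0.2,
}
for name, read_fraction in phases.items():
    left = backend_iops(10_000, read_fraction, read_hit_rate=0.9)
    print(f"{name}: {left:.0f} of 10000 IOPS still hit the array")
```

The more write-heavy the workload, the less a read cache can take away, no matter how good its hit rate is.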
CBRC will not help at all with this write caching issue… unfortunately. Different vendors have been taking a different angle on this problem trying to solve the issue of heavy writes pounding on storage arrays:
- Fusion IO promotes their in-server flash cards. They have a read caching mode (which is almost useless in the VDI use case as CBRC already solves the read caching problem to a great extent). They will generally promote their cards to be used as local storage inside the vSphere servers. Yes, that will quickly absorb the writes, but it introduces all kinds of other pains. And my take on this? For the VDI use case: If you are using the cards as local disk anyway, then why not just put some SSDs in the servers behind a RAID controller? That would make more sense to me, as performance is good enough in general, and it is way cheaper too;
- NetApp positions FlashCache for the reads, which becomes less of a requirement when CBRC is used. For writes, the FlashCache cards do not help, so they rely on their WAFL file system to cope with the writes. Cool technology (I know the gory details of ZFS for VDI workloads, and to my understanding WAFL is very much like this). On the downside: You can “sequentialize” all you want, but at some point in time you still need to hit the spinning disks…;
- EMC positions FAST-cache for this. The power of FAST-cache is that it is a cache that looks a bit like CBRC, in the sense that FAST-cache, like CBRC, does not have the urge to flush stuff out. The cool thing about FAST-cache is that it also absorbs WRITES. So the heavy write workloads that VDI delivers do not have to go to spinning disks AT ALL in the EMC FAST-cache case;
- All-flash arrays like Whiptail. These array vendors have kicked out disk altogether, and build flash-only arrays. This may not be the solution for today (as flash is very cheap per IO but very expensive per GB), but as prices of SSDs keep coming down these all-flash arrays become more and more of a realistic solution. The fact that EMC has recently acquired XtremIO is a clear indication of all-flash arrays becoming viable solutions.
The future of storage in VMware
VMware is taking the first serious steps in virtualizing storage. Things like CBRC/Host caching, HBR (Host based replication) and the VMware VSA (Virtual Storage Appliance) are examples of this. So is there more to come? According to VMware, definitely! To name a few:
- vVols. How cool is that?? The word is finally out: We will be no longer provisioning LUNs or exports/shares… Each VM will get its own share of storage using vVols. You could see it as a virtual disk delivered directly from the storage array;
- vSAN. Just announced during VMworld 2012, vSAN will deliver a distributed storage tier very much like a distributed vSwitch: Local disks will take part in this distributed pool of storage, which is then delivered as shared storage! It looks something like the VSA VMware has right now, but this will be included right in the vSphere core and will hopefully not be dependent on vCenter;
- vFlash. Not the official name (it was shown as “Flash infrastructure”). Here local flash (either cards like Fusion IO / EMC’s VFcache or local SSDs) is delivered directly to VMs as a read/write cache! Yes, you heard right, WRITE cache as well. Not sure how this will work out; maybe it will be separate from the vSAN technology, maybe it will be included there.
For a more detailed overview of these new features, look at VMworld 2012 Storage Nerdvana: vVols, vSAN and vFlash.
Interesting times to come!!