For a long time I have been a fan of PHDvirtual (formerly esXpress) and their way of backing up virtual environments. Their lack of ESXi support has driven a lot of people towards other vendors, and the one that is really on technology’s edge nowadays is Veeam’s Backup and Replication. Now that PHDvirtual has released their version 5.1 with ESXi support, it is high time for a shootout between the two.
Some history on drawing virtual backups
In the old ESX 3.0 and ESX 3.5 days, there was hardly any integration with 3rd party backup products. Veeam used their FastSCP tool to draw the backups from a snapshotted virtual disk. But PHD had a totally different way of drawing these backups. They created and configured tiny Linux based appliances (called VBA’s – Virtual Backup Appliances). “Director” software in the Service Console would configure those VBA’s, then mount the base disk of a snapshotted VM to that VBA, and power on the VBA. The VBA had only a single goal in its short life: read the base disk and send it to a VMFS or network target. Absolutely brilliant. Without a doubt the fastest thing around: While Veeam was struggling to get a single stream of data out, esXpress powered up multiple VBA’s which could read and write VMFS or network targets at a normal VM’s speed. EsXpress was technically advanced, almost impossible to break (everything was checked and rechecked in the software), and after a somewhat extensive setup almost fully autonomous. The faster hardware you had, the faster esXpress would go. It scaled both up and out. Multiple VBA’s, multiple backup targets, failover to other backup targets, you name it.
But then things started to change. As VMware came to the conclusion that VCB really wasn’t such a great tool for backups, they started out on their successor of VCB. This solution really took off with the introduction of their own backup product: VDR (VMware Data Recovery). I do not have the inside information, but I would not be surprised if VMware has attempted to buy PHDvirtual just before those times. This is because VMware’s VDR is almost a ripoff of esXpress when you look at the way it works. But with the introduction of VDR, VMware also introduced a very good set of usable APIs to get backups really going: the vStorage API.
For backup, vSphere and the vStorage API brought two great features. The first was “Hotadd”. Hotadd allows adding and removing snapshotted base disks to and from a running VM. This makes it possible to have a single VM that performs backups of one or more virtual disks which can be added and removed as backup progresses. The solution esXpress had thought up years ago (I like to call that “Coldadd”) was surpassed in a flash; no more need to have this time consuming stopping, reconfiguring and restarting VBA’s.
The second great addition from vSphere was Changed Block Tracking or CBT. Technically, it was already possible to create “incremental” backups by reading all blocks of a VM and then deciding by checksumming whether a block had changed since the previous backup. But with CBT all of that reading became redundant.
Imagine that a backup product creates a snapshot and backs up a VM. The next time round, the backup software creates a snapshot again. But instead of just performing another backup, the backup software “asks” VMware via CBT for a list of all blocks that were changed between these two snapshots. No more reading of the full VM to determine which blocks were changed: the backup software only reads the blocks reported as changed! This had a tremendous impact on backup speeds; a 2TB fileserver could all of a sudden be backed up in under a minute as long as the change rate of the data on that server was low (and it usually is). Backup windows literally reduced from hours to minutes.
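To make the difference a bit more tangible, here is a small toy model in Python. It is only a sketch and has nothing to do with the real vStorage API: the disk is a plain list, and the `changed` set stands in for the list of blocks that CBT would report. All names are made up for illustration.

```python
# Toy model of a CBT-style incremental backup (not the real vStorage API).
# A "disk" is a list of blocks; `changed` stands in for the block numbers
# that CBT would report as changed since the previous snapshot.

def full_backup(disk):
    """Read every block of the disk; returns (backup copy, blocks read)."""
    return list(disk), len(disk)

def incremental_backup(disk, previous_backup, changed):
    """Read only the blocks reported as changed; returns (backup, blocks read)."""
    backup = list(previous_backup)
    for block_no in sorted(changed):
        backup[block_no] = disk[block_no]
    return backup, len(changed)

disk = [f"data-{i}" for i in range(1000)]
backup, read_full = full_backup(disk)

# Between the two snapshots only three blocks change.
changed = {7, 42, 900}
for block_no in changed:
    disk[block_no] = f"new-{block_no}"

backup, read_incr = incremental_backup(disk, backup, changed)
print(read_full, read_incr)
```

The second backup is complete again, but only the three changed blocks had to be read instead of all one thousand.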
Since Veeam had always worked from a VM that performs the backups, they shot ahead making use of all these new features. On top of this, ESXi made its entry into the market. Since esXpress relied on the Service Console on ESX for the “director” software for making backups, they had no way to support ESXi using their way of working. Action was required: It seemed that they were rapidly losing ground to competitors because their unique and smart way of working was now slowing them down. And most important: With ESXi being pushed, support HAD to be included.
Especially this ESXi support proved very difficult for PHDvirtual. In the end, they were forced to do an almost complete rewrite of their software. With the release of PHDvirtual Backup 5.1, that rewrite was a fact. A very fresh product, but under the pressure of time some features were left out. But will it still stand its ground against Veeam5, which has been playing on this type of playground for much longer? We’ll find out in the next sections.
PHDvirtual had a really unique way of making backups. However, it proved to be impossible to get it working on ESXi because of the lack of the service console. Now that VMware is pushing more and more towards ESXi, PHDvirtual had no choice but to rewrite their software to get it supporting ESXi.
Because of this total rewrite, a lot of features have not yet been implemented into their new 5.1 release. This is the reason I will also include version 4.0-5 in this review. If you use ESX and not ESXi, version 4.0-5 is still a very nice product to use.
The biggest competitor I think is Veeam. I have been testing their software from version 1.0, and frankly it used to be no match when compared to PHDvirtual (called esXpress back then). But lately, especially with the introduction of new abilities from within vSphere, Veeam’s software has taken a huge leap forward. Right now I think Veeam has become a real threat to PHDvirtual, not only on the marketing side of things, but also very much on technical grounds.
In order to keep this review a little readable, I’ll shorten the names of the products to PHD4, PHD5 and Veeam5.
What is tested
Since these products are backup products, I’ll focus my review on these items:
- Introducing the contestants;
- Performing backup;
- Verification of backups;
- Full restores;
- File Level Restores (FLR);
- Disaster Recovery (DR) abilities;
Item 1) is covered in this part of my blogpost. Items 2) through 5) will be covered in part two, and items 6) and 7) will be covered in part three.
In the old days focus was always on backup and backup alone (like VCB). At some point people figured out that being able to actually restore data was even more important (one of those “duuh!” moments if you ask me). As most vendors optimized their restore abilities and speeds, something new got pushed: Verification of backups. It is nice to have backups, it’s better to have verified ones. Both PHDvirtual and Veeam have been working hard on features like these, and it is interesting to see how both take a completely different view on things (SureBackup (Veeam) versus SureRestore (PHDvirtual)).
File Level Restore (FLR) is also something that is considered a “must have” nowadays. Most of the older virtual backup software could perform full image level restores only, because the backup type was block-level. The backup products I look at today all have file level restore capabilities, although their backup mechanisms are still all at a block level. This implies that the restore software has to have knowledge on the file systems used within these block level backups (and they do!).
Finally in part three of this series of blogposts, I’ll look at disaster recovery and replication. How far does the software go in case you have to restore not one or two, but all of your VMs?
I will limit my testing to what Veeam calls the “virtual appliance” backup mode. This is because it is the fastest way to create backups without the need of dedicated hardware (like a SAN connected physical proxy server). All solutions tested in this review support this mode. As target storage I will look at deduped storage alone; all solutions support this (in different implementations) and I think it is the most popular way of storing your backups.
How backups are drawn from the virtual environment
All reviewed backup products (PHD4, PHD5 and Veeam5) are leveraging VMware-level snapshots for creating backups. Once a snapshot has been taken, the base disks will only be read from by vSphere, which means the backup software now has the ability to backup these base disks since they do not change anymore. From that point on the base disks are either ColdAdded (PHD4) or HotAdded (PHD5, Veeam5) to a VM performing the backup. When the backup is done, the base disks are detached from the backup VM, and the snapshot is removed from the VM that has been backed up. After that, it is simply repeat, repeat, repeat for all other VMs that have to be backed up.
One important thing to notice is that PHD uses only the VMware Tools to quiesce disk access during snapshotting. Their point of view is that modern file systems can handle this “crash consistency” very well. I must say that in the many failures I have simulated on NTFS, I hardly ever ended up with a VM that failed to boot successfully from a crash consistent state. One thing to watch out for though is larger database VMs. The crash consistent state a restored VM would be in could result in the database checking its consistency. Especially very large databases could pose a problem here; during consistency checks the database remains offline, which can mean “hidden downtime”.
Veeam5 has a very simple but effective way of using VSS for quiescing. Not just the disk access, but also the applications inside a VM themselves. This means a restored VM would always be consistent (and not just crash-consistent), which should result in this “hidden downtime” being 0.
Now that we have seen how the backup software gets its hands on the data inside a VM, it is time to look at where the data is going. All reviewed software can use a dedup target for storing backup data. I won’t be going too deep into the idea of dedup; there is plenty to find on the web on what the basic idea is. The deduplication implementations of PHD4 and PHD5 are very much alike, while Veeam5’s implementation is significantly different:
The PHD4 dedup target
PHD4 uses VBA technology for creating backups. A separate virtual appliance (called the PHDD – PHD Dedup appliance) is basically just the dedup engine. In the most effective way of working, the PHDD appliance itself is only the dedup engine and not the storage target, and a regular NFS box acts as the storage target.
All VBAs (= the VMs performing the actual backups) read blocks from the source virtual disks. PHD4 also makes use of CBT, so this is pretty effective. Next, each VBA calculates a checksum of every changed block, and sends these checksums (and the checksums alone!) to the PHDD appliance. This appliance determines if that particular block is already available on the NFS target using a small database full of checksums. If so, a link to that existing block is created. If not, the PHDD appliance asks the VBA to dump the block directly onto the NFS store. Technically this is a very scalable way of working: The VBA’s perform the checksum calculations (there are many of them, and they generally run on big ESX servers with plenty of processing power). The dedup target only performs lookups for blocks that might or might not exist on the NFS target, but does not handle the data stream itself. The NFS target gets the already deduped data streams directly from multiple VBA’s. So yes, teaming NICs to a big NFS store is really easy (multiple source IP/MAC addresses). As an added bonus, the bandwidth used from VBAs to target storage can be relatively small thanks to the “source side dedup”.
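For those who like to see this division of labour spelled out, here is a minimal Python sketch of it. Mind that this is my own toy model and not PHD's code: the `DedupIndex` class, the `backup_vm` function and the choice of SHA-1 as checksum are all assumptions, and a plain dictionary plays the role of the NFS store.

```python
import hashlib

class DedupIndex:
    """Stands in for the PHDD appliance: it only sees checksums, never data."""

    def __init__(self):
        self.known = set()

    def needs_upload(self, checksum):
        """True if no block with this checksum is in the store yet."""
        if checksum in self.known:
            return False
        self.known.add(checksum)
        return True

def backup_vm(blocks, index, store):
    """Stands in for a VBA: hash locally, ask the index, write new blocks only."""
    recipe = []   # ordered checksums that together make up this backup
    sent = 0      # number of blocks that actually travelled to the store
    for block in blocks:
        checksum = hashlib.sha1(block).hexdigest()  # SHA-1 is an assumption
        if index.needs_upload(checksum):
            store[checksum] = block   # written directly to the storage target
            sent += 1
        recipe.append(checksum)
    return recipe, sent

index, store = DedupIndex(), {}
vm1 = [b"os-block-1", b"os-block-2", b"app-data-A"]
vm2 = [b"os-block-1", b"os-block-2", b"app-data-B"]

recipe1, sent1 = backup_vm(vm1, index, store)
recipe2, sent2 = backup_vm(vm2, index, store)
print(sent1, sent2, len(store))
```

The second VM shares its OS blocks with the first one, so only its one unique block is sent. The index never touches block data, which is why it does not become the bottleneck.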
In the background the PHDD appliance constantly checks the blocks on the NFS store. The VBAs write each block in a folder structure, together with a small file containing the checksum of that block. By reading the block and comparing the checksum of that block with the stored checksum, the PHDD target constantly scans the dedup store for any faulty blocks. It can even detect missing ones.
When the PHDD target detects a faulty or missing block of data, it renders all backups that contain this block as “failed” (this could potentially be ALL backups!). But when performing a new backup, and a block that previously failed or was missing is detected within a VM again, this block is rewritten, and the previously failed backups become “ok” again. This is what PHDD calls “SureRestore”. Backups are verified constantly and fixed if possible, making sure you can actually restore them later on.
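The verify-and-heal behaviour described above can be sketched like this. Again a toy model under my own assumptions (SHA-1 as checksum, dictionaries instead of files, made-up function names), not the actual PHDD implementation:

```python
import hashlib

def checksum(data):
    return hashlib.sha1(data).hexdigest()  # checksum algorithm is an assumption

def scrub(store, backups):
    """Return the names of backups that reference a faulty or missing block.

    store:   {checksum: block data} as found on disk
    backups: {backup name: list of checksums the backup consists of}
    """
    bad = {c for c, data in store.items() if checksum(data) != c}
    failed = set()
    for name, recipe in backups.items():
        if any(c in bad or c not in store for c in recipe):
            failed.add(name)
    return failed

a, b, c = b"block-a", b"block-b", b"block-c"
store = {checksum(x): x for x in (a, b, c)}
backups = {
    "monday": [checksum(a), checksum(b)],
    "tuesday": [checksum(a), checksum(c)],
}

store[checksum(b)] = b"bit rot"   # one block gets corrupted on disk
failed_before = scrub(store, backups)

store[checksum(b)] = b            # a later backup sees the block again and rewrites it
failed_after = scrub(store, backups)
print(failed_before, failed_after)
```

Only the backup that uses the damaged block is flagged, and it turns healthy again as soon as the block has been rewritten.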
Deleting old backups from the dedup store is also very easy: find the blocks that are not used by any relevant backup, and simply delete the files that make up that block.
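In code, that cleanup is little more than a set difference. A minimal sketch with made-up names and a dictionary standing in for the block files on disk:

```python
def collect_garbage(store, backups, keep):
    """Delete blocks that no backup we want to keep refers to.

    store:   {checksum: block data}
    backups: {backup name: list of checksums}
    keep:    names of the backups that must stay restorable
    """
    in_use = {c for name in keep for c in backups[name]}
    unused = [c for c in store if c not in in_use]
    for c in unused:
        del store[c]
    return len(unused)

store = {"aa": b"1", "bb": b"2", "cc": b"3", "dd": b"4"}
backups = {"old": ["aa", "bb"], "new": ["bb", "cc", "dd"]}

removed = collect_garbage(store, backups, keep={"new"})
print(removed, sorted(store))
```

Block "bb" survives because the backup we keep still uses it; only "aa" is removed.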
PHD4 has this vision of having one big dedup store. This is the most effective way to store your data when you look at the amount of backup space required (because you dedup across ALL data). But it also holds a risk: One big gigantic dedup store could mean that a single missing block in that store not only renders some backups useless, but renders ALL backups useless. The “SureRestore” option they provide should guard against that.
The PHD5 dedup target
In PHD5 there no longer is a separate dedup appliance like in PHD4. All functionality is now included in one big appliance, much like the Veeam5 approach. The way data is stored on a storage box is very similar to PHD4 as far as I’ve seen, but they are not compatible, so upgrading from PHD4 to PHD5 can be a problem, because you’ll need two dedup stores next to each other if you want to be able to restore any PHD4 backups.
The data streams (max. 4 in parallel) all run through this one appliance directly to a single dedup store behind it (VMware based store or networked storage target). This means that if you implement multiple backup appliances (for example if you have multiple VMware clusters), you will get one dedup store per appliance and no longer one big single dedup store like PHD4’s dedup approach.
The Veeam5 dedup target
Veeam5 takes a different road to storing its dedup data. It has two basic settings for storing deduped data. The first (and now default) is incremental with regular fulls. Basically it creates a full backup in one large file, then it adds another file holding incremental data every incremental backup. This option is good when you still rely on tape or you rsync the data offsite. You could even put it on an external USB drive easily!
The second mode in which you can store dedup data looks more like the way PHD4 and PHD5 store their data. This is the reversed incremental mode. In this mode the software stores all blocks of data in one big file, and it creates smaller control files for each backup performed (it uses these control files in order to be able to step back to previous states).
When using reversed incrementals, this one big file containing blocks would grow and grow forever. You cannot easily delete parts of it like PHD can (because they store each block in a separate file). But Veeam5 found a simple solution to this: It can mark blocks within this single big file as “overwriteable”, and new backups made will simply overwrite parts of this one big file. This means that space that became available when old backups were deleted will be reclaimed instead of added to the file. This will effectively keep the file from growing into eternity. Really different from PHD, but also really smart and just about as effective.
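I do not know what Veeam's file format looks like on the inside, so take the following purely as an illustration of the principle: a file made of slots, where slots freed by deleted backups are reused before the file is allowed to grow. The `BlockFile` class and its methods are hypothetical.

```python
class BlockFile:
    """Toy model of one large backup file made of fixed-size slots."""

    def __init__(self):
        self.slots = []   # slot contents, in file order
        self.free = []    # slot numbers marked as overwriteable

    def write(self, data):
        """Store a block, reusing a freed slot before growing the file."""
        if self.free:
            slot = self.free.pop()
            self.slots[slot] = data
        else:
            slot = len(self.slots)
            self.slots.append(data)
        return slot

    def release(self, slot):
        """Mark a slot as overwriteable; the file itself does not shrink."""
        self.free.append(slot)

f = BlockFile()
old = [f.write(f"old-{i}") for i in range(3)]
size_before = len(f.slots)

# An old backup expires: two of its blocks are no longer needed.
for slot in old[:2]:
    f.release(slot)

# A new backup brings three new blocks.
new = [f.write(f"new-{i}") for i in range(3)]
size_after = len(f.slots)
print(size_before, size_after)
```

Three new blocks make the file grow by just one slot, because the two released slots are overwritten first.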
There is one thing that kept bugging me on the Veeam5 dedup way of working: Each job you schedule in Veeam gets its own dedup store. For example, if all backups run once a day but you want one or some VM(s) backed up more than once every day, you need a separate job for it. So it gets its own store. You want two backup streams in parallel? You need two jobs then, creating not one but two dedup stores. And so on. It is very easy to end up with more than just a few jobs.
Having these multiple dedup stores lowers the effectiveness of dedup, because any given block is likely to be stored in multiple dedup stores. In the end, the more backup jobs you have, the more disk space you’ll be wasting.
But is this space really wasted? When you do have multiple dedup stores, you also spread risks in case a dedup store might get corrupted. When looking at the cost of disk space, this might not be such a big problem after all.
Also important to note: Most of the time, the biggest gain in dedup is not across VMs, but in storing multiple backups of the same VM in a single dedup store (which still applies when you have multiple dedup stores).
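To get a feeling for the space penalty of one dedup store per job, you can do the arithmetic with a few sets. The job names and block counts below are invented for the example, they are not measurements:

```python
def stored_blocks_per_job(jobs):
    """Blocks kept on disk when every job has its own dedup store."""
    return sum(len(set(blocks)) for blocks in jobs.values())

def stored_blocks_global(jobs):
    """Blocks kept on disk when all jobs share one dedup store."""
    return len(set().union(*jobs.values()))

os_blocks = {f"os-{i}" for i in range(100)}   # blocks both VMs have in common
jobs = {
    "daily": os_blocks | {f"file-{i}" for i in range(50)},
    "hourly": os_blocks | {f"db-{i}" for i in range(20)},
}

per_job = stored_blocks_per_job(jobs)
shared = stored_blocks_global(jobs)
print(per_job, shared)
```

In this made-up case the shared OS blocks are stored twice, so the two separate stores hold 270 blocks where a single store would hold 170.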
To tape or not to tape – That is the question
This is a title which immediately triggered Sir Henry (the guy on top of this blog with the VCP4 cap on). A long time ago he was a real star performing in Hamlet so he got all wound up now 🙂
Once you have these backups made to disk, then what? More and more often, you see people just “rsyncing” their backup files off-site. The oldskool way is of course to put your data to tape. So how do the reviewed backup solutions enable you to perform such actions? Veeam and PHD both have their own pros and cons to this, so I’ll just describe them and you figure out what you like best 🙂
PHDD4/5 and rsync
Using an rsync solution to sync your backups offsite is very easy with PHD. Because the backup target has separate files for each block, rsync can easily find these blocks and transfer them off site. One tip though for PHD4: I would find a way to replicate the PHDD dedup appliance offsite as well. I have seen that importing very large dedup sets to a newly created PHDD dedup target can take a very long time. Replicating the appliance avoids having to import the entire store again in a disaster recovery situation.
Also do not forget that the PHD4/5 solutions create folders in which they store their blocks. A lot of folders. Typically, folders “00” through “ff” are created (that is hexadecimal for 0 through 255). Each of these folders contains another set of up to 256 of these folders, with the blocks stored under those. PHD4 pre-creates all these folders (which takes a while when setting up), while PHD5 seems to create them as needed. I believe that the first two bytes of each checksum determine in which folder the block will land. Anyway, rsync might have a hard job working its way through potentially 256×256 = 65536 folders. So make sure you have enough resources to perform the rsync.
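If my guess about the first two checksum bytes is right, the mapping from block to folder would look something like the sketch below. The checksum algorithm (SHA-1 here), the file naming and the `block_path` helper are all assumptions on my part; I have not seen PHD's code.

```python
import hashlib
from pathlib import PurePosixPath

def block_path(root, data):
    """Map a block to <root>/<byte 1>/<byte 2>/<full checksum>."""
    digest = hashlib.sha1(data).hexdigest()   # checksum algorithm is an assumption
    # Each byte is two hex digits: the first picks the top folder ("00".."ff"),
    # the second picks the subfolder inside it.
    return PurePosixPath(root) / digest[0:2] / digest[2:4] / digest

path = block_path("/dedup", b"example block")
print(path)
print(256 * 256)   # maximum number of leaf folders
```

A layout like this keeps any single folder from filling up with millions of files, at the price of a very wide tree for tools like rsync to walk.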
Veeam5 and rsync
When you plan to use rsync to get your Veeam5 data off-site, you should preferably use the incremental backup mode. In this mode, a full or incremental file is added daily, that can simply be rsynced.
UPDATE: I was afraid that a disaster striking during rsync could destroy the target data file as well. But using rsync on a reversed incremental file would actually work. Also see How Rsync Works; a practical overview. After looking through this document, I found that rsync copies the file at the receiver’s end into a temp file, mixes in the changes from the sender, and finally compares checksums between source and destination files. Only on a match is the original file at the destination deleted and the temp file renamed to be the new destination file. The only drawback is that using rsync you’d need twice the space for storing the rsynced file: once for the original copy, once for the temp file being built during an rsync.
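The “build a temp file, verify, then rename” pattern is easy to try out yourself. The Python sketch below is a simplification and not rsync's actual code: it writes the complete new content instead of mixing in deltas, and `safe_replace` is a name I made up. It does show why the old file survives a failed transfer.

```python
import hashlib
import os
import tempfile

def sha256(data):
    return hashlib.sha256(data).hexdigest()

def safe_replace(dest_path, new_data, expected_checksum):
    """Build the new file next to the old one; swap only if it verifies."""
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(new_data)
        with open(tmp_path, "rb") as tmp:
            if sha256(tmp.read()) != expected_checksum:
                raise ValueError("checksum mismatch, keeping the old file")
        os.replace(tmp_path, dest_path)   # rename over the old file
    except Exception:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)           # clean up the temp file on failure
        raise

with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "backup.bin")
    with open(target, "wb") as f:
        f.write(b"old backup")

    # A transfer that arrives damaged must not touch the existing file.
    try:
        safe_replace(target, b"damaged transfer", sha256(b"new backup"))
    except ValueError:
        pass
    with open(target, "rb") as f:
        after_failure = f.read()

    # A transfer that verifies replaces the file in one step.
    safe_replace(target, b"new backup", sha256(b"new backup"))
    with open(target, "rb") as f:
        after_success = f.read()

print(after_failure, after_success)
```

While the temp file is being built, both copies exist side by side, which is exactly where the “twice the space” requirement comes from.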
Using the incremental mode does not have these possible issues, because if you might not receive the latest incremental file you’d always have the previous ones untouched at the DR site, so in this mode you could use any kind of copying algorithm.
PHD4/5 and tape-out
Using tape to store your backups is a bit more difficult with PHD. You could backup to tape in two ways:
- Backup the entire dedup store to tape;
- Backup rehydrated versions of the VMs.
The first option is the most obvious: You backup the entire store to tape. You might even be able to perform incremental backups on that store, but that might consume a lot of time because there are so many small files involved (approx. 2 files for every unique MB of data!). Tip again for PHD4: I would backup the PHDD appliance to the same tape to avoid having to import the dedup store back into a fresh PHDD appliance.
The second option is a nice one if you have the time and space on tape: The PHDD appliance delivers a CIFS share, which contains an entire set of folders like “today”, “yesterday”, “by name” etc. Each folder contains a “rehydrated” set of VM backups (rehydrated basically means de-deduplicated). So yes, you actually see your vmx and vmdk files sitting inside the CIFS share! You can use any tape solution that can backup CIFS shares to get them out to tape. A pro is that your tapes contain raw VMDK files without any scary dedup applied to them. A con is the required space on tape (and time required to get it there).
Veeam5 and tape-out
Tape out on the Veeam 5 solution is just like the rsync solution: Use incremental backup mode and you’ll be fine. You can easily backup in incremental mode to tape as well; each day an incremental tape backup will backup all “new” files, so it will simply put the latest backup to tape. When you use reversed-incremental backup, you’ll end up with a full backup of all data each day (which you might not get done if your environment has a big storage size).
A few years ago I did not even need to create a post like this one. PHDvirtual (esXpress as it was called then) would win hands down. But nowadays things have changed. A lot of competition is around.
A really big difference between the Veeam and PHDvirtual solutions lies in the way dedup targets are implemented. Veeam5 creates multiple dedup stores if you have multiple jobs. This limits the effectiveness of dedup. On the other hand, it spreads your risks and possibly eases making backups offsite or to tape. You could use a storage device that also features dedup itself like Data Domain (it should dedup between the various Veeam5 dedup stores). Or simply get yourself some more disks of course 🙂
The PHD solution hangs on to a single dedup store (or in PHD5 maybe a few), no matter how many jobs you create within. For disk space usage effectiveness this is hard to beat: PHD4 for example can put data from any VBA running in any environment to a single dedup engine, globally deduping (even across datacenters if needed!). Lurking in the dark of course is our infamous friend Single Point Of Failure (SPOF). One big dedup store makes for a bigger risk when it gets corrupted.
Each solution I looked at today has its pros and cons. As far as I am concerned, there is no real winner (not yet anyway). Solutions from both vendors have stood their ground so far.
In the next parts I’ll be looking at performing the actual backups, and even more important, restores. In part three I’ll describe the abilities these solutions have for creating DR environments:
Veeam Backup vs PHDvirtual Backup part 2- Performing backup and restores
Veeam Backup vs PHDvirtual Backup part 3- Handling disaster recovery
A part four might be added on extra features any product might have that is worthwhile mentioning (not sure there will be any left though).