In part 1 of this series, I looked at two solutions for making virtual backups: Veeam and PHDvirtual. In this part, I’ll be looking at installing, making backups, verifying backups and of course restoring items.
Installation and initial configuration
The installation of PHDvirtual differs quite a bit between versions 4 and 5, so I’ll be looking at them individually. Finally I’ll look at the installation of Veeam5.
Installation and configuration of PHD4
This version of PHDvirtual’s backup solution is more dated, and looks more like the older esXpress variants. PHDvirtual used to get a lot of complaints about the complexity of installing their software: Linux knowledge was said to be required. I never quite understood those points of view: You hire a consultant to install and configure, and after that the solution runs almost completely autonomously. I have to make a living too!
The latest versions of PHD4 (like version 4.0-5 which we look at here) in fact have a very straightforward Windows installer. The installer uploads an appliance to your vSphere environment. The funny thing is that this appliance can become either the esXpressGUI or the PHDD dedup appliance. After uploading the single appliance you simply decide what the appliance should become. Some network configuration is required, but soon you should be able to log in to the websites delivered by the esXpressGUI and PHDD and configure the solution completely from there.
The esXpressGUI basically takes care of two things:
- Pushing the code running in the Service Console to the ESX nodes (remember that PHD4 has no ESXi support);
- Holding the configuration for the ESX nodes and uploading them to the nodes.
Initial configuration is a lot of work. But that is the price you pay for a solution where virtually everything is configurable. After this initial configuration you can walk away, and customers really have a very automated system running which has proven to be virtually unbreakable.
Installation and configuration of PHD5
With the introduction of PHD5, setup has become even simpler: You import the appliance into VMware, perform basic networking configuration (it can even use the default DHCP if that is enabled in your network). Next you install a little application on the machine where you run your VI Client from, and all is complete. The menu in the VI client is extended by an option which allows you to perform all actions required. Nice!
PHD5’s extension to the VI client menu
The best part… It is an appliance. No single license required. It is the fastest and least troublesome installation of all solutions reviewed.
Installation and configuration of Veeam5
The installation of Veeam5 is very straightforward. That is, once you have Windows up and running. So yes, using Veeam will cost you an additional Windows license for every backup engine you install.
In my tests I used Windows 2003-R2 x64, because I like the smaller footprint of Windows 2003 when compared to Windows 2008.
Just run the setup file, start the program, add your vCenter, click through a wizard to configure your first backup job, and basically that is all that is required to get a backup running. Veeam has several additional installs you can do, like the Enterprise Manager which gives you a web-based global view over all Veeam5 installations, and some item level restore plugins. Not too troublesome, basic next, next, finish installs.
I understand Veeam’s choice for the Windows platform. I think it is all about getting your hands on people with programming skills on the platform, quick time to market etc. But it also introduces more points where things can go wrong. I haven’t found any showstopping problems, but there are some extra things you need to do which you don’t have when using an appliance (patching for instance). “Luckily” every single solution reviewed here had some issue at some point.
PHD4 uses a different approach in scheduling than PHD5 and Veeam5. The last two have a simple wizard like setup from which you can create backup jobs. PHD4 has its scheduling inside the configuration parameters which you can set using the esXpressGUI tool. After that, you could use a string in the VM’s name to indicate it should be included in or excluded from the backup. Furthermore, you can force backups and other things by simply renaming a VM to have an additional string behind its name like “[xPHDD]” which will force a “backup now to PHDD target” on that particular VM. PHD4 takes care of renaming the VM back to its original name for you. Good stuff.
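To make the rename trick a bit more tangible, here is a minimal Python sketch of how a tag at the end of a VM name could be picked up and stripped off again. Only the “[xPHDD]” tag comes from the text above; the regular expression, the function names and the action labels are my own assumptions for illustration, not how PHD4 actually implements it.

```python
import re
from typing import Optional, Tuple

# Assumed tag format: one bracketed token at the very end of the VM name.
TAG_PATTERN = re.compile(r"\s*\[(?P<tag>[^\[\]]+)\]\s*$")


def split_vm_name(name: str) -> Tuple[str, Optional[str]]:
    """Return (name_without_tag, tag); tag is None when no trailing tag exists."""
    match = TAG_PATTERN.search(name)
    if match is None:
        return name, None
    return name[: match.start()], match.group("tag")


def plan_action(name: str) -> dict:
    """Decide what to do with a VM based on the (hypothetical) tag in its name."""
    original, tag = split_vm_name(name)
    if tag == "xPHDD":
        # Force a backup to the PHDD target, then rename the VM back.
        return {"vm": original, "action": "backup_now_to_phdd", "rename_to": original}
    # Unknown or missing tag: leave the VM alone and follow the normal schedule.
    return {"vm": name, "action": "follow_schedule", "rename_to": None}
```

Calling `plan_action("fileserver01 [xPHDD]")` yields a forced backup for `fileserver01` and the name to rename it back to, while an untagged VM simply follows its schedule.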
PHD5 and Veeam5 use backup jobs for automating backups, just like “regular” backup software. In Veeam5, you can run any job by right-clicking it and selecting “start”. In PHD5, you can use the plugin to right-click on a VM in the VI client and select “backup” from the plugins entry in the list. See one of my previous blogposts for some screenshots on that: PHD Virtual Backup 5.1er – First Impressions
Without going into too much detail: You can schedule backups in all solutions pretty much how 99.9% of people tend to schedule. I found no real issues in any of the solutions.
What I have seen going wrong: #Fail
Up next the juicy details: What has gone wrong in testing situations. Some not so pretty, others merely cosmetic. Basically I’ll just spew things I came across. It is important to note that all issues could be fixed; I list them just to give everyone an idea of what can go wrong. And maybe you’ll find something useful in it to help you fix your problems. And no I do not want to open a support ticket!
First in general. PHD4 is super robust. It just does not break that easily. Not even in my environment where things tend to break down all the time… I have had big problems with PHD4 not being able to get CBT information from vSphere, resulting in a slow and long lasting non-CBT backup every once in a while. That is really frustrating! After I changed its configuration to run only a single VBA on each ESX node, the problem went away. It turned out to be a timeout in retrieving the CBT list from vSphere. Just a case of having too weak CPUs. Other than that, no real issues to be found in version 4.0-5.
PHD5 is a lot less stable in my opinion. Because PHD4 checks, rechecks and checks again, it is somewhat slower between backups, but far more reliable. I already managed to get PHD5 to freeze up completely: By mistake I added an independent disk into the backup. As you may (or should) know, independent disks are not included in a VMware snapshot. Where PHD4 would report an error and go about its business, PHD5 simply freezes forever. Manual recovery was required: Shut down the PHDvirtual appliance, unmount the remaining base disks from it, remove the snapshot from the VM that was backed up, restart the appliance.
Veeam5 had its share of issues too. My Veeam server was added to my domain. When logging in, I get a pile of network drives attached. Together with some test shares I made to CIFS targets, there were no drive letters left. As Veeam starts its backup, it hotadds the base disks from VMs to be backed up and gives them a drive letter. Or at least tried to ;). Result: Veeam freezes the job. Cancel job does not work. Removing a network share results in the drive letter being reused immediately by a hotadded disk. The job remains frozen though. Manual recovery was required: Shut down the Veeam VM, remove the base disks from it that were still hotadded, remove the snapshot from the VM that was being backed up, restart the Veeam VM. Not pretty.
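A problem like this is easy to check for up front. Below is a small Python sketch of such a pre-flight check; it assumes (as described above) that every hotadded disk needs one drive letter. The function names are made up, and obtaining the list of letters currently in use is left to the caller, since that part is platform specific.

```python
import string


def free_drive_letters(in_use):
    """Return the drive letters A-Z that are not in the given collection."""
    used = {letter.upper() for letter in in_use}
    return [letter for letter in string.ascii_uppercase if letter not in used]


def can_hotadd(disks_to_mount, in_use):
    """True when enough free letters remain for the disks about to be hotadded."""
    return len(free_drive_letters(in_use)) >= disks_to_mount
```

With all 26 letters taken by local disks and network shares, `can_hotadd(1, string.ascii_uppercase)` returns `False`, which is the situation I ran into.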
Also Veeam5 spits pages and pages of errors and warnings in the eventlog. It appears to be cosmetic: Errors like “The device ‘VMware Virtual disk SCSI Disk Device(…)’ disappeared from the system without first being prepared for removal.“. On the Veeam forum I read this is “normal behaviour”, but I hate event logs just filling up with errors that are “normal behavior”…
Veeam5 also adds Windows issues in the mix (that is why I don’t like complex full OSes under software like this). Backups ran fine, and all of a sudden I see “failed”. What happened? The retry failed as well, as did the next retry. The error I saw in Veeam was “Client error: Insufficient system resources exist to complete the requested service. Failed to read data from the file [F:Backup Production VMs2011-01-14T000139.vbk]. Server error: End of file“. Hate it when things like this happen. After some searching, I found this is actually a Windows problem! If you’re interested: Windows starts using too much paged kernel memory. A solution was found in this Microsoft KB article: Backup program is unsuccessful when you back up a large system volume. When retrying the backup, CBT was somehow no longer usable, so it started a new backup of ALL non-zero blocks of data. Again. Sigh… A comforting thought is that Veeam5 does a nice job when the blocks already exist on the target (the backup speed topped out at 45MB/s, which is the maximum read speed of my regular storage cabinet). It will be really speedy again in the next run, when CBT will be leveraged to full effect again (and not just to skip null-blocks).
Could I go on? Sure thing. But the examples above merely show that each solution has its quirks. Basically, that is to be expected. These solutions are on the edge of technology, so issues will arise. As long as one is able to resolve the issues, it is something you should be able (or at least try) to live with. I must add that support from both Veeam and PHDvirtual is very good indeed.
I once heard a statement on creating backups which I never forgot: “A backup is to be considered failed until you prove it can be restored successfully“. For years I used the esXpress / PHD4 feature called “mass-restore”. Basically it restores all new backups it finds on the backup target into standby VMs. So after my backups had run, I’d mass restore all my VMs to another VMFS location. Using this setup I had two added bonuses:
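The difference between a CBT-driven run and the fallback run that reads every non-zero block can be sketched in a few lines of Python. This is purely illustrative: the tiny block size, the function names and the way the changed-block list is passed in are my own assumptions, not the vStorage API.

```python
BLOCK_SIZE = 4  # unrealistically small, just to keep the example readable


def split_blocks(disk: bytes):
    """Cut a disk image into fixed-size blocks."""
    return [disk[i:i + BLOCK_SIZE] for i in range(0, len(disk), BLOCK_SIZE)]


def blocks_to_copy(disk: bytes, changed=None):
    """Return the block indexes a backup run has to read.

    changed: set of block indexes reported by change tracking,
             or None when that information is unavailable.
    """
    blocks = split_blocks(disk)
    if changed is not None:
        # Incremental run: only touch what changed since the last backup.
        return sorted(i for i in changed if i < len(blocks))
    # Fallback run: read everything that is not all zeroes.
    return [i for i, block in enumerate(blocks) if any(block)]
```

For a disk with two data blocks and two empty ones, the fallback run reads both data blocks, while a run with a usable change list reads only the block that actually changed.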
- Replicas of the most recent backups are always standing by on your storage, ready to be powered on;
- Each backup you made is verified by a restore.
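The selection step of such a mass restore could look roughly like the Python sketch below. The `Backup` record, the timestamps and the rule of taking only the newest not-yet-restored backup per VM are my assumptions for illustration; the actual esXpress / PHD4 logic may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backup:
    vm: str
    taken_at: int  # e.g. a Unix timestamp


def select_for_mass_restore(backups, last_restored):
    """Pick, per VM, the newest backup that is newer than the last restore.

    last_restored maps a VM name to the timestamp of the backup that was
    most recently restored into its standby VM.
    """
    newest = {}
    for backup in backups:
        if backup.taken_at <= last_restored.get(backup.vm, -1):
            continue  # already restored (and thereby verified)
        current = newest.get(backup.vm)
        if current is None or backup.taken_at > current.taken_at:
            newest[backup.vm] = backup
    return sorted(newest.values(), key=lambda b: b.vm)
```

Every backup returned here would then be restored to the standby location, which gives you both the replica and the proof that the backup is restorable.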
This used to be “high tech”, but with the newer software releases both Veeam and PHDvirtual have been working hard on being able to verify their backups. Both take a very different approach to things.
While PHD4/5 performs constant block checking on the dedup target, ensuring all blocks will be there when you need to restore, Veeam5 takes testing a little further: They can actually revive VMs directly from the dedup store (without performing a restore!), test for some basic functionality within these running VMs, then shut them down again. Tests can be scheduled for single VMs (like a simple single webserver VM) or in groups of VMs (for example an Exchange server with Domain Controllers). Awesome!
So what is more important: Knowing all blocks are actually available (and ok) on the backup target (PHD), or knowing your VMs inside the backup target have been booted and tested successfully (Veeam)? SureRestore versus SureBackup.
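The block-checking side of this can be illustrated with a content-addressed store, where every block is stored under the hash of its own content. The Python sketch below is only that, a sketch: I have no knowledge of the real PHDD on-disk format, and SHA-256 and the dictionary-based store are my own choices.

```python
import hashlib


def block_id(data: bytes) -> str:
    """Identify a block by the SHA-256 hash of its content."""
    return hashlib.sha256(data).hexdigest()


def scrub(store: dict) -> list:
    """Return the ids of all blocks whose content no longer matches their id."""
    return sorted(bid for bid, data in store.items() if block_id(data) != bid)
```

Running `scrub` periodically over the whole store tells you that every block is still present and readable, but it says nothing about whether the VM inside the backup will actually boot. That is exactly the gap the Veeam approach covers.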
Actually, I would like to have both in one single solution.
Performing image level restores
The nice thing about VMware is that every virtual server is basically just a bunch of files. An image level restore is simple: Just put back all the VM files and you have a running VM again. This is the type of restore that has always been possible. There is no complexity involved with the operating system inside the VM: The entire virtual machine is simply restored on a block level.
All tested solutions can perform these restores. However, there are some notes to make for every one of them:
PHD4 image level restores
In PHD4 you can simply log in to the esXpressGUI web interface, select your VM to restore, tell it where it should go, and the entire VM is restored using a VBA “backwards”. This means a VBA is started for the restore job, but instead of reading a disk and writing it out to the backup target it works the other way round: It reads blocks from the backup store, and injects them into an existing or new virtual disk.
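The “backwards” idea is easy to show with a toy example in Python. The backup direction cuts a disk into blocks and records them in order; the restore direction walks that same list and writes each block back at its original offset. The block size, the manifest format and the hash-keyed store are assumptions of mine, not the real VBA internals.

```python
import hashlib
import io

BLOCK_SIZE = 4  # unrealistically small, just for illustration


def backup(disk: bytes, store: dict) -> list:
    """Forward direction: read the disk and write its blocks to the store."""
    manifest = []
    for offset in range(0, len(disk), BLOCK_SIZE):
        block = disk[offset:offset + BLOCK_SIZE]
        bid = hashlib.sha256(block).hexdigest()
        store.setdefault(bid, block)  # identical blocks are stored only once
        manifest.append(bid)
    return manifest


def restore(manifest: list, store: dict, target) -> None:
    """Backward direction: read blocks from the store and inject them into a disk."""
    for index, bid in enumerate(manifest):
        target.seek(index * BLOCK_SIZE)
        target.write(store[bid])
```

Here `target` can be any seekable binary file object, for example an `io.BytesIO()` in a test or a freshly created disk file.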
PHD5 image level restores
In PHD5 image level restore is a little too basic for my liking. You can perform image level restores, but you have to restore to another location. The resulting VM will have another UUID (the unique number that identifies a VM within vSphere), and another MAC address. Not always a problem, just too basic for my taste. As PHD5 is a very fresh product, I have no doubt they are working hard on improvements here.
Veeam5 image level restores
When I saw Veeam5’s “instant recovery” in action for the first time, I was amazed. It is clear they have done “more than a little” work on restores… Alien Technology! (Although Veeam prefers it being “innovative technology”). Veeam5 is able to boot a VM directly from the Veeam dedup store. For this to work, Veeam adds an NFS share to vSphere, where the VM appears to reside (while in fact it is just a bunch of dedupped blocks inside the dedup store!). You can “simply” boot that VM. Changes on the image (VM writes) are stored outside of the dedup store.
After the VM is booted, you already have the VM up and running again (although it might have some performance issues when it comes to heavy disk I/O – After all it runs from a “rehydrated” disk from within the Veeam dedup store). The VM reads directly from the dedup store, while writes performed by the VM are temporarily stored elsewhere, much like having a snapshot to a VM (in fact it is closer to how a VM behaves during a storage VMotion).
After the VM has booted, you simply initiate a storage VMotion to wherever you need the VM restored (yes, you need to have storage VMotion licensed on vSphere for this to work; otherwise you’d have to fall back to cloning or a cold migrate. EDIT: You can also use Veeam’s own replication feature to replicate the VM back if you do not have storage VMotion licensed. This is also without downtime). After the migration task completes, you’re done! The VM then starts to use the destination store, and returns to its normal disk I/O performance. Restore completed. Awesome. Unbeatable.
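The read/write split during instant recovery can be modelled as a read-only base with a write overlay on top. The Python class below is a conceptual sketch only; the class and method names are mine, and it says nothing about how Veeam really implements its NFS datastore.

```python
class InstantRecoveryDisk:
    """Disk served from a read-only backup, with writes redirected elsewhere."""

    def __init__(self, base_blocks):
        self._base = tuple(base_blocks)  # blocks from the backup store, never modified
        self._overlay = {}               # block index -> data written by the running VM

    def read(self, index: int) -> bytes:
        # Prefer data the VM wrote after booting, else fall back to the backup.
        return self._overlay.get(index, self._base[index])

    def write(self, index: int, data: bytes) -> None:
        self._overlay[index] = data

    def migrate(self) -> list:
        """Produce the merged disk, as a storage migration would copy it out."""
        return [self.read(i) for i in range(len(self._base))]
```

The important property is that the backup itself never changes: whatever the VM writes while running from the store ends up in the overlay, and only the migrated copy contains both.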
Performing File Level Restores (FLR)
Now that we have covered full image level restores, it is on to the FLR. This feature quickly became a necessity. Because let’s say a user deleted one single file from the Terabyte fileserver. Would you go and restore the entire file server, get out that single file, then delete the restore again? As you can see, FLR is a big plus here. So how does it work?
All tested backup solutions do block level backups, which are not directly suited for file level restores. Of course, this does not mean it cannot be done! The idea is simple: Create an application that understands the file system inside these block level backups. This will enable that application to browse through and read separate files directly from the block level backups. And this is exactly how it works.
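The very first step of “understanding the file system” is recognising which file system you are looking at. The Python sketch below does that for a raw volume image by checking two well-known signatures: the NTFS OEM ID at byte offset 3 of the boot sector, and the ext2/3/4 superblock magic 0xEF53 at byte offset 1080. It assumes the image starts at the beginning of a volume (not at a partition table), and it is of course a long way from a real FLR engine, which also has to walk directories and file extents.

```python
import struct

NTFS_OEM_ID = b"NTFS    "   # 8 bytes at offset 3 of the boot sector
EXT_MAGIC = 0xEF53          # little-endian 16-bit value at offset 1080
EXT_MAGIC_OFFSET = 1080     # superblock at 1024 + magic field at 56


def detect_filesystem(volume: bytes):
    """Return 'ntfs', 'ext' or None for the start of a raw volume image."""
    if volume[3:11] == NTFS_OEM_ID:
        return "ntfs"
    if len(volume) >= EXT_MAGIC_OFFSET + 2:
        (magic,) = struct.unpack_from("<H", volume, EXT_MAGIC_OFFSET)
        if magic == EXT_MAGIC:
            return "ext"
    return None
```

Once the file system type is known, the restore application can pick the matching parser and start browsing directories inside the backup.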
File level restore in PHD4
PHD4 must make use of the PHDD appliance (dedup store) in order to be able to perform file level restores. The older, non-dedup way of backing up which is supported in PHD4 cannot perform FLR. There is an option somewhere inside that can perform File Level Backup (FLB), but I have never seen a customer actually using it.
When using a PHDD target, FLR is possible on several file systems (NTFS, EXT2, EXT3 etc). You recover files by logging in to the PHDD appliance, then browsing through the backups, the file systems and the files. You mark the files you want restored, and you click on a button. Before you know it, the browser asks you where to save the zip file containing your requested files. Simple and clean.
File level restore in PHD5
PHD5 uses iSCSI for file level restores. The only thing to do is to start the iSCSI service on your VI client machine and select the backup from which you want to restore. PHD5 will mount that backup directly via iSCSI to a drive on your machine, from which you can browse through your files, and simply “drag and drop” them out of the backup. Also simple and clean. Both selecting files and restore throughput are faster on PHD5 than on PHD4.
File level restore in Veeam5
Veeam5 has more than one way to get files out of a VM: You can use the special file level restore wizards incorporated, you can extract separate files from the Enterprise Manager (via a browser), or you could use the “instant recovery” option to start a VM directly from the dedup store, connect to the booted VM and get the files from there. Plenty of options, and all work fine. On top of a simple file recovery, Veeam5 can also perform the recovery of domain, Exchange and SQL items. When using the file recovery option, FLR works natively for NTFS. For other file systems (Linux etc), a small appliance is started in the background.
In this second part of my blog series I have been looking at backup, backup verification, image restore and file level restores. On the backup side of things, all contestants perform nicely. There are no major shortcomings in any of the solutions.
Backup verification is also implemented in all solutions in some way. PHD relies on the way their dedup target works; it constantly checks the dedup store to see if all blocks are still there (and readable). This so called “SureRestore” option PHDvirtual provides is nice, but also a pure necessity. Because they rely on a single big dedup target, they NEED to make sure the dedup store remains intact. Failure of any given single block inside the store might corrupt many or even all backups. Veeam5 performs backup verification at a completely different level: They start VMs (or groups of VMs) directly from the backup target (without a restore), test the functionality of the VM(s), and shut them down again. This technology is called SureBackup.
Backup speeds are actually on par for all products as far as I have been able to establish. Where PHD4 is slower in between backups (due to the “Coldadd” way of working), it relies on having multiple streams of backup data. There are always some streams backing up, saturating your backup target most of the time anyway. In the end, all perform extremely well regarding backup speeds.
Restores are where the real difference between the products becomes visible. While all products at least have restore capabilities, both PHD4 and PHD5 are no match for Veeam5’s Instant Recovery. PHD5’s full image level recovery is even cumbersome (although I know they are working hard on improvements here). Veeam’s idea to be able to boot VMs directly from the backup store, and then leverage Storage VMotion to restore them is brilliant. As an added bonus, Veeam5 also leverages this technology to enable automated backup verification, or even to allow the user to start backed-up VMs in a virtual lab for testing!
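Why a single bad block can hurt so many backups becomes obvious once you look at which backups reference which blocks. In the Python sketch below each backup is just a list of block ids; the names and the data layout are invented for the example and do not reflect the real dedup store format.

```python
def affected_backups(manifests: dict, bad_block: str) -> list:
    """Return the names of all backups that reference the given block id."""
    return sorted(name for name, blocks in manifests.items() if bad_block in blocks)
```

If three daily backups of the same VM all share the unchanged operating system blocks, losing one of those shared blocks damages all three backups at once, while losing a block that only changed on Tuesday damages just Tuesday's backup.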
Some pros and cons I found looking at all solutions:
- All solutions leverage CBT, which helps shorten backup windows enormously;
- The HotAdd feature from VMware’s vStorage API is used by both Veeam5 and PHD5. The older PHD4 solves its lack of HotAdd support by using what I like to call “ColdAdd”;
- PHD4 and PHD5 are virtual appliances, while Veeam5 requires Windows (and a Windows License);
- Veeam5 has far superior restore capabilities over both PHD4 and especially PHD5;
- Veeam5 can be plagued by Windows issues and requires maintenance, while PHD running as a virtual appliance is more compact and resilient (especially PHD4 is very robust);
- PHD4 has no ESXi support, while PHD5 and Veeam5 both do have this support;
- PHD4/5 uses less CPU cycles than the Veeam5 solution. Not a real issue though considering the performance of modern CPUs;
- Veeam5 has extra “goodies” like a virtual lab, and item level restores;
- PHD4/5 lack VSS integration and rely solely on VMware Tools quiesce;
- Veeam and PHDvirtual take different approaches to their dedup store implementations. Both have pros and cons;
- Veeam5’s dedup implementation forces you to copy stores to disk or tape if you require long-term storage of backups, while PHD4/5 can retain backups for years within their store (“thinning the herd” helps to keep, for example, one backup per year, one per month etc).
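“Thinning the herd” from the last point is basically a retention policy. A rough Python sketch of one such policy follows; the concrete rules (keep everything from the last week, the newest backup of each month for a year, and the newest backup of each year forever) and all parameter names are my own assumptions, not PHDvirtual’s actual defaults.

```python
from datetime import date


def thin_the_herd(backup_dates, today, keep_days=7, keep_months=12):
    """Return the sorted list of backup dates that should be retained."""
    keep = set()
    months_seen = set()
    years_seen = set()
    for day in sorted(set(backup_dates), reverse=True):  # newest first
        if (today - day).days < keep_days:
            keep.add(day)  # recent backups are all kept
        month_key = (day.year, day.month)
        if month_key not in months_seen:
            months_seen.add(month_key)
            # This is the newest backup of its month; keep it while recent enough.
            months_back = (today.year - day.year) * 12 + (today.month - day.month)
            if months_back < keep_months:
                keep.add(day)
        if day.year not in years_seen:
            years_seen.add(day.year)
            keep.add(day)  # newest backup of each year is kept indefinitely
    return sorted(keep)
```

Everything not returned by this function would be a candidate for removal from the store, which is how a single dedup store can hold backups going back years without growing forever.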
My personal view on things
Last but not least… My personal take on these products. As you may know, I have always been a fan of PHDvirtual (and esXpress). Version 3 was hard to beat. But with the introduction of CBT and HotAdd by VMware, the world of virtual backups changed. Veeam quickly gained stability, speed and features. Right now all solutions “do the trick” when it comes to backing up virtual environments.
Looking specifically at restores, Veeam has quickly become my favorite. Especially now that PHDvirtual had to completely rewrite their software for ESXi support, I feel restores (especially image level restores) on PHD5 need vast improvements. Looking ahead to the upcoming part 3 of this review, replication is completely missing from the PHDvirtual 5.1 version of the software. I think PHDvirtual has a long way to go on the restore/replication side of things.
I consider the dedup approach PHDvirtual takes a very logical one. Maybe it is just me, but I like the idea in PHD4 of having one “side” making backups, the other “side” just taking care of the dedupped storage. It is almost like buying a backup product and a separate storage device. This approach results in one big dedup store, in which you can keep all backups, including the single “yearly” backups dated years back. All in one store.
Veeam requires that you make copies of their dedup stores to accomplish long term storage of these backups. That may sound like a weak point, but strong points can be weak points and vice versa: The single dedup store PHDvirtual has implemented can be considered a single point of failure for ALL backups of ALL times. But of course you could rsync that dedup store off site, reducing those risks, and so on and so on. I would urge everyone just to look at your backup and restore requirements, then see how you could realize them using any of the solutions presented, and finally choose the solution that makes most sense to you. If you cannot choose yourself, find someone with knowledge of both solutions. The right decision now will make you so much happier later!
Finally: I’m not a believer in companies claiming “We have this cool feature and they don’t (yet)”; I am more a believer in the long-term path a solution takes you on. So when you decide to go either way, always consider where that road may lead you in the end instead of looking at the fancy things just up ahead.