In a previous blog post I covered the general issue of misalignment at the disk segment level. This is the most common and most obvious form of misalignment, where several spindles in a RAID set perform random I/O and misalignment causes more spindles to seek for a single I/O than would be required with proper alignment.
Next in the series is another misalignment issue which is rare, but can have a much bigger impact on tuned storage: full-stripe misalignment.
Environments prone to full-stripe misalignment
The issue occurs whenever you have a RAID3, RAID5 or RAID6 set of disks tuned specifically for a heavy write workload. Let's assume you are able to tune the behavior of a very write-intensive workload. You may think that RAID10 is optimal for heavy-write workloads, but RAID5 can actually do a better job here.
Take EMC storage as an example (CLARiiON or VNX): the segment size is 64KB. On a RAID3 or RAID5 set constructed out of a (4+1) disk set, every stripe therefore holds 4 × 64KB = 256KB worth of data (plus a single parity segment).
If you can optimize your workload to always write 256KB blocks, each and every write to this RAID5 set is exactly a full-stripe write. In this scenario there is very little write overhead, even less than with RAID10, even though RAID10 is considered most effective for heavy writes.
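To make the geometry concrete, here is a small Python sketch. The 64KB segment size and the (4+1) layout follow the EMC example above; the function name and the check itself are my own illustration:

```python
# Geometry from the EMC example: 64KB segments, (4+1) RAID5,
# so a full stripe holds 4 * 64KB = 256KB of data.
SEGMENT_KB = 64
DATA_DISKS = 4
STRIPE_KB = SEGMENT_KB * DATA_DISKS  # 256KB of data per stripe

def is_full_stripe_write(offset_kb, size_kb):
    """A write is a full-stripe write only if it starts on a stripe
    boundary AND covers a whole number of stripes."""
    return offset_kb % STRIPE_KB == 0 and size_kb % STRIPE_KB == 0

print(is_full_stripe_write(0, 256))   # aligned 256KB write -> True
print(is_full_stripe_write(64, 256))  # shifted by one segment -> False
```

Shift the 256KB write by even a single 64KB segment and the full-stripe property is gone, which is exactly the failure mode discussed below.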
For these tuned full-stripe writes, a RAID5 set will need to seek all "n" spindles, but (n-1) of these spindles will carry data. This can be more effective than a RAID10 set, where only 50% of the spindles carry data. In this rare case RAID5 can outperform RAID10 for writes (at roughly the same number of disks). One needs to remember though that this is a very specific workload (video streaming could be a good example here).
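The spindle-efficiency argument above boils down to simple arithmetic; this sketch (function names are mine, purely illustrative) compares the fraction of seeking spindles that carry data:

```python
def raid5_data_fraction(n):
    """Fraction of spindles carrying data on a full-stripe write:
    (n-1) data segments out of the n disks that seek."""
    return (n - 1) / n

def raid10_data_fraction():
    """Every write is mirrored, so only half the spindles carry
    unique data, regardless of the set size."""
    return 0.5

print(raid5_data_fraction(5))   # (4+1) RAID5 -> 0.8
print(raid10_data_fraction())   # RAID10      -> 0.5
```

For the (4+1) set, 80% of the seeking spindles carry data versus 50% for RAID10, which is why this tuned workload can favor RAID5.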
Misaligned full-stripe writes: What goes wrong?
So what goes wrong if I misalign this tuned workload? If you read my post on RAID types and throughput, you'll probably see where I am going: as soon as the tuned writes are misaligned, every full-stripe write all of a sudden turns into a full-stripe write followed by a single segment write (because the data was not properly aligned on the stripe). The impact of this is generally devastating in RAID3, 5 or 6. In an aligned environment all disks in the RAID set seek to a particular track, after which they write 256KB of data plus parity in a single stroke; no parity reading of any kind is required. After this write, the system is ready for another 256KB write, which can be on the next stripe (sequential writes) or any other stripe for that matter (random writes).
But when the data is misaligned relative to the stripe, there is a "leftover" after the full-stripe write: part of the data now overflows into the next stripe. This data has to be written to the next segment on the next stripe, which means the RAID5 set must execute another write on the disks, this time a read-modify-write, in order to complete its write operation.
For this to be executed, the array has to read the data currently present on that next segment, and read the parity of that stripe. It then recalculates the parity information and finally writes out both the new block and the recalculated parity. This means two reads followed by two writes on two members of the RAID5 set.
As you can probably imagine, this heavily impacts the write performance for the following write.
The impact of a misaligned write on a RAID stripe can be very high. Don't get me wrong: the issue described here is rare. Especially with VMware running, most I/O will be variable in size and random in nature. But if you have that one application that is tuned to write full stripes to a RAID5 (or RAID3 or RAID6 for that matter), misalignment of that full-stripe write is an absolute performance killer.