Anyone who manages a storage area network has probably learned (perhaps the hard way) that performance and protection are almost mutually exclusive, particularly in older storage arrays. But choosing performance over protection always carries great risk. On the surface, many storage arrays offer attractive features, like deduplication, which saves disk space by eliminating duplicate data. Likewise, almost all of them support some form of snapshots, which capture the state of your system at set intervals throughout the day. It all sounds great in principle, but both technologies often disappoint in real-world use.
Dedupe Can Be a Clock Killer
Data deduplication (dedupe for short) eliminates duplicate data on the storage system. For instance, if everyone in your company receives an email with the same file attached, deduplicated storage saves only a single copy of the file and uses index pointers to link everyone's email message to that one copy. This can save a great deal of storage capacity, which is why dedupe is so desirable.
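To make the email-attachment example concrete, here is a minimal Python sketch of file-level dedupe. It assumes each file is identified by the SHA-256 hash of its contents; the class `FileDedupeStore` and its methods are hypothetical names for illustration, and real arrays use far more elaborate on-disk indexes.

```python
import hashlib


class FileDedupeStore:
    """Toy file-level dedupe store (hypothetical): one physical copy per unique content."""

    def __init__(self):
        self.blobs = {}     # content hash -> file bytes, stored once
        self.pointers = {}  # logical file name -> content hash (the "index pointer")

    def write(self, name: str, data: bytes) -> None:
        digest = hashlib.sha256(data).hexdigest()
        # Keep the bytes only if this exact content has not been stored before.
        self.blobs.setdefault(digest, data)
        self.pointers[name] = digest

    def read(self, name: str) -> bytes:
        # Follow the pointer to the single shared copy.
        return self.blobs[self.pointers[name]]

    def logical_bytes(self) -> int:
        # What users think they are storing.
        return sum(len(self.blobs[h]) for h in self.pointers.values())

    def physical_bytes(self) -> int:
        # What actually occupies space.
        return sum(len(b) for b in self.blobs.values())


store = FileDedupeStore()
attachment = b"Q3 report " * 100_000  # 1,000,000-byte attachment
for user in ("alice", "bob", "carol"):
    store.write(f"{user}/inbox/report.pdf", attachment)

print(store.logical_bytes())   # 3000000
print(store.physical_bytes())  # 1000000
```

Three mailboxes reference the attachment, but only one copy is kept; the other two are just pointers.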
Deduplication comes in two main forms: file-level and block-level. As their names imply, file-level dedupe eliminates duplicate files, while block-level dedupe eliminates duplicate blocks. Of the two, block-level is more efficient: because each file is divided into many blocks, deduplication works at a finer granularity, so two files that differ only slightly can still share most of their blocks. Deduplication methods also differ in when the process runs. Inline deduplication analyzes data en route to storage, so duplicate data never gets written to disk. Post-processing deduplication runs periodically on a fixed schedule, after the data has first been written to disk.
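The granularity difference can be shown with a short Python sketch. It assumes fixed-size 4 KB blocks and SHA-256 fingerprints; both are illustrative choices (many products use other block sizes or variable-size chunking), and the function names are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical fixed block size; real arrays vary


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut data into fixed-size blocks (the last block may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def unique_bytes_file_level(files: list[bytes]) -> int:
    """Bytes stored when only whole, identical files are deduplicated."""
    unique = {hashlib.sha256(f).digest(): len(f) for f in files}
    return sum(unique.values())


def unique_bytes_block_level(files: list[bytes]) -> int:
    """Bytes stored when identical blocks are deduplicated across all files."""
    unique = {}
    for f in files:
        for block in split_blocks(f):
            unique[hashlib.sha256(block).digest()] = len(block)
    return sum(unique.values())


# A 10-block file with distinct blocks, and an edited copy whose first block changed.
base = b"".join(bytes([i]) * BLOCK_SIZE for i in range(10))  # 40,960 bytes
edited = b"X" * BLOCK_SIZE + base[BLOCK_SIZE:]

print(unique_bytes_file_level([base, edited]))   # 81920: the files differ, so both are kept
print(unique_bytes_block_level([base, edited]))  # 45056: 10 shared blocks + 1 new block
```

File-level dedupe sees two different files and saves nothing, while block-level dedupe stores only the one changed block a second time.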
Inline Dedupe vs Post-Processing
Inline deduplication is the more desirable approach because it consumes less storage capacity, but it also taxes the heck out of the CPU, often driving latency up to intolerable levels: even hundreds of milliseconds! The performance impact is so severe that most storage vendors recommend disabling the feature on primary storage to maintain peak performance.
Post-processing dedupe isn't much better. Let's say you schedule dedupe to run at the top of every hour. Like clockwork, storage latency will spike at the top of every hour and stay high for as long as the dedupe job takes to finish. This will not sit well with your employees or your customers!
The final word on deduplication seems to be, “It’s a great space-saving feature… just don’t use it during business hours.”
The Storage Snapshot Snafu
Storage snapshot technology is nearly three decades old and has long been the de facto method for protecting storage against data corruption and other disasters. The idea is fairly ingenious. Rather than running a full backup (which takes a very long time), the system takes periodic "snapshots" that record only the changes made to storage since the previous snapshot. Storage vendors like to tout that a 15-minute snapshot interval ensures you never lose more than 15 minutes of work if a catastrophic data-loss event occurs.
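The "only the changes since the previous snapshot" idea can be sketched in Python as follows. This is a simplified delta model assumed for illustration; real arrays typically implement snapshots with copy-on-write or redirect-on-write metadata, and the `Volume` class here is hypothetical and ignores deletes.

```python
class Volume:
    """Toy volume (hypothetical) whose snapshots store only blocks changed since the last one."""

    def __init__(self):
        self.blocks = {}     # block number -> current contents
        self.dirty = set()   # blocks written since the previous snapshot
        self.snapshots = []  # list of {block number: contents} deltas

    def write(self, block_no: int, data: bytes) -> None:
        self.blocks[block_no] = data
        self.dirty.add(block_no)

    def snapshot(self) -> int:
        """Capture only the changed blocks; return the snapshot's index."""
        delta = {n: self.blocks[n] for n in self.dirty}
        self.snapshots.append(delta)
        self.dirty.clear()
        return len(self.snapshots) - 1

    def restore(self, index: int) -> dict:
        """Rebuild the volume as of snapshot `index` by replaying deltas in order."""
        state = {}
        for delta in self.snapshots[: index + 1]:
            state.update(delta)
        return state


vol = Volume()
vol.write(0, b"A")
vol.write(1, b"B")
s0 = vol.snapshot()          # delta holds blocks 0 and 1
vol.write(1, b"B2")
s1 = vol.snapshot()          # delta holds only block 1
vol.write(0, b"corrupted")   # damage after the last snapshot
print(vol.restore(s1))       # {0: b'A', 1: b'B2'}
```

Restoring from the latest snapshot recovers the pre-corruption data, but anything written after that snapshot is lost, which is why the interval sets the worst-case data loss.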
Again, that sounds really good in principle. But in practice, when you enable snapshots at a 15-minute interval, you soon realize that after a month you have nearly 3,000 snapshots (4 per hour × 24 hours × 30 days = 2,880), and they are quickly consuming your free storage! Snapshots eat up space, and unless you want to commit 25-40% of your total storage capacity to holding them, you find yourself where many storage administrators do: trading a larger recovery point objective (RPO) for more free storage.
For instance, dialing back snapshot frequency to once or twice per day creates fewer snapshots and consumes less free space. But now you risk losing as much as a full day's productivity should you ever need to restore from those snapshots. It really doesn't make sense to dial back your data protection during the most productive hours of the day, but sadly, it often becomes necessary.
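The trade-off between snapshot count and worst-case data loss is simple arithmetic, sketched here in Python. It assumes a 30-day month and that every snapshot is retained for the whole month; how much capacity each snapshot consumes depends on your change rate and is not modeled.

```python
from datetime import timedelta

MONTH = timedelta(days=30)  # assumption: 30-day month, no snapshots expired early


def snapshots_per_month(interval: timedelta) -> int:
    """How many snapshots accumulate over a month at a fixed interval."""
    return MONTH // interval


for label, interval in [
    ("every 15 minutes", timedelta(minutes=15)),
    ("every hour", timedelta(hours=1)),
    ("twice a day", timedelta(hours=12)),
    ("once a day", timedelta(days=1)),
]:
    # Worst case: failure strikes just before the next snapshot, so
    # everything written since the last one (one full interval) is lost.
    print(f"{label:>16}: {snapshots_per_month(interval):>5} snapshots/month, "
          f"worst-case loss {interval}")
```

A 15-minute interval yields 2,880 snapshots a month with at most 15 minutes of lost work; a daily interval yields only 30 snapshots but risks a full day.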
The final word on snapshots is, “Great feature, but you probably don’t have enough free storage to use it to its full potential.”
What’s a Storage Admin to Do?
Recent innovations have found ways to overcome the shortcomings of both deduplication and snapshots. In the case of dedupe, the solution came from powerful multi-core CPUs and new software designed to take full advantage of them. This allows modern storage arrays to perform inline deduplication (and even compression!) on the fly, so redundant data never makes it to disk, and the data that does is already space-optimized. That combination keeps free storage space at a maximum without taking clock cycles away from I/O processing.
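A write path that dedupes and then compresses each block before it reaches disk might look like the following Python sketch. SHA-256 and `zlib` are stand-ins chosen for illustration, not what any particular vendor uses, and the class name is hypothetical; the sketch also does not model the multi-core parallelism that makes this fast on real arrays.

```python
import hashlib
import zlib


class InlineDedupeCompressStore:
    """Toy inline write path (hypothetical): dedupe each block, compress only new ones."""

    def __init__(self):
        self.store = {}  # block hash -> compressed bytes (what reaches "disk")

    def write_block(self, block: bytes) -> str:
        digest = hashlib.sha256(block).hexdigest()
        if digest not in self.store:
            # Duplicates are dropped before any write; new blocks are compressed first.
            self.store[digest] = zlib.compress(block)
        return digest  # the caller keeps this as the pointer to the block

    def read_block(self, digest: str) -> bytes:
        return zlib.decompress(self.store[digest])

    def bytes_on_disk(self) -> int:
        return sum(len(c) for c in self.store.values())


store = InlineDedupeCompressStore()
block = b"log line: request ok\n" * 200  # 4,200 bytes of repetitive data
ptrs = [store.write_block(block) for _ in range(5)]
print(len(store.store))       # 1: the four repeat writes never hit disk
print(store.bytes_on_disk())  # far less than the 21,000 logical bytes written
```

Deduplication removes the repeated writes, and compression shrinks the one block that is actually stored.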
If you are responsible for managing a storage system and find yourself constantly fighting the battle for free space, the advantages of cloud storage come into sharper relief. A good cloud provider should support more types of storage workloads than you'd want to buy on your own. To learn more about the latest innovations in modern storage and what they can mean for your business, contact the storage professionals at CNS Partners.
Learn how to manage storage array risk and make sure your IT system is robust enough to perform well under any adverse condition by reading our eBook, "Seven Key Factors for IT System Success."
For an in-depth look at how to diagnose and solve network performance issues, download our eBook.