 

RAID Remains Relevant, Really!

By Greg Schulz

While some RAID systems may be outdated, RAID technology continues to be a key element in enterprise data storage.

RAID -- redundant array of independent disks -- remains an alive and relevant data protection technology 25 years after the original Berkeley white paper appeared. Granted, some RAID implementations and the systems they are a part of -- along with how they’re configured -- are more dated and limited than others.

Like the hard disk drive (HDD) with which it is most commonly associated and used, RAID has been declared dead for years if not decades, yet both remain very much alive.

What this means is that some vendors’ hardware- or software-based RAID solutions continue to evolve with new functionality and capabilities -- scaling performance (IOPS, bandwidth, latency), availability, capacity and effectiveness -- while others remain static. It also means that those using or making storage decisions have options to use RAID in new ways vs. how they have in the past, assuming their chosen technology solution is flexible enough to do so.

RAID revisited

Let’s take a quick step back and revisit RAID fundamentals so that we can step forward to see where it is today and where it will be tomorrow. The premise of the December 1987 University of California Berkeley white paper titled “A Case for Redundant Arrays of Inexpensive Disks (RAID)” was to overcome the limits and barriers of what was called the Single Large Expensive Disk (SLED). At that time, magnetic HDDs were physically large, prone to failure and limited in space capacity, with performance bottlenecks, not to mention proprietary access methods and protocols. Besides the IBM mainframe model 3380 HDD, the industry-standard HDD that was OEMd by a variety of different vendors was the Fujitsu Eagle.

The Fujitsu Eagle was (depending on model) a 470MByte (raw, unformatted) 6U (10.5”) device with multiple 10.5”-diameter platters that spun at under 4,000 RPM, consuming over 500 watts of power at a then-bargain price tag of around $10,000. Put that into perspective against today’s enterprise-class, high-performance 2.5” 15K 600GByte HDDs, which consume around 8 watts while in use and carry far more affordable price tags (depending on where or from whom you buy them).

Keep in mind this was an era when the SCSI (parallel) HDD was just emerging and interfaces such as ATA, PATA, SATA, Fibre Channel, iSCSI and SAS were at best a futuristic pipe dream, not to mention that a 1GByte HDD was still out over the horizon. This was also just a few years before a mid-tier VAX/VMS 128MByte Solid State Device (SSD) using DRAM cost about $100,000 USD ($178,941.85 today if adjusted for inflation).

Let us get back to SLED and the emerging SCSI and smaller drives. The current 2.5” and 3.5” HDDs (and SSDs) are descendants of their 5.25” predecessors, which back in 1987 were just emerging. As mentioned, a common theme then, as today, was I/O performance not keeping up with space capacity. Thus, the Berkeley white paper presented the initial five RAID levels, which over time have expanded and evolved, not to mention being enhanced by various vendor implementations.

RAID 0

RAID 0 stripes data across all drives for read and write performance while increasing space capacity vs. that of a single drive. There is no data protection with RAID 0, which also means there is no space capacity overhead: loss of a single drive results in the entire RAID 0 set being unusable. Some vendors also refer to this as JBOD (Just a Bunch of Disks) mode, particularly if using only one drive per RAID group.
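The striping idea can be sketched in a few lines of Python. The left-to-right chunk rotation below is an assumption for illustration only; actual layouts vary by implementation:

```python
def raid0_map(logical_block, num_drives, chunk_blocks):
    """Map a logical block to (drive, block-on-drive) in a RAID 0 stripe.

    Assumes a simple left-to-right chunk rotation; real layouts vary.
    """
    chunk_index, offset = divmod(logical_block, chunk_blocks)
    drive = chunk_index % num_drives
    block_on_drive = (chunk_index // num_drives) * chunk_blocks + offset
    return drive, block_on_drive

# With 4 drives and 8-block chunks, consecutive chunks rotate across
# drives 0, 1, 2, 3 and then wrap back to drive 0.
for lb in (0, 8, 16, 24, 32):
    print(lb, "->", raid0_map(lb, num_drives=4, chunk_blocks=8))
```

Because every drive holds unique data, the mapping also shows why losing any one drive leaves unrecoverable holes across the whole set.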

RAID 1

RAID 1 mirrors or replicates data across two or more drives for protection and, depending on the implementation, improved read performance. Some implementations enable multiple concurrent reads from different drives, as well as three or more mirrors.

Write performance, assuming drives of the same type, should be about the same as writing to a single JBOD. However, implementations vary, and some systems with write-back cache or other optimizations may be even faster.

There is a catch, which is that RAID 1 has a space capacity overhead proportional to the number of copies: with n mirrors, only 1/n of the raw capacity is usable. The benefit is that if a drive fails or is removed, the remaining drive is fully intact; however, it is effectively running in JBOD (unprotected) mode. An option is to set up a triple mirror so that if one drive fails, two surviving drives remain. And when a spare is added or a failed drive is replaced, a copy vs. a rebuild can occur.

Yet another variation is a quad (four-drive) mirror, or a two-drive mirror combined with remote mirroring, that is, replication to another storage system on- or off-site. As a result, RAID 1 remains a very popular option today for both HDDs and SSDs where a balance of performance and availability is needed.
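The mirror behavior described above -- every write goes to all surviving copies, and a read can be served by any of them -- can be captured in a toy sketch. This is an illustration of the concept, not any vendor's implementation:

```python
class Mirror:
    """Toy RAID 1 sketch: writes go to every surviving copy;
    a read can be served by any surviving copy."""

    def __init__(self, copies=2, blocks=8):
        self.drives = [[None] * blocks for _ in range(copies)]
        self.alive = [True] * copies

    def write(self, block, data):
        for d in range(len(self.drives)):
            if self.alive[d]:
                self.drives[d][block] = data   # every surviving mirror is updated

    def read(self, block):
        for d in range(len(self.drives)):
            if self.alive[d]:
                return self.drives[d][block]   # any surviving copy will do
        raise IOError("all mirrors failed")

m = Mirror(copies=3)            # triple mirror: survives loss of two drives
m.write(0, b"payroll")
m.alive[0] = m.alive[1] = False # two drives fail
print(m.read(0))                # data still readable from the third copy
```

The triple-mirror case makes the trade-off concrete: three times the raw capacity buys survival of two concurrent drive losses.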

RAID 2

Hamming-code error correction has, until recently, been the least known and least adopted RAID level, given its complexity and compute cost. This approach and its variations -- including erasure codes and forward error correction -- use more advanced algorithms to create multiple parities that can reduce space overhead, yet require more compute power.

Variations of RAID 2 and extended parity protection are finding new opportunities with improvements in compute processing capability and the ability to leverage larger numbers of drives with, for example, erasure codes, forward error correction and other algorithmic protection.

RAID 3

Stripe with dedicated parity found success in the mid-1990s with solutions such as those from Baydell that provided good sequential read and write performance, well suited for video and similar applications. More recently -- over the past decade and a half -- use of RAID 3 has dramatically decreased with the continued maturing of RAID 4, RAID 5, RAID 6 and other I/O optimization techniques.

RAID 4

Stripe with dedicated parity and independent reads and writes, which in concept is similar to RAID 3. The difference is support for multiple concurrent I/O operations.

With RAID 3, all disks work in parallel to handle a single I/O operation, whereas with RAID 4 (and higher) multiple I/O operations can occur concurrently. While RAID 4 (and higher) supports full-stripe reads and writes, this also means that multiple smaller reads (or writes) can occur. This is also where some confusion and myths come into play, based on how different vendors implement their RAID software or hardware solutions.

For example, those who do not do good write buffering or read-ahead and cache management may encounter additional write overhead. On the other hand, those who can do good cache management and attempt to do full stripe writes (or reads) when possible while maintaining data consistency can get better performance.

Thus, look at different solutions and ask vendors how they implement their RAID 4 (or RAID 5 or RAID 6). Do they do write gathering? How are full and partial stripe writes (and reads) handled? What about options for setting chunk (shard) size?

Simply going by the RAID definition may be safe for I/O planning of worst-case scenarios or the lowest common denominator. But it can also lead to false assumptions, so do your homework.
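That worst-case planning can be made concrete. In the classic read-modify-write path for single-parity RAID (4 or 5), each small write costs two reads (old data, old parity) plus two writes (new data, new parity), while a gathered full-stripe write avoids the reads entirely. A back-of-the-envelope sketch, assuming no cache effects:

```python
def raid45_small_write_ios(small_writes):
    """Classic single-parity read-modify-write penalty:
    2 reads (old data, old parity) + 2 writes (new data, new parity)."""
    return small_writes * 4

def raid45_full_stripe_ios(data_drives):
    """A gathered full-stripe write needs no reads:
    one write per data drive plus one parity write."""
    return data_drives + 1

# 100 scattered small writes cost 400 back-end I/Os, while the same
# amount of data gathered into full stripes on a 3+1 group costs only
# 4 back-end I/Os per stripe.
print(raid45_small_write_ios(100))
print(raid45_full_stripe_ios(3))
```

This is exactly why the vendor questions above matter: an implementation that does write gathering can sidestep much of the 4x penalty that the textbook definition implies.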

RAID 5

Stripe data with rotating parity builds on RAID 4 but eliminates the dedicated parity disk. Instead, all members of a RAID set take turns storing parity in a rotating manner so that no one disk becomes a bottleneck.

This means that each rank or stripe of data will have a different drive handling the parity. The good news is that space capacity overhead can be greatly reduced vs. mirroring (RAID 1): for example, a four-drive RAID 5 group (3 + 1, three data plus one equivalent parity drive) has a 25% space overhead for parity.
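The parity itself is typically a byte-wise XOR of the data chunks in a stripe, which is also what makes rebuilds possible: XOR the surviving chunks with the parity and the missing chunk falls out. A minimal Python illustration:

```python
from functools import reduce

def xor_parity(chunks):
    """Parity chunk = byte-wise XOR of all the other chunks in the stripe."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*chunks))

data = [b"AAAA", b"BBBB", b"CCCC"]      # data chunks of one 3+1 stripe
parity = xor_parity(data)

# Lose one data chunk: XOR of the survivors plus the parity rebuilds it.
rebuilt = xor_parity([data[0], data[2], parity])
assert rebuilt == data[1]
```

The same property explains rebuild cost: reconstructing one drive requires reading every surviving member of the group, which is why rebuild times grow with drive capacity.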

However, this also leads to a common RAID 5 myth: that it always has a 25% space overhead. That is only true for systems or environments configured that way.

Different vendors’ RAID software and hardware support various group sizes, chunking (the amount of data written to each drive) and stripe sizes (the number of drives). For example, a sixteen-drive 15+1 RAID 5 configuration has a parity space capacity overhead of only about 6%. RAID 5 remains popular in some environments and is growing in adoption in the lower-end SMB and consumer markets where fewer drives are the norm.

RAID 6

Similar to RAID 5, yet adding a second parity to protect against a double drive failure. RAID 6 has helped support adoption of larger-capacity 1TB, 2TB, 3TB and now 4TB drives in wider stripes or larger RAID groups.
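The space-overhead trade-off across the levels discussed so far can be computed directly. The group sizes below are examples for illustration, not recommendations:

```python
def protection_overhead(data_drives, protection_drives):
    """Fraction of raw capacity consumed by mirror or parity protection."""
    return protection_drives / (data_drives + protection_drives)

examples = [("RAID 1 two-way mirror", 1, 1),   # 50% overhead
            ("RAID 5, 3+1 group",     3, 1),   # 25% overhead
            ("RAID 5, 15+1 group",   15, 1),   # ~6% overhead
            ("RAID 6, 6+2 group",     6, 2),   # 25% overhead
            ("RAID 6, 14+2 group",   14, 2)]   # 12.5% overhead

for name, d, p in examples:
    print(f"{name}: {protection_overhead(d, p):.1%} overhead")
```

Note how a wide RAID 6 group can carry double-parity protection at a lower space cost than a narrow RAID 5 group, which is part of why wider stripes accompanied larger drives.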

However, some environments need even more protection beyond mirroring or replicating a RAID 6 group to another group in a different storage system. For those environments you can find RAID variations including RAID 7 (triple parity), hybrid RAID 10 and 01 (stripe with mirror, mirror with stripe) and RAID 50 (RAID 5 with underlying stripes), not to mention emerging erasure codes, forward error correction, dispersal and other approaches.

Something else occurring as RAID continues to evolve is that chunk sizes -- the amount of data written to each drive -- have grown from 4K, 8K, 16K or 32K on many systems to much larger, in some cases several MBytes (or more).

There is plenty more to revisit with RAID today, not to mention where this storage technology is going. We’ll take a further look at RAID in part two of this article.  

 

 

  This article was originally published on Monday Oct 28th 2013