
Storage Enters the Age of Erasure Coding

By Paul Rubens

Data storage vendors are getting serious about erasure coding as the limitations of RAID become more pronounced.

The inefficiencies of RAID and replication mean the time has finally come for erasure coding-based data protection.

Erasure coding is a storage technology that's about to break into the storage mainstream.

Its appeal is obvious: it's a data protection system that's more space efficient than straight replication, and one which tolerates more faults and allows you to recover lost data far more quickly than is possible with traditional RAID systems.

Here are just a few examples of storage offerings that are getting serious about the technology: Intel and Cloudera are developing erasure coding in HDFS for release in Hadoop 3.0, and Nutanix has begun showing off its own proprietary erasure coding called EC-X in the current versions of its Nutanix OS in preparation for its launch in NOS 5. Ceph, the open source software storage platform, introduced erasure coding last year with the Firefly (v0.80) release, and erasure coding is at the heart of Cleversafe's dispersed storage systems. (Earlier this month IBM announced that it had acquired Cleversafe for an undisclosed sum.)  

Erasure Coding: Why Now?

Erasure coding is not new – it's been around for over 50 years – but one reason that the time for erasure coding may finally have come boils down to the fact that enterprises are accumulating and storing vast (and rapidly increasing) amounts of data every day. That means space efficiency is becoming more important.

A platform like Hadoop typically provides data protection through replication: three copies of each piece of data are stored on different cluster nodes. The problem is that a Hadoop system may be storing many terabytes of information, and that makes replication very expensive. That's because this level of replication has a storage efficiency of just 33%: only a third of your storage capacity holds original data, and the other 67% is used up by replicas of this data.

By contrast – as we shall see – some erasure coding schemes offer storage efficiency as high as 71% while offering an even greater level of data protection.

There are other factors driving the adoption of erasure coding too. One is that Moore's Law has ensured that the processing overhead required to operate erasure coding is rapidly becoming insignificant as processing power becomes cheaper and more abundant.

And Scott Sinclair, a storage analyst at Enterprise Strategy Group, also identifies the trend toward software defined storage running on commodity hardware as another important driver.

"Custom storage hardware is more expensive but more resilient than standard hardware, which is not designed to have a single point of failure," he says. "To cope with this some software defined storage solutions use replicas across nodes so that if one node goes down another can take over, but this is very inefficient. So they are taking advantage of the processor gains in standard server hardware to use erasure coding with software defined storage."

RAID Problems

RAID systems are also designed to overcome the inefficiency of replication. But the vast amounts of data that enterprises are accumulating are increasingly being stored on very high capacity disks – in some cases 10TB drives – and this causes a number of different problems for RAID systems.

First, high capacity drives are more likely to suffer bit errors as there are more bits stored on them. When errors lead to a RAID rebuild there's the problem of reduced or no data protection if another disk in the RAID array fails before the rebuild is complete. And another failure is more likely since the disks have such high capacity.

Second, it was never intended that RAID be used with such high capacity disks. Since capacities have grown far faster than data transfer rates to and from disks, rebuilds can now take many hours or even days.
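To get a feel for the rebuild-time problem, here is a rough back-of-the-envelope sketch. The 150 MB/s sustained transfer rate is an assumed, illustrative figure rather than one taken from any vendor, and the function name `rebuild_hours` is hypothetical. It only estimates the time to read or write one full drive, so a real rebuild on a busy array would likely take longer.

```python
def rebuild_hours(capacity_tb, throughput_mb_s):
    """Hours needed to read or write a whole drive at a sustained rate."""
    capacity_bytes = capacity_tb * 10**12       # decimal terabytes
    bytes_per_second = throughput_mb_s * 10**6  # decimal megabytes per second
    return capacity_bytes / bytes_per_second / 3600

# A 10TB drive at an assumed 150 MB/s takes most of a day just to stream once.
print(f"{rebuild_hours(10, 150):.1f} hours")
```

Doubling the drive capacity without a matching rise in transfer rate doubles this figure, which is the trend the paragraph above describes.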

Sinclair points out that RAID also offers far less storage flexibility than erasure coding. "With RAID 6, you take your disks and say 'these disks are in RAID 6,'" he explains. "But with erasure coding you can be more flexible and say 'this virtual pool has this protection model - you can abstract from the hardware.'"

He adds that erasure coding also lets you scale beyond what the inefficiencies of RAID will allow. Replicas can do this too, but with replicas you need far more storage space.

How Erasure Coding Works

Erasure coding works by splitting a file into a number of equally sized pieces, and then applying some fancy mathematical encoding to produce a larger number of pieces. For example, you could start with a single file, split it into 6 pieces, and then do the encoding to produce 10 pieces.

What's clever about the encoding is that you would only need 6 of the 10 encoded pieces to get back to the original file: you can lose any four without any data loss.

To get an idea of how EC works, let's look at a very simple example where you split a file into 2 pieces, and then encode those into 4 encoded pieces.

So we start with a single file, split it into 2 pieces which we'll call P1 and P2, and then encode those into 4 encoded pieces: EP1, EP2, EP3 and EP4.

Now let's imagine that EP1 is simply a copy of P1 and EP2 is simply a copy of P2. So far so simple.

To generate two extra encoded pieces, EP3 is P1 + P2, and EP4 is P1 + (2 x P2).

So what happens if two of these encoded pieces, EP2 and EP4, are lost?

We are left with EP1 and EP3, and we know that EP1 is identical to P1, and EP3 is simply P1 + P2. Subtracting EP1 from EP3 gives P2, so with a little equation solving it is possible to get the original file back from just these two encoded pieces.
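The toy scheme above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: each piece is treated as a single integer rather than a block of bytes, and the names `COEFFS`, `encode` and `decode` are hypothetical. Each encoded piece is a linear combination of P1 and P2, so any two surviving pieces give two equations in two unknowns.

```python
from fractions import Fraction
from itertools import combinations

# Each encoded piece is a*P1 + b*P2, stored here as the pair (a, b).
COEFFS = {
    "EP1": (1, 0),  # copy of P1
    "EP2": (0, 1),  # copy of P2
    "EP3": (1, 1),  # P1 + P2
    "EP4": (1, 2),  # P1 + (2 x P2)
}

def encode(p1, p2):
    """Return the four encoded pieces for the data pieces p1 and p2."""
    return {name: a * p1 + b * p2 for name, (a, b) in COEFFS.items()}

def decode(survivors):
    """Recover (p1, p2) from any two surviving encoded pieces.

    Solves the 2x2 linear system with Cramer's rule.
    """
    (n1, v1), (n2, v2) = list(survivors.items())[:2]
    a1, b1 = COEFFS[n1]
    a2, b2 = COEFFS[n2]
    det = a1 * b2 - a2 * b1  # non-zero for every pair of rows in COEFFS
    p1 = Fraction(v1 * b2 - v2 * b1, det)
    p2 = Fraction(a1 * v2 - a2 * v1, det)
    return int(p1), int(p2)

pieces = encode(7, 5)

# The case from the text: EP2 and EP4 are lost, EP1 and EP3 survive.
assert decode({"EP1": pieces["EP1"], "EP3": pieces["EP3"]}) == (7, 5)

# In fact any two of the four pieces are enough.
for pair in combinations(pieces, 2):
    assert decode({name: pieces[name] for name in pair}) == (7, 5)
```

The coefficients were chosen so that no two rows are multiples of each other, which is what guarantees that every pair of pieces can be solved for P1 and P2.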

That's the principle. In fact erasure coding is more complex than that. A common form of erasure coding is called Reed-Solomon (RS) erasure coding, invented in 1960 at MIT Lincoln Laboratory by Irving S. Reed and Gustave Solomon. It uses linear algebra operations to generate extra encoded pieces, and can be configured in different ways so that a file is split into k pieces, and then encoded to produce an extra m encoded pieces which are effectively parity pieces.

That means with RS (k,m) you can lose any m encoded pieces out of the total (k+m) encoded pieces, and still reassemble the original file.
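The "any m of k+m" property can be illustrated with a small Reed-Solomon-style sketch. It is a teaching aid under stated assumptions, not how production systems are written: arithmetic is done modulo the small prime 257 (real implementations typically work in the finite field GF(2^8) so that every symbol fits in a byte), each piece is a single symbol, and the names `rs_encode` and `rs_decode` are hypothetical. The k data symbols are treated as points on a polynomial of degree below k, the m parity symbols are further points on the same polynomial, and any k points are enough to recover it. The three-argument `pow` with a negative exponent needs Python 3.8 or later.

```python
P = 257  # small prime modulus; k + m must not exceed P

def _interpolate(points, x):
    """Evaluate at x, mod P, the polynomial of degree < len(points) through points."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse of den
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

def rs_encode(data, m):
    """Return k + m pieces: the k data symbols followed by m parity symbols."""
    k = len(data)
    points = list(enumerate(data))  # piece i holds the polynomial's value at x = i
    parity = [_interpolate(points, x) for x in range(k, k + m)]
    return list(data) + parity

def rs_decode(survivors, k):
    """Rebuild the k data symbols from a dict of surviving {index: value} pieces."""
    points = list(survivors.items())[:k]
    if len(points) < k:
        raise ValueError("need at least k surviving pieces")
    return [_interpolate(points, x) for x in range(k)]

data = [72, 101, 108, 108, 111, 33]   # six data symbols, so k = 6
pieces = rs_encode(data, m=3)         # nine pieces in total, as in RS(6,3)

# Lose any three pieces, here pieces 0, 4 and 7, and the data is still recoverable.
survivors = {i: v for i, v in enumerate(pieces) if i not in (0, 4, 7)}
assert rs_decode(survivors, k=6) == data
```

Because the first k pieces are the data itself, reads need no decoding at all unless a piece is missing, which is one reason this "systematic" layout is a common design choice.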

Typical RS configurations are RS(6,3) and RS(10,4), meaning any 3 pieces of 9 for RS(6,3) and any 4 pieces of 14 for RS(10,4) can be lost without losing any data.

So let's take a look at the efficiency of these Reed-Solomon EC systems compared to simple Hadoop-style replication and no replication at all:

Single copy of data: no failures can be tolerated, 100% storage efficiency

Triple replication: 2 failures can be tolerated, 33% storage efficiency

RS(6,3): 3 failures can be tolerated, 67% storage efficiency

RS(10,4): 4 failures can be tolerated, 71% storage efficiency
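The figures in this comparison follow from two simple ratios, sketched below. With n copies, replication tolerates n - 1 failures and 1/n of the raw capacity holds original data; with RS(k,m), m failures are tolerated and k/(k+m) of the raw capacity holds original data. The function names are hypothetical and exist only for this illustration.

```python
def replication(copies):
    """Return (failures tolerated, storage efficiency) for n-way replication."""
    return copies - 1, 1 / copies

def reed_solomon(k, m):
    """Return (failures tolerated, storage efficiency) for RS(k,m)."""
    return m, k / (k + m)

schemes = {
    "Single copy of data": replication(1),
    "Triple replication": replication(3),
    "RS(6,3)": reed_solomon(6, 3),
    "RS(10,4)": reed_solomon(10, 4),
}

for name, (failures, efficiency) in schemes.items():
    print(f"{name}: {failures} failures tolerated, {efficiency:.0%} storage efficiency")
```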

What's pretty clear then is that although triple replication sounds like it should be a very effective form of data protection (after all, you have two extra copies of the data), a system such as RS(10,4) may be far better. That's because it offers more protection, tolerating 4 failures rather than 2, and because it is far more storage efficient, allowing you to store much more original data in any given storage resource.

The downside of EC is that processing data into encoded pieces and reassembling them when the data is needed takes processing power, as mentioned earlier. It can also introduce an element of latency compared to reading a file off a single disk, especially when pieces are stored in geographically remote locations.

However, it is possible to mitigate this latency problem to some extent, Sinclair says. "Some implementations of erasure coding use WAN optimization to speed up how bits are moved across the wire," he says. "Other solutions build a scheme so that if they suffer one or two drive failures they can rebuild across local servers, while they can still cope with a more general site failure, although recovering will be slower."

Companies such as Nutanix are introducing their own proprietary erasure coding systems rather than using Reed-Solomon, and these are likely to be optimized for their storage software and may include WAN optimization as well.

So where is erasure coding most likely to be used? Today it's most common in massive capacity storage environments with large objects, says Sinclair.

"In these massive active archives erasure coding is great, but as we continue to benefit from Moore's Law we will see more organizations look to commodity hardware and software defined storage," he says. "When that happens I think erasure coding will move in to transactional and virtual workloads."

One of the places where erasure coding is less likely to be adopted is in all-flash storage spaces in high performance, mission critical environments, Sinclair believes. "In terms of capacity, all-flash environments may be smaller environments where the challenges of RAID are not as severe," he says.

He adds that the differing failure rates of solid state media compared to spinning disks may also mean that erasure coding does not bring the same benefits, and also observes that the faster performance of flash storage may make the latency issues of geographically distributed erasure coding less important.

One final thought: the main drawback to replication is that it is very storage inefficient, but the falling cost of storage capacity could eventually make this a non-issue. To match the data durability of RS(10,4), which can tolerate four device failures, you would have to use five-fold replication (the original plus four extra copies), but if the cost of storage becomes negligible then the benefit of erasure coding is less clear cut.

So could erasure coding be a technological dead end?

Sinclair thinks that's unlikely, and the reason comes back to Moore's Law: storage costs may be falling, but the cost of computing resources is falling too.

So if the cost of storage and compute resources both become negligible then there is no direct cost-based reason to favor replication over erasure coding or the other way round.

In that case erasure coding may still end up being preferable, if only because it's more efficient in terms of basic requirements like power and rack space.


  This article was originally published on Tuesday Dec 1st 2015