Monday, May 16, 2016

A new iteration to make Erasure Coding universal

MemoScale (www.memoscale.com), a vendor dedicated to erasure coding software, appeared recently at the OpenStack Summit in Austin. I met the team and immediately asked whether they are linked to Splice Codes, the earlier Norwegian venture that tried to promote erasure coding some time ago.
And the answer is yes. You can read a very short blog post I wrote about them in April 2015. It seems they have since changed a few things to become more visible, with financial support from the Norwegian government.
The company promotes three values with an erasure coding technique based on systematic mode: superior data integrity, very fast data recovery and highly efficient storage capacity utilization. These are delivered through three products:
  1. a C library for erasure coding that can be embedded in any storage engine. The team claims that this library is 3x faster than classic Reed-Solomon.
  2. a plugin for Ceph that combines erasure coding with error detection and correction. This plugin can replace Jerasure and ISA.
  3. a data integrity product for RAID solutions.
Point 1 is illustrated by these three tables:

| Data protection method   | Storage overhead |
|--------------------------|------------------|
| 3x Replication           | 3x               |
| "Classic" Erasure Coding | 1.5x             |
| MemoScale                | 1.2x             |

| Method         | Recovery speed (MB/s) |
|----------------|-----------------------|
| Erasure Coding | 13                    |
| MemoScale      | 34                    |

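| Data fragments | Redundancy fragments | Extra storage overhead | Reed-Solomon erasure coding recovery traffic* | MemoScale recovery traffic | MemoScale improvement factor in recovery traffic** |
|----------------|----------------------|------------------------|-----------------------------------------------|----------------------------|----------------------------------------------------|
| 6              | 3                    | 50%                    | 6x                                            | 2.67x                      | 2.25                                               |
| 8              | 3                    | 37.5%                  | 8x                                            | 3.67x                      | 2.18                                               |
| 10             | 4                    | 40%                    | 10x                                           | 3.85x                      | 2.60                                               |
| 16             | 4                    | 25%                    | 16x                                           | 5.96x                      | 2.68                                               |
| 20             | 4                    | 20%                    | 20x                                           | 7.54x                      | 2.65                                               |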
* Recovery traffic: the amount of data that must be transferred during recovery, relative to the amount of data lost.
** Ratio between the recovery traffic of RS and that of MemoScale. A higher improvement factor means better performance.
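
The arithmetic behind the last table can be checked with a short Python sketch. It assumes the usual rule that classic Reed-Solomon reads k surviving fragments to rebuild one lost fragment, which matches the Reed-Solomon column above. The MemoScale traffic figures are copied from the table, not derived, and every function name here is hypothetical.

```python
# Rows copied from the table above:
# (data fragments k, redundancy fragments m, MemoScale recovery traffic
#  expressed as a multiple of the lost data).
ROWS = [
    (6, 3, 2.67),
    (8, 3, 3.67),
    (10, 4, 3.85),
    (16, 4, 5.96),
    (20, 4, 7.54),
]


def extra_overhead_pct(k: int, m: int) -> float:
    """Extra storage overhead: redundancy relative to data, in percent."""
    return 100.0 * m / k


def raw_capacity_factor(k: int, m: int) -> float:
    """Raw capacity consumed per unit of usable data, i.e. (k + m) / k."""
    return (k + m) / k


def rs_recovery_traffic(k: int) -> float:
    """Assumption: classic Reed-Solomon transfers k fragments per lost fragment."""
    return float(k)


def improvement_factor(k: int, memoscale_traffic: float) -> float:
    """Ratio of Reed-Solomon recovery traffic to MemoScale recovery traffic."""
    return rs_recovery_traffic(k) / memoscale_traffic


def summarize(rows=ROWS):
    """Return one tuple per layout: (k, m, overhead %, raw factor, RS traffic, improvement)."""
    return [
        (
            k,
            m,
            extra_overhead_pct(k, m),
            raw_capacity_factor(k, m),
            rs_recovery_traffic(k),
            round(improvement_factor(k, ms_traffic), 2),
        )
        for k, m, ms_traffic in rows
    ]


if __name__ == "__main__":
    for k, m, pct, raw, rs, gain in summarize():
        print(f"{k}+{m}: overhead {pct}%, raw factor {raw:.2f}x, "
              f"RS traffic {rs:.0f}x, improvement {gain}")
```

Under the same (k + m) / k rule, a 6+3 layout consumes 1.5x raw capacity and a 20+4 layout consumes 1.2x, which happens to match the figures in the first table. The source does not say which layouts were used there, so treat that match as an observation only.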


It's important to understand that erasure coding has a performance cost, which explains why it is rarely used on demanding platforms and is far more common in secondary storage. Today erasure coding is a must: having the feature no longer creates an advantage, but not providing it is a clear drawback. That said, there are real differences between erasure coding techniques.
MemoScale is a very interesting solution with a lot of promise, and it's good to see some innovation in a sector that saw little of it for a long time, with Reed-Solomon as the de facto standard.
If we try to rank erasure coding techniques, it seems that the Mojette Transform, invented in France at the University of Nantes, is still the best approach, followed by MemoScale and ISA-L, and then Reed-Solomon. The Mojette Transform is available in one of the fastest scale-out NAS products on the market. Using an erasure coding technique for primary storage immediately exposes its performance impact and any divergence from service commitments.
I invite you to try each of these to see the benefits and how they behave in your environment.