16 May 2016

A new iteration to make Erasure Coding universal

MemoScale (www.memoscale.com), a vendor dedicated to providing industry Erasure Coding software, appeared recently at the OpenStack Summit in Austin. I met the team and immediately asked them if they were linked to the earlier Norwegian venture named Splice Codes that tried to promote erasure coding some time ago. And the answer is yes. You can read a very short blog post I wrote about them in April 2015. It seems they have now changed a few things to be more visible, with the Norwegian government financially supporting the project.
The company promotes 3 values with an erasure coding technique based on systematic mode - superior data integrity, very fast data recovery and super efficient storage capacity utilization - thanks to 3 products:
  1. a C-library for Erasure Coding to be embedded in any storage engine. The team claims that this library is 3x faster than classic Reed-Solomon.
  2. a plugin for Ceph that combines erasure coding with error detection and correction. This plugin can replace Jerasure and ISA.
  3. a data integrity solution for RAID systems.
Point 1 is illustrated by these 3 tables:

Data Protection method   | Storage overhead
3x Replication           | 3x
"Classic" Erasure Coding | 1.5x

Recovery Speed | MB/s
Erasure Coding | 13

Data Fragments | Redundancy fragments | Extra Storage Overhead | Reed Solomon Erasure Coding recovery traffic* | MemoScale recovery traffic | MemoScale improvement factor in recovery traffic**

* Recovery traffic: amount of data that needs to be transferred during recovery, compared to the lost data.
** Ratio between the amount of recovery traffic of RS vs. MemoScale. A higher improvement factor means better performance.
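The storage overhead and Reed-Solomon recovery traffic in these tables follow from simple arithmetic; here is a small sketch (the k=10, m=5 layout is an illustrative assumption on my part, not MemoScale's configuration):

```python
# Sketch of the storage-overhead and recovery-traffic arithmetic behind the
# tables above. The k/m values are illustrative, not MemoScale's.

def storage_overhead(k: int, m: int) -> float:
    """Raw capacity stored per byte of user data, for k data + m parity fragments."""
    return (k + m) / k

def rs_recovery_traffic(k: int) -> int:
    """Classic Reed-Solomon reads k surviving fragments to rebuild one lost
    fragment, i.e. k fragments of traffic per fragment of lost data."""
    return k

# 3x replication is 1 data fragment plus 2 full copies; "classic" EC here is 10+5
assert storage_overhead(1, 2) == 3.0    # first table, replication row
assert storage_overhead(10, 5) == 1.5   # first table, erasure coding row
assert rs_recovery_traffic(10) == 10    # 10x the lost data must move over the network
```

Reducing that recovery traffic multiplier is precisely the improvement factor the third table advertises.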

It's important to understand that erasure coding introduces some overhead, which explains why, on demanding platforms, such techniques are less present than in secondary storage. Today erasure coding is a must: having the feature doesn't create any advantage, but not providing it is a clear drawback. Having said that, there are real differences across erasure coding techniques.
MemoScale is a super interesting solution with lots of promise, and it's good to see some innovation in a sector where, for a long time, we didn't see any real innovation, Reed-Solomon being the de-facto standard.
As we try to classify erasure coding techniques, it seems that the Mojette Transform, invented in France at the University of Nantes, is still the best approach, followed by MemoScale and ISA-L and then Reed-Solomon. The Mojette Transform is available in one of the fastest scale-out NAS products on the market. Using an erasure coding technique for primary storage immediately exposes its performance impact and any divergence from service commitments.
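To make the systematic mode mentioned above concrete, here is a minimal toy sketch: data fragments are stored as-is and a single XOR parity fragment is added (RAID-5 style). Real codes like Reed-Solomon or the Mojette Transform tolerate multiple losses; this toy survives only one.

```python
# Minimal sketch of a systematic erasure code: one XOR parity fragment over
# k data fragments. Illustrative only - not any vendor's implementation.

from functools import reduce

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data_fragments: list[bytes]) -> bytes:
    """Systematic mode: data fragments are kept as-is, plus one parity fragment."""
    return reduce(xor, data_fragments)

def recover(surviving: list[bytes], parity: bytes) -> bytes:
    """Rebuild the single missing data fragment by XOR-ing parity with survivors."""
    return reduce(xor, surviving, parity)

fragments = [b"AAAA", b"BBBB", b"CCCC"]
parity = encode(fragments)
# lose fragment 1, rebuild it from the two survivors plus parity
rebuilt = recover([fragments[0], fragments[2]], parity)
assert rebuilt == b"BBBB"
```

Note that recovery reads every surviving fragment, which is exactly the recovery-traffic cost that techniques like MemoScale's claim to reduce.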
I invite you to try each of these to measure the benefits and see how they operate in your environment.

09 May 2016

IT Press Tour #19 - 6 years later

The IT Press Tour (www.itpresstour.com), launched in June 2010, will celebrate its 19th edition next June. So far, the tour has met 137 companies, delivered 200 sessions as some companies did multiple editions (for example, Cloudera did 5 editions; Nimble Storage, Pure Storage, SolidFire and Hortonworks, 4 editions each...) and generated more than 5,000 articles in online and print magazines. The tour has largely contributed to creating visibility for under-the-radar companies; its success speaks for itself.
The next edition, at the end of June, will be dedicated to IT Infrastructure and will take place in San Francisco and the Valley. As usual, the tour team has invited many innovative companies, many of them pioneers in their respective segments, and the list as of today is:
  • Cloudian, the best object storage for on-premise S3,
  • Datrium, the premier supplier of server-powered storage systems as they name themselves,
  • Hedvig, a new SDS approach by the father of Amazon Dynamo and Apache Cassandra,
  • Kaminario, one of the fastest All-Flash Array,
  • OpenIO, one of the few pioneers of object storage,
  • Portworx, a leader in software-defined storage for containerized applications,
  • Rozo Systems, a super fast scale-out NAS,
  • Springpath, innovator in hyperconvergence software,
  • Sysdig, first and only comprehensive set of container-native visibility solutions,
  • Versity, key player in data tiering for demanding environments,
  • and some confidential ones we can't mention today.
Follow the tour with the hashtag #ITPT; you can also follow the dedicated Twitter handle @ITPressTour and other press group members. This tour will be exceptional... again... for everything related to storage.

25 April 2016

Synerway promotes Enterprise CDP

Synerway (www.synerway.com), one of the top French backup vendors, continues to promote enterprise data protection with advanced CDP. I wrote a short blog post about them 10 years ago (blog post in French, July 2006) and again in 2010, when the product was clearly oriented in the CDP direction (blog post in French, May 2010). The company, founded in France in 2002 by former Quadratec leaders, well known for Time Navigator, has shipped 15,000+ appliances so far via a pure 100% channel model.

Synerway Enterprise CDP is a block-based solution that captures data changes in real time and ships them to a disk attached to a dedicated appliance. An agent must be installed to perform this capture and, of course, a first copy is needed to serve as a reference. The granularity is excellent, as the capture can work from 512 bytes, a key optimization compared to competitors who capture data from 1kB to 128kB. The product can manage up to 1,000 snapshots, so 1,000 possible versions, largely enough to cover all critical data environments. There is no catalog, and the agent fee is capacity-based without any limit on the number of instances deployed. In Windows environments, the agent is compatible with VSS. On the target, the product controls data and storage with thin provisioning, compression and encryption. There is a stress point, as Synerway must develop, validate and track various OSes across long compatibility matrices. Synerway also offered FileSafe and DiskSafe, 2 options from FalconStor.
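The block-capture-plus-reference-copy mechanism can be sketched in a few lines; this is a toy model of the technique, assuming simple sequence numbers, and not Synerway's implementation:

```python
# Toy sketch of block-level CDP: changed 512-byte blocks are journaled with a
# sequence number, so any point in time can be rebuilt from the reference copy.

BLOCK_SIZE = 512  # capture granularity, as in Synerway Enterprise CDP

class ChangeJournal:
    def __init__(self, reference: bytes):
        self.reference = reference   # the first full copy
        self.journal = []            # (sequence, offset, changed block)

    def write(self, seq: int, offset: int, data: bytes):
        """Intercept a write and journal each touched 512-byte block."""
        for i in range(0, len(data), BLOCK_SIZE):
            self.journal.append((seq, offset + i, data[i:i + BLOCK_SIZE]))

    def restore(self, up_to_seq: int) -> bytes:
        """Replay journaled blocks onto the reference copy up to a point in time."""
        image = bytearray(self.reference)
        for seq, off, block in self.journal:
            if seq <= up_to_seq:
                image[off:off + len(block)] = block
        return bytes(image)

vol = ChangeJournal(b"\x00" * 2048)
vol.write(1, 512, b"A" * 512)
vol.write(2, 512, b"B" * 512)
assert vol.restore(1)[512:1024] == b"A" * 512   # the version before the second write
assert vol.restore(2)[512:1024] == b"B" * 512
```

The fine 512-byte granularity matters because a competitor journaling 128kB per touched block would ship up to 256x more data for the same small write.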
There are 3 approaches - In-Band, Out-of-Band and Side-Band - and this reminds me of 2 Side-Band blog posts I wrote in Dec. 2005 (2 posts in French, available here and here). In the classic Side-Band mode, no intrusive agent is used, just a logical volume manager and a dedicated path to an appliance that stores and versions data. For people who have worked in this industry for more than 10 years, this approach seems similar to what Revivio did before the company was acquired by Symantec, which then offered the feature via NetBackup RealTime. You can also check a blog post I wrote in January 2006, again in French.
Synerway seems to position Enterprise CDP against deduplication, but these 2 technologies are complementary, not alternatives. Even with compression on the Synerway side, users can perfectly well use dedup in target mode coupled with CDP. CDP targets RPO - Recovery Point Objective - needs, and its mission is not to reduce storage space; dedup is chosen for the storage capacity savings it can generate, or for its ability to use low network bandwidth for remote backup, for instance. However, Synerway has a pretty interesting approach that satisfies both RPO requirements and storage space constraints, and their success proves that the solution has real value.

15 March 2016

Cohesity + Pure Storage, a clever couple

Cohesity (www.cohesity.com) and Pure Storage (www.purestorage.com | NYSE:PSTG), 2 leaders and innovators in data storage, just announced a partnership. In 2 words, Pure Storage with its All-Flash Array line serves as primary storage, of course, and the Cohesity C2000 harmonizes the secondary data landscape. One of the values users see when they consider adopting these 2 solutions resides in the choice of something really new, not an extension or addition to something existing from classic vendors, or at best a classic product with new capabilities. Both of them use Flash + deduplication and contribute drastically to a low TCO by reducing footprint and energy while serving and protecting data better.

The solution is composed of 2 elements, a 3U Pure Storage FlashArray and a 2U Cohesity C2000 appliance, and is pretty simple to integrate via the VMware API: each VM protection policy triggers a data flow towards the Cohesity device. The next step seems to be an integration based on Pure Storage's snapshot mechanism, with snapshots stored directly on C2000 appliances. Both companies are organizing a joint webcast soon to explain this deployment; you can easily register here.

07 March 2016

IT Press Tour #18

The IT Press Tour (www.itpresstour.com), the leading US media event for EMEA press and media, will organize its 18th edition, dedicated to Business and IT Applications, from March 21st to 24th in San Francisco. As usual, a group of European reporters will converge on San Francisco to meet and visit several hot companies in Hadoop, Big Data, Analytics, Application Performance Management, Corporate Planning, Mobile CRM and messaging-oriented applications. Check the list, they're all hot companies with unique technologies:
  • Anaplan, leader in corporate planning with a Cloud-based SaaS model,
  • AppDynamics, reference in Application Performance Management,
  • Chartio, recent player in Cloud Analytics,
  • Datameer, top player in big data analytics and Hadoop,
  • FollowAnalytics, promoter of mobile CRM and user engagement,
  • Hortonworks, pioneer of Hadoop and big data processing platform,
  • Phala Data, promoter of analytics to serve corporate business revenue,
  • SendBird, startup dedicated to messaging and chat APIs for mobile Apps,
  • Sherbit, interesting solution for personal analytics,
  • and Trifacta, leader in big data analytics and exploration.
Follow the tour with the hashtag #ITPT; you can also follow the dedicated Twitter handle @ITPressTour and other press group members. This tour will be exceptional... again...

02 March 2016

Hortonworks introduces CDP

Hortonworks (www.hortonworks.com | NASD:HDP), leader in Hadoop and Big Data processing platforms, continues to shake the market with a major announcement made yesterday in San Francisco. Hortonworks promotes its Connected Data Platforms, aka CDP (not Continuous Data Protection), to offer best-in-class Data-at-Rest and Data-in-Motion processing. To deliver on this promise, the company unveiled 2 new iterations:
  • Hortonworks Data Platform aka HDP, the historical product, today at release 2.4, which includes Apache Spark 1.6, Apache Ambari 2.2 and SmartSense 1.2. This is also the perfect opportunity to announce a new release approach: the core Apache Hadoop components (HDFS, MapReduce and YARN) and Apache ZooKeeper will be updated annually, aligned with the ODPi consortium, while the Extended Services (Spark, Hive, HBase, Ambari...), running on top of the Core, will be released in a continuous manner.
  • and Hortonworks Data Flow aka HDF, today at 1.2, a leading real-time streaming data platform, key for IoT, which now includes Apache Kafka and Apache Storm. HDF 1.2 will be available in Q1 2016.

Last key point of this major announcement, a strategic partnership with HPE to boost and optimize enterprise Spark performance. HPE makes a clear move in favor of Hortonworks as the preferred partner and solution for all big data environments and needs.
For the 4th time, Hortonworks will participate in the IT Press Tour, in just 2 weeks now, and we'll learn more about this news. Great.

12 February 2016

CDP v2, new iteration from Cohesity

Cohesity (www.cohesity.com), leader in the new Converged Secondary Storage landscape, moves fast and just announced the second generation of its CDP solution. This iteration targets data protection, of course, but also test/dev, file services and analytics. Thanks to its SnapTree technology, Cohesity enhances OASIS, the heart of the system, with some new capabilities:
  • Data throttling, QoS, Encryption AES 256 for data at-rest,
  • Site-to-site replication with 1-1, 1-N and N-1 modes,
  • A more comprehensive cloud archive function, now supporting Amazon Glacier, Microsoft Azure and Google Nearline in addition to Amazon S3 and S3-based object storage systems, but also a tape extension,
  • SMB 2.x and 3.0
  • and AWB - Analytics WorkBench - to gain insights with pre-built AWB applications and possibility to build your own applications.
At the same time, Cohesity adjusted the pricing of the solution, which now starts at $90k. The company accelerates and continues to promote a unified secondary storage platform with data processing, not a passive platform that just stores data. Again, vendors who are doing just that will rapidly be in difficulty...

04 January 2016

Dan Warmenhoven joins Cohesity's BoD

Cohesity (www.cohesity.com), innovator in secondary storage, continues to build and strengthen its leadership and board team. This time Dan Warmenhoven, long-time NetApp CEO, joins its Board of Directors. This arrival solidifies Cohesity and actively helps its market adoption. Super recruitment, good move Mohit.

15 December 2015

New Data Protection approach with Rubrik

Rubrik (www.rubrik.com), an innovative startup dedicated to data protection, is one of the few vendors that recognizes that secondary storage hasn't really progressed for at least a decade. In other words, has the industry really invented anything since the data reduction wave with deduplication and compression? For sure we all remember some famous examples such as Ocarina Networks (acquired by Dell in 2010), Data Domain (acq. by EMC in 2009 for $2.4B), Diligent (acq. by IBM in 2008 for $210M), Avamar (acq. by EMC in 2006 for $165M), Alacritus (acq. by NetApp in 2005 for $11M) or DataCenter Technologies (acq. by Veritas in 2005). But after that, nothing, or just dedup integrated as a feature of classic backup software or appliances. Done. I share Rubrik's view: classic products still exist, but infrastructures now store more data across many different flavors of production environments, bare-metal or hypervisor-based, and in many cases using legacy backup software is obsolete. You start a backup job but it can't finish because data sets are too big or too distributed, or application consistency is super tough to implement... in a few words, data protection needs a new approach, and Rubrik plays its role in that adventure.
Back to the genesis of Rubrik: the company was founded in 2014 by top engineers and leaders from Google, Facebook, VMware and Data Domain - Arvind Jain, Soham Mazumdar, Arvind Nithrakashyap - and Bipul Sinha, CEO and co-founder, from Lightspeed Venture Partners. They all have in common that they understand the need, are motivated to change that landscape and know, a priori, how to build it. The company has raised 2 rounds for a total of $51M: a $10M Series A led by LSVP in March 2015 and a $41M Series B from LSVP and Greylock in May 2015. It's also interesting to see Mark Leslie, former CEO of Veritas Software, Frank Slootman, former CEO of Data Domain, and John W. Thompson, former CEO of Symantec, make significant investments to bootstrap the company.
Rubrik is on a mission: to take backup software out of the enterprise. But seriously, it's more than that. Backup is just one flavor of data protection, and users and enterprises don't really care about the product or the technology; what they need is an answer to their data protection challenge, just that. Historically, protecting data was solved by backup software, but we all know that snapshots, versioning, cloning, replication, archival... are all data protection modes or flavors that belong to the same category. So now you get the point: it's not about backup, it's about data protection, whatever the (above) method used; users don't care about the method, they care about their data. That's it. The notion always associated with data protection - at least it should be - relates to RPO - Recovery Point Objective - and RTO - Recovery Time Objective - and, to be serious, as seen by users and applications, not by the IT guy. RPO is essentially the quantity of data you accept to lose; the term refers to what happens before an event such as a failure...
This is perfectly illustrated by a daily backup job running every day at 10pm: between jobs, nothing else protects the data, so IT accepts, if it does nothing more, to lose a maximum of 10 hours of work if we consider a failure just before the start of the next job. RPO is key, but for different applications and needs you can consider different RPOs. A static web site backed up once a month, or whenever changes occur, is largely enough, so in that case the RPO could be measured in days, weeks or months. For source code and development work, the company doesn't want to lose any good code written by its team, so a strict RPO is often in place. And finally, imagine an activity with revenue attached: the RPO is also very strict. Now the RTO is different; if we reconsider these examples, we can imagine 3 different RTOs. As soon as you discover that your web site is down, you wish to recover and restart the site somewhere, so in that case the RTO should be short. For development with no associated revenue, you can consider a flexible RTO in days or weeks, and for business transactions, of course, the service must be protected with a very short RTO in mind. This RPO/RTO approach helps everyone map needs and requirements to the right solution in each identified case.
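The reasoning above can be reduced to a tiny calculation: the worst-case data loss is the time elapsed since the last completed protection point. The application names and target values below are illustrative examples, not a prescription:

```python
# Sketch of the RPO reasoning above: worst-case loss is the time between the
# last completed protection point and the failure. Targets are illustrative.

from datetime import datetime, timedelta

def achieved_rpo(last_protection_point: datetime, failure: datetime) -> timedelta:
    """Amount of work (measured in time) lost if a failure occurs now."""
    return failure - last_protection_point

# one job per day at 10pm; a crash at 9:59pm the next day loses almost a full day
last_job = datetime(2015, 12, 14, 22, 0)
crash = datetime(2015, 12, 15, 21, 59)
assert achieved_rpo(last_job, crash) == timedelta(hours=23, minutes=59)

# different applications tolerate very different RPOs, as in the 3 examples above
targets = {
    "static web site": timedelta(days=30),       # monthly backup is fine
    "source code": timedelta(minutes=5),         # strict: lose almost nothing
    "revenue transactions": timedelta(seconds=0) # strictest of all
}
assert achieved_rpo(last_job, crash) <= targets["static web site"]
assert achieved_rpo(last_job, crash) > targets["source code"]
```

The same per-application mapping applies to RTO, except the clock measures time to restore service rather than data lost.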
So what is Rubrik about? In a nutshell, Rubrik is about Converged Data Management, gathering all flavors of data protection techniques in ONE product, sold as an appliance to be easy to sell, deploy, use, scale and maintain. Rubrik's expertise is in software but, again, to facilitate sales and the channel, an appliance model is perfect. The philosophy is to be non-intrusive: there is no agent installed in VMs or at the hypervisor level, and scalability is automatic - plug in a new node and it joins the cluster. You receive a box, and after 15 minutes the solution is running in production. To protect information, just define a policy with an SLA and run it.
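To make the "define a policy with an SLA" idea concrete, here is a hypothetical sketch of what such a policy could look like; every field name here is invented for illustration and is not Rubrik's actual API or data model:

```python
# Hypothetical sketch of an SLA-driven protection policy. Field names are
# invented for illustration - this is NOT Rubrik's actual API.

from dataclasses import dataclass

@dataclass
class SlaPolicy:
    name: str
    snapshot_every_hours: int          # how often a protection point is taken
    retention_days: int                # how long points are kept locally
    archive_to_cloud_after_days: int   # older points move to S3-style cold storage

    def points_kept_locally(self) -> int:
        """Number of protection points the retention window implies."""
        return (self.retention_days * 24) // self.snapshot_every_hours

gold = SlaPolicy("gold", snapshot_every_hours=4, retention_days=30,
                 archive_to_cloud_after_days=7)
assert gold.points_kept_locally() == 180
```

The appeal of this model is that the administrator states the SLA (frequency, retention, archival) and the system derives the scheduling, instead of the administrator hand-building backup jobs.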
The 2U appliance itself exists in 2 models, the R344 and the R348, with generous CPU, RAM, SSD and network resources; they differ only in the HDD capacity they offer: 48TB for the R344 and 96TB for the R348. Rubrik is a hybrid storage device, as it includes both SSD and HDD, and each 2U chassis has 4 servers in it, which is why things come in multiples of 4 in the spec sheet. The appliance exposes its storage via NFS or iSCSI and is managed through a REST API. In terms of functions, compression, deduplication, replication, indexing, snapshots, encryption, reporting, super granular recovery... and a super easy and fast search mechanism (of course, when we say indexing we implicitly associate search capability) are delivered by the system. For instance, a model like the R348 can easily manage 300 to 400 VMs. For long-term data preservation, Rubrik relies on Amazon today and, in the future, Azure and Google; they don't offer a cold storage approach themselves - Cold is Cloud for them. But you can also connect Rubrik to external object storage via S3 and even write to an NFS store. Today the product is limited to VMware (vSphere 5.1, 5.5 and 6.0), but the market is so large that the limitation is more an advantage, keeping the company focused. The model is 100% channel, and pricing starts at $100k. The solution is superb; honestly, it's good to see people who have designed a radically new solution and solved the problem differently, in favor of simplification. Success comes naturally with such a product; it's just a matter of execution, and the leaders pay attention to that. The product will flood the market, no doubt. It's interesting to see that Cohesity and Rubrik share some ideas and participate in the revolution of secondary storage.

11 December 2015

Cohesity to change storage positions

Cohesity (www.cohesity.com), leader in the new wave of unification of secondary storage, recently announced its first product iteration. The session during The IT Press Tour last week helped all of us dig in and better understand the strategy and the product while feeling the atmosphere in their office. And we had the privilege of speaking with Mohit Aron, CEO and founder of the company, who previously founded Nutanix, another famous shop. In terms of company finances, Cohesity has closed 2 rounds - a $15M Series A led by Sequoia and Wing Venture Capital in November 2013, and a $55M Series B led by Artis Ventures and Qualcomm Ventures with Accel, Battery, Google and Trinity in June 2015. Wow, all this to accelerate development and market penetration. The total raised is now $70M, and it's interesting to see the number of VCs "motivated" by this project. Cohesity seems to be Hyper Compelling.
I already covered Cohesity very briefly, as we didn't have much information at the time; it was in September 2014, just a few months after they were created, and you can read the short blog post here. You can even discover the previous logo.

So what is Cohesity all about ? What are the challenges they wish to solve and how the solution works ?
The first element to consider is that the biggest volume of data is stored on secondary storage - meaning storage not directly operating for the business, such as backup images, archive data sets, and copies (replication, snapshots, clones...) for DR - and all this, again, represents the largest bulk of data. These data, which represent the business at a moment in time - quite frequent moments for snapshots, less frequent for classic copies - are also not mined or analyzed at all, yet everything is there. Again, around 80% of corporate data sits in secondary storage. So the idea of Mohit Aron was, and still is, to offer in one data platform all the data workflows associated with data protection - backup, archive, snapshot, clones, replication, DR... - but also development and analytics. They call it the Cohesity Data Platform - fitting for this blog, which historically started in the age of CDP (Continuous Data Protection).
In terms of product, Cohesity develops software but sells appliances, the C2000 line, to stay true to the idea of simplicity. Physically, the CDP is a 2U chassis with 4 nodes, and users can start with 3 nodes. 2 models exist, the C2300 and C2500, with respectively 48TB HDD plus 3.2TB PCIe SSD and 96TB HDD plus 6.4TB PCIe SSD; both have 8 10GbE ports - 2 ports per node. Cohesity builds 3 software layers: OASIS (Open Architecture for Scalable Intelligent Storage), Storage Services and an Application Environment. The product is managed via a very intuitive GUI and CLI and can be integrated via a REST API. With a strong DNA in distributed software, and file systems in particular, the team developed SnapFS, a shared-nothing, strongly consistent distributed file system that spans every node and represents the core of the solution.

For data services, Cohesity provides snapshots & clones, global deduplication (inline and post-process, with 8-16KB granularity), replication (tunable, but 2 copies by default), auto-tiering in both directions (SSD <-> HDD) and non-disruptive operations. Erasure Coding will be ready in 2016. Analytics is one of the key values of the platform, and things like indexed backups, search capabilities and reports complete the picture. On top, Cohesity exposes SnapFS via different protocols, such as distributed NFSv3 and a RESTful API, with of course the ability to connect to VMware instances via VADP. SMB, iSCSI and HDFS will be offered later. The solution is of course highly scalable, will receive more and more features soon, and is today positioned for mid-market VMware environments, priced below $100K. In fact, many primary storage vendors could dream about this feature set. Cohesity really changes the landscape of Data Protection and participates in the new adventure of Converged Secondary Storage with players like Rubrik. It's no longer just about storing data; it's about storing and processing data on the same platform.
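The global deduplication mentioned above is easy to sketch at the 8KB end of the stated granularity: identical chunks are stored once and objects become lists of chunk references. This is a toy model of the general technique, not Cohesity's implementation:

```python
# Sketch of fixed-size chunk deduplication at 8KB granularity: each unique
# chunk is stored once, keyed by its hash. Illustrative only.

import hashlib

CHUNK = 8 * 1024  # low end of the 8-16KB granularity mentioned above

class DedupStore:
    def __init__(self):
        self.chunks = {}   # sha256 digest -> chunk bytes

    def write(self, data: bytes) -> list[str]:
        """Store an object; return its recipe (ordered list of chunk hashes)."""
        recipe = []
        for i in range(0, len(data), CHUNK):
            c = data[i:i + CHUNK]
            h = hashlib.sha256(c).hexdigest()
            self.chunks.setdefault(h, c)   # store each unique chunk only once
            recipe.append(h)
        return recipe

    def read(self, recipe: list[str]) -> bytes:
        """Rebuild an object from its recipe."""
        return b"".join(self.chunks[h] for h in recipe)

store = DedupStore()
obj = b"x" * CHUNK * 3            # 3 identical 8KB chunks
recipe = store.write(obj)
assert len(store.chunks) == 1     # deduplicated down to one stored chunk
assert store.read(recipe) == obj  # yet the object reads back intact
```

An inline variant runs this on the write path, while post-process dedup runs it later over data already landed on disk; the platform reportedly supports both.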