Since announcing support for HDFS with the release of RecoverX 2.0 earlier this year, one of the most frequent comments we get from customers and prospects is ‘I don’t need backup and recovery for my Hadoop platforms’.  Unfortunately, that could not be further from the truth.

Although HDFS filesystems offer replication and local snapshots, they lack the point-in-time backup and recovery capabilities required to achieve and maintain enterprise-grade data protection.  As enterprises increasingly rely on big data applications for decision support and customer analytics, it’s critically important to understand the need for backup and recovery of Hadoop environments.  So, here’s a quick primer on five reasons you need to back up your Hadoop environment:

  1. Replication is not the same as point-in-time backup. Replication provides high availability but no protection from logical or human errors, which can result in data loss and, ultimately, failure to meet compliance and governance standards.
  2. Data loss is as real as it always was. Studies suggest that more than 70 percent of data loss events are caused by human error.  Filesystems such as HDFS do not offer protection from such accidental deletion of data.
  3. Reconstruction of data is too expensive.  It’s theoretically possible to reconstruct data from its original sources, but in practice either the data is no longer available at the source or reconstruction takes weeks or months.
  4. Application downtime should be minimized. There is no value in data when it can’t be accessed.  Granular file-level recovery is essential to minimize any application downtime.
  5. Cost. Big data is…BIG, with data lakes quickly growing to multi-petabyte scale.  Enterprise backup and recovery also enables organizations to archive data to cost-effective object storage systems.
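
To make the first two points concrete, here is a minimal Python sketch of the difference between replication and a point-in-time backup. It is a toy in-memory model, not the HDFS or RecoverX API: the `ReplicatedStore` and `BackupCatalog` classes, the file path, and the "monday" label are all hypothetical names made up for illustration.

```python
import copy


class ReplicatedStore:
    """Toy model (hypothetical): every write and delete reaches all replicas."""

    def __init__(self, replicas=3):
        self.replicas = [dict() for _ in range(replicas)]

    def put(self, path, data):
        for replica in self.replicas:
            replica[path] = data

    def delete(self, path):
        # A delete is a valid operation, so it is replicated like any other.
        for replica in self.replicas:
            replica.pop(path, None)

    def get(self, path):
        # Any replica that still holds the file can serve the read.
        for replica in self.replicas:
            if path in replica:
                return replica[path]
        return None


class BackupCatalog:
    """Toy point-in-time backup (hypothetical): independent copies by label."""

    def __init__(self):
        self.versions = {}

    def take(self, label, store):
        # Copy the current contents so later changes cannot affect the backup.
        self.versions[label] = copy.deepcopy(store.replicas[0])

    def restore_file(self, label, path, store):
        # Granular recovery: bring back one file from one point in time.
        store.put(path, self.versions[label][path])


store = ReplicatedStore()
store.put("/data/sales.csv", "q1,q2,q3")

catalog = BackupCatalog()
catalog.take("monday", store)

# Node failure: one replica is lost, the others still serve the file.
store.replicas[1].clear()
after_node_failure = store.get("/data/sales.csv")

# Human error: the delete is applied to every replica.
store.delete("/data/sales.csv")
after_delete = store.get("/data/sales.csv")

# Point-in-time recovery restores the file from the earlier copy.
catalog.restore_file("monday", "/data/sales.csv", store)
after_restore = store.get("/data/sales.csv")
```

In this sketch the file survives the simulated node failure because the other replicas still hold it, but the accidental delete removes it everywhere at once. Only the independent point-in-time copy brings it back, and it does so for a single file rather than the whole dataset.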

To help spread the word about the critical need for enterprise backup and recovery for Hadoop platforms, I recently contributed an article to InsideBigData.  You can read that article in its entirety here:


If you’d like to learn more about how Datos IO supports Hadoop backup and recovery, check out these useful resources:

What’s New in RecoverX 2.0:

Hadoop Solutions Overview: