“Plus ça change, plus c’est la même chose”

“The more things change, the more they stay the same”

It seems ironic, well into the 21st century, that this phrase, first written in 1849, still rings true today. But when it comes to backup and recovery, it’s spot on. One of the longest-running discussion threads in the backup world is explaining why “replication is not backup.”

In the early days of backup, when applications were built predominantly on relational databases and hosted on-premises in customers’ data centers, those of us selling backup solutions constantly heard, “We don’t need backup because we replicate our data.” They thought they were protected by creating copies of production data on disk, replicated by disk arrays from a primary data site to a D/R site. To an extent, they were. If the primary site failed or went offline for whatever reason, production could continue at the D/R site. But replication also quickly propagated data loss: accidentally delete data and, BAM!, that accidental deletion is rapidly and effectively replicated! Whatever the cause, if data becomes corrupted, the corruption gets replicated across multiple sites. That’s bad. Replication is NOT backup.

Whether it was before or after they suffered an unrecoverable data loss, customers eventually understood that array-based replication enabled operational resiliency but was not “backup.” Backup solutions were required to deliver point-in-time, application-consistent versions of data to enable rapid recovery in the event of data loss or corruption: the perfect complement to replication.
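To make the difference concrete, here is a minimal toy sketch in Python. It is not how any disk array or database actually implements replication; the `ReplicatedStore` class and its methods are hypothetical names invented for illustration. It only shows that a replica faithfully mirrors a delete, while a point-in-time copy lets you get the data back.

```python
import copy


class ReplicatedStore:
    """Hypothetical toy store: every change on the primary is mirrored to a replica."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.snapshots = {}  # backup label -> point-in-time copy of the primary

    def put(self, key, value):
        self.primary[key] = value
        self.replica[key] = copy.deepcopy(value)  # replication copies the write

    def delete(self, key):
        self.primary.pop(key, None)
        self.replica.pop(key, None)  # replication copies the delete just as faithfully

    def backup(self, label):
        # A backup is an independent copy frozen at this moment.
        self.snapshots[label] = copy.deepcopy(self.primary)

    def restore(self, label):
        # Roll both sites back to the saved point in time.
        self.primary = copy.deepcopy(self.snapshots[label])
        self.replica = copy.deepcopy(self.snapshots[label])


def demo():
    store = ReplicatedStore()
    store.put("orders", [101, 102])
    store.backup("nightly")
    store.delete("orders")  # the accidental deletion
    lost_everywhere = (
        "orders" not in store.primary and "orders" not in store.replica
    )
    store.restore("nightly")
    return lost_everywhere, store.primary.get("orders")
```

Calling `demo()` in this sketch reports that the deleted key is gone from both the primary and the replica, and that it reappears only after restoring the point-in-time copy.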

Jump “back to the future” in our I/T DeLorean, and here we are in 2017: “the more things change, the more they stay the same.” The world of data has changed dramatically in the intervening years, with the rapid proliferation of hybrid cloud infrastructures and next-generation applications built on cloud-scale NoSQL databases like MongoDB and Cassandra that store petabytes of data.

Still, we hear, “Our database uses replication; we don’t need backups.” Next-generation scale-out databases do employ multi-node replication to deliver operational resiliency. Even when one or more nodes in a Cassandra or MongoDB cluster fail, transaction processing and application data access continue uninterrupted. But next-generation databases suffer the same inherent limitations of replication: accidental deletion of one or more data tables, fat-finger errors, and ransomware attacks can all result in unrecoverable data loss.

Replication is still NOT backup.

In fact, the clustered replication that makes next-generation databases so resilient also makes them even more reliant upon backup to protect against data loss, corruption, and ransomware attacks. But traditional backup solutions that depend on silos of media servers and backup appliances, and that store backups in a proprietary format on dedicated storage, don’t address the application-consistency and scalability requirements of next-generation databases.

That’s why we created Datos IO RecoverX, the industry’s first cloud-scale, application-centric, software-only data protection solution that delivers scalable, fully featured point-in-time backup and recovery for next-generation applications built on NoSQL databases, including MongoDB and Cassandra, as well as big data filesystems.

To learn more, spend some time here