With rapid market adoption of modern applications built on next generation, scale out, distributed NoSQL and big data platforms, traditional backup vendors are increasingly focused on these new workloads, at least from a messaging and roadmap standpoint.  While traditional vendors help increase customer awareness of the critical need for backup of modern data platforms (e.g. Apache Cassandra, MongoDB, HDFS, Couchbase), it’s more important than ever to separate fact from fiction when it comes to selecting a vendor.  As we have discussed in previous blogs, the requirements for backup and recovery of modern applications are fundamentally different from those of traditional backup.  So let’s look at some of the recent ‘news’ and see how it stacks up against these requirements.

As discussed in a previous blog, media server based architectures render legacy solutions impractical for protecting modern next generation data sources.  One legacy vendor has recently announced a “parallel data streaming” feature as part of their “parallel data streaming architecture.”  However, when you peel back the onion, what they’re really doing is enabling parallel backup streams from individual nodes of clustered data sources to a “backup host,” i.e. a media server connected to its own storage.  They claim that scaling up this data protection solution is enabled by “adding backup hosts.”  In a cloud centric world, adding more hosts, each with its own dedicated primary storage, is an expensive scenario.
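To make the cost of “adding backup hosts” concrete, here is a rough sketch of how the host count grows with the amount of protected data. The per-host capacity of 50 TB and the cluster sizes are hypothetical numbers chosen for illustration, not figures from any vendor.

```python
import math

def backup_hosts_needed(protected_tb: float, per_host_capacity_tb: float) -> int:
    """Number of backup hosts required if each host can protect a fixed amount of data."""
    return math.ceil(protected_tb / per_host_capacity_tb)

# Hypothetical assumption: each backup host (with its own storage) protects up to 50 TB.
PER_HOST_TB = 50.0

for protected_tb in (40.0, 120.0, 500.0):
    hosts = backup_hosts_needed(protected_tb, PER_HOST_TB)
    print(f"{protected_tb:.0f} TB protected -> {hosts} backup host(s)")
```

Under this simple model, every additional block of data means another host and another pool of dedicated storage to buy and manage.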



Another legacy vendor has delivered an “early feature release” for backing up Apache Cassandra. The solution has no native capability for backing up Cassandra incrementally; instead it relies on enabling “incremental backup” in the cassandra.yaml configuration file on each Cassandra host. This is an intrusive requirement, because applying a change to the .yaml file of a Cassandra host requires shutting down and restarting the Cassandra service on that host.  Simply put, it’s a hack.

Furthermore, the solution requires additional manual configuration whenever a Cassandra node is replaced or a new one is added.  This vendor also recommends disabling incremental backups on Cassandra nodes before a restore. Again, more manual steps.
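To illustrate the per-node burden, the sketch below scans the cassandra.yaml text of each node and lists the hosts that would still need the `incremental_backups` setting changed. The host names and file contents are made up, and the simple line match is an assumption for illustration rather than a full YAML parser.

```python
import re

# Matches an uncommented, top-level "incremental_backups: <value>" line.
_SETTING = re.compile(r"^incremental_backups:\s*(\S+)", re.MULTILINE)

def incremental_backups_enabled(yaml_text: str) -> bool:
    """Return True if the config text sets incremental_backups to true."""
    match = _SETTING.search(yaml_text)
    return bool(match) and match.group(1).lower() == "true"

def nodes_needing_change(configs: dict[str, str]) -> list[str]:
    """Hosts whose cassandra.yaml would have to be edited and the service restarted."""
    return sorted(
        host for host, text in configs.items()
        if not incremental_backups_enabled(text)
    )

# Hypothetical three-node cluster; names and contents are illustrative only.
cluster = {
    "cass-node-1": "cluster_name: demo\nincremental_backups: true\n",
    "cass-node-2": "cluster_name: demo\nincremental_backups: false\n",
    "cass-node-3": "cluster_name: demo\n",  # newly added node, setting absent
}

print(nodes_needing_change(cluster))
```

Every host in that list represents a manual edit, and the check has to be repeated each time a node is added or replaced.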

Additionally, like the previous example, this is still a media server centric solution that relies on silos of media servers attached to their own storage, each writing the protected data in its own unique format.

In a third example, a legacy vendor has partnered with a scale out database vendor to enable their Purpose Built Backup Appliance (PBBA) as a target for backups from the scale out database vendor’s backup solution. The database vendor’s native backup solution is a media server that hosts copies of the databases that it protects.

In the joint solution, the media server writes database dumps as point in time snapshots to the legacy vendor’s network attached PBBA.  The PBBA is delivered as an on-premises proprietary server with its own storage, or as a virtual machine with its own dedicated primary storage. The biggest disadvantage of this solution is that it results in two silos of constrained scalability: compute and storage for each media server silo, and compute and storage for each PBBA silo.


The media server is subject to limits on how much data it can protect before another media server is required, and the PBBA has its own limits on total backup throughput and storage.  In addition to these scalability issues, the joint solution requires disabling compression of the backup written by the database media server so that the PBBA can use its own block based deduplication to reduce the data footprint on its disk storage. This increases backup time, because 2-3x the size of the compressed data must be written to the PBBA target.
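A back-of-the-envelope sketch shows what disabling compression does to the backup window. The 2-3x expansion comes from the discussion above; the 10 TB compressed backup size and the 2 TB/hour write rate to the PBBA are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def transfer_hours(data_tb: float, throughput_tb_per_hour: float) -> float:
    """Time to write a backup of the given size at a fixed throughput."""
    return data_tb / throughput_tb_per_hour

# Hypothetical assumptions for illustration.
COMPRESSED_TB = 10.0          # size of the backup if compression were left on
THROUGHPUT_TB_PER_HOUR = 2.0  # sustained write rate to the backup target

# A ratio of 1 is the compressed baseline; 2 and 3 model compression disabled.
for expansion in (1, 2, 3):
    hours = transfer_hours(COMPRESSED_TB * expansion, THROUGHPUT_TB_PER_HOUR)
    print(f"{expansion}x data written -> {hours:.1f} hours")
```

With these assumed numbers, a 5 hour backup becomes a 10 to 15 hour backup before any deduplication savings on the target are realized.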

The bottom line is that these solutions all suffer from the same malady. These vendors promise to solve the protection requirements of next generation data sources with yesterday’s technology. Scaling inevitably requires adding more hardware or more virtual machines, along with more primary storage.

Don’t believe the hype from legacy vendors that they can solve the challenges of protecting today’s data sources with their legacy architectures. They can’t, but Datos IO can.

Click here to learn more about Datos IO RecoverX