A truth of all technologies is that what was once the best solution for a then-current need eventually becomes an obsolete legacy. Media-server-based legacy backup architecture has no place in the cloud. Likewise, legacy media-server-based architectures have no place protecting next-generation scale-out data.

Legacy backup solutions from long-established backup vendors (e.g., Veritas, EMC/Legato, CommVault) are all based on the same legacy architecture: single-vendor, end-to-end designs that centralize both control of the backup process (the "control plane") and the movement and storage of the backed-up data (the "data plane"). In these legacy backup solutions, "media servers" act as the consolidated control plane and data plane, as shown below:

Media-server-based backup architectures were designed to back up and protect the data sources that were predominant when legacy backup solutions reached their maturity, mainly:

  • Applications based on relational SQL databases (e.g., Oracle and Microsoft SQL Server)
  • Unstructured files hosted on specific, dedicated servers
  • Data that resided in on-premises data centers

Each backup vendor’s unique client “agent” software, installed on each host, used database-specific APIs to invoke and process backups of structured data (e.g., Oracle RMAN). Likewise, client agents installed on each file server backed up its unstructured files, or used APIs to back up each NAS (an appliance-based, dedicated file server).

For backing up data in these scenarios, the agent software identified the data to be backed up under the direction of a media server (control plane), then moved the backed-up data (data plane) to an assigned media server, which stored the backup in a proprietary format on some kind of dedicated storage (tape, disk, or purpose-built backup appliance). Specific media servers backed up specific hosts or clients. Adding more database hosts or file servers eventually required adding more media servers. The inevitable result of this architecture was silos: groups of hosts, each protected by a specific media server.
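The coupling described above can be sketched in a few lines of Python (hypothetical class and names, not any vendor's actual software): one object plays both control plane and data plane for a fixed set of hosts, so growth forces new silos.

```python
# Toy sketch of the legacy media-server pattern: one media server couples
# the control plane (deciding what to back up) with the data plane
# (receiving and storing the data), creating per-server silos.

class MediaServer:
    """Combines control plane and data plane for its assigned hosts."""
    def __init__(self, name, capacity):
        self.name = name
        self.capacity = capacity      # max hosts this server can protect
        self.hosts = []               # the silo: hosts bound to this server
        self.store = {}               # proprietary-format backup storage

    def assign(self, host):
        if len(self.hosts) >= self.capacity:
            raise RuntimeError(f"{self.name} is full: add another media server")
        self.hosts.append(host)

    def backup(self, host, data):
        # Control plane: this server directs the agent on its own hosts only.
        assert host in self.hosts, "a host is backed up only by its own server"
        # Data plane: the same server receives and stores the data.
        self.store[host] = f"proprietary({data})"

servers = [MediaServer("ms1", capacity=2)]
for host in ["db1", "db2", "db3"]:
    try:
        servers[-1].assign(host)
    except RuntimeError:
        # Adding one host too many forces a whole new media server (silo).
        servers.append(MediaServer(f"ms{len(servers) + 1}", capacity=2))
        servers[-1].assign(host)

print(len(servers))  # 2: the third host forced a second media server
```

The sketch shows why host growth and media-server growth are locked together in this design.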

Non-relational scale-out data sources such as MongoDB and Apache Cassandra, as well as distributed big data filesystems, are growing in popularity and use. Compared to legacy relational SQL databases, they are natively clustered, scalable, resilient, highly available, and significantly more cost-effective to license and deploy. However, the very aspects of their architecture that make them resilient and highly available also defy backup with legacy media-server-based architectures.

Unlike in traditional relational databases, data is distributed across multiple nodes: with “sharding,” no one node contains all the data in any table or database. In addition, copies of every data object and transaction are “replicated” across multiple nodes, so data is automatically stored multiple times. This replication of data across multiple nodes is what makes scale-out, non-relational databases so resilient and fault tolerant. Finally, unlike traditional relational databases, there is no point-in-time consistency of data during operation; data is “eventually” consistent.
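A toy sketch of these placement ideas (assumed node names and a simple hash ring, not the actual placement algorithm of Cassandra or MongoDB): each key is hashed to a primary shard and then replicated to the next nodes in the ring, so no single node holds everything and every object lives on several nodes.

```python
# Toy sharding-plus-replication sketch for a 5-node cluster with a
# replication factor of 3 (all names and parameters are illustrative).

import hashlib

NODES = ["node0", "node1", "node2", "node3", "node4"]
REPLICATION_FACTOR = 3

def replicas_for(key):
    """Shard the key to a primary node by hash, then replicate to the
    next REPLICATION_FACTOR - 1 nodes around the ring."""
    primary = int(hashlib.md5(key.encode()).hexdigest(), 16) % len(NODES)
    return [NODES[(primary + i) % len(NODES)] for i in range(REPLICATION_FACTOR)]

placement = {key: replicas_for(key) for key in ("user:1", "user:2", "order:9")}
for key, nodes in placement.items():
    print(key, "->", nodes)   # each key lands on 3 distinct nodes of 5
```

Losing any one node leaves at least two replicas of every object, which is the fault tolerance the paragraph above describes; the same property means a backup tool must reconcile several divergent copies rather than read one authoritative host.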

It is these three aspects (data distributed across multiple nodes, multiple copies of data stored in each database cluster, and eventually consistent data) that make legacy media-server-based backup solutions unsuitable for protecting scale-out, non-relational data. Media-server backup architectures were designed to back up specific, consistent data stored on specific data source hosts. They cannot scale to the parallelism these data sources require. Nor do they have an effective technology to eliminate redundancy in data that resists block-level deduplication because it is compressed and bit-wise non-identical across replicas.

That’s why we created Datos IO RecoverX, the industry’s first application-centric, software-only data protection solution that delivers scalable, fully featured, application-consistent, point-in-time backup and recovery for next-generation applications built on scale-out non-relational databases and big data filesystems.

  • RecoverX does not use media servers: the backup “control plane” is separated from the backup “data plane.”
  • During backup, data is streamed directly from source data nodes, in parallel, to cost-effective secondary storage (e.g., S3 in AWS) and is stored in native format.
  • Using patented Semantic Deduplication, RecoverX deduplicates protected data without interrupting the backup stream from the nodes.
  • After the first backup, subsequent backups stream only incremental new data to secondary storage.
  • Patented technology delivers point-in-time-consistent data protection for eventually consistent data sources.
  • Completed backups use a fraction of the space of the total data backed up.
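The incremental point in the bullets above can be illustrated with a toy content-fingerprint sketch (hypothetical functions; this is plain hash-based change detection, not RecoverX’s patented Semantic Deduplication): after a first full backup, only data not already present on secondary storage is transferred.

```python
# Toy incremental-backup sketch: fingerprint each blob and stream only
# blobs whose fingerprint is not already on secondary storage.

import hashlib

def fingerprint(blob: bytes) -> str:
    return hashlib.sha256(blob).hexdigest()

secondary_storage = {}   # stands in for an object store such as S3

def backup(files: dict) -> int:
    """Back up a set of named blobs; return how many were actually streamed."""
    transferred = 0
    for name, blob in files.items():
        fp = fingerprint(blob)
        if fp not in secondary_storage:
            secondary_storage[fp] = blob   # stored once, in native format
            transferred += 1
    return transferred

first = backup({"a.db": b"alpha", "b.db": b"beta"})     # full: streams both
second = backup({"a.db": b"alpha", "b.db": b"beta2"})   # only b.db changed

print(first, second)  # 2 1
```

The second run streams a single blob, which is why completed backups occupy a fraction of the total data protected over time.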

A lack of effective data protection can quickly derail your application deployments and put you at risk of data loss and downtime. Make sure you don’t get stuck using yesterday’s legacy architecture to back up today’s data sources.

Click here to learn more about RecoverX