MongoDB is a leading choice for organizations deploying next-generation, cloud-native, scalable, resilient, and highly available non-relational databases. While MongoDB includes native backup tools such as mongodump (the native database dump utility) and Ops Manager (a managed backup solution), there are pitfalls to consider as you solidify your enterprise backup strategy. Let’s look at each of these pitfalls to find an ideal backup and recovery solution.


Pitfall #1 – Application consistent enterprise backup is not achievable with basic database dump utilities

Using a manual database dump utility, such as mongodump, combined with homegrown scripting (e.g., hand-coded scripts) is a frequent choice of those who first attempt to “roll their own” MongoDB data protection solution. However, this approach has numerous limitations:

  • It is not scalable – it can only process full backups, with no option for incrementals, which creates extended load and performance impact on every MongoDB server that processes backups
  • It is complex – it has no built-in features for backing up sharded clusters and replica sets, and no means of taking consistent snapshots across the cluster
  • It is subject to human error – both in creation of the scripts and operation of scripts and any associated manual procedures
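To make these limitations concrete, here is a minimal sketch of what a homegrown wrapper around mongodump often looks like. The function names, connection string, and output path are hypothetical and chosen only for illustration; the mongodump flags used (--uri, --out, --oplog, --gzip) are standard options. Note that every run is a full dump, and that --oplog only covers a single replica set rather than a consistent snapshot across shards.

```python
import datetime
import subprocess


def build_mongodump_command(uri, out_root, when):
    """Build the argument list for one full dump into a timestamped directory."""
    out_dir = f"{out_root}/{when.strftime('%Y%m%dT%H%M%S')}"
    # --oplog records oplog entries written while the dump runs on one
    # replica set; it does not coordinate a snapshot across shards.
    return ["mongodump", f"--uri={uri}", f"--out={out_dir}", "--oplog", "--gzip"]


def run_full_backup(uri, out_root):
    """Run one full dump. Each call re-reads the entire data set."""
    cmd = build_mongodump_command(uri, out_root, datetime.datetime.now())
    # check=True raises CalledProcessError if mongodump exits non-zero;
    # scheduling, retries, and retention are all left to the script author.
    subprocess.run(cmd, check=True)
    return cmd
```

Everything beyond this single call (scheduling, retention, verification, and per-shard coordination) has to be written and maintained by hand, which is where the scaling and human-error problems above come from.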

RecoverX delivers fully automated, orchestrated, and application-consistent backup and recovery for MongoDB data sets. The RecoverX GUI makes backing up MongoDB simple, regardless of topology, and includes a REST API for flexible extensibility. Compared to the use of custom scripting and mongodump, our customers have seen a reduction in application downtime, as well as a decrease in human error by as much as 10X.


Pitfall #2 – Scalability is constrained by a media server backup architecture

Ops Manager utilizes a MongoDB-based legacy media server architecture to process and manage backup and restore operations. Media server-based backup architectures route both the control of the backup process (the “control plane”) and the movement and storage of the backed-up data (the “data plane”) through a single server, resulting in silos of performance and scalability. In order to scale backup performance (shorter RPOs or faster RTOs), the only option is adding more media servers, which exacerbates the problem while also increasing cost.

Datos IO RecoverX separates the control plane from the data plane, eliminating media server silos of performance and scale. Data moves directly, in parallel, between MongoDB database nodes and backup storage. As a result, backup performance is not limited by a media server chokepoint.
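The difference between the two data paths can be illustrated with a simple back-of-the-envelope model. This is a sketch under stated assumptions, not a measurement of Ops Manager or RecoverX: the per-node rates, the media server cap, and the data size are hypothetical, and the model ignores limits on the backup storage side.

```python
def media_server_throughput(node_rates_mb_s, media_server_cap_mb_s):
    """Aggregate rate when all data funnels through one media server."""
    return min(sum(node_rates_mb_s), media_server_cap_mb_s)


def direct_parallel_throughput(node_rates_mb_s):
    """Aggregate rate when each node writes straight to backup storage."""
    return sum(node_rates_mb_s)


def backup_hours(data_gb, throughput_mb_s):
    """Hours needed to move data_gb at the given rate (1 GB = 1024 MB)."""
    return data_gb * 1024 / throughput_mb_s / 3600


# Hypothetical cluster: six nodes at 200 MB/s each, one 500 MB/s media server.
nodes = [200] * 6
via_media_server = media_server_throughput(nodes, 500)
direct = direct_parallel_throughput(nodes)
print(backup_hours(10_000, via_media_server), backup_hours(10_000, direct))
```

In this model, adding database nodes raises the direct-path rate but leaves the media server path pinned at the server's cap, which is the chokepoint described above.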


Pitfall #3 – Native incremental backup is complex and costly

Ops Manager requires a full copy of every protected database, kept current with a stream of oplog entries, plus point-in-time snapshots in order to create point-in-time backups. The complexity of this workflow makes the backup process slow and necessitates a costly upfront investment in infrastructure.

Datos IO RecoverX uses true incremental-forever backup after the initial full sync to create point-in-time versions. Because RecoverX does not store a full copy of every protected database plus point-in-time snapshots for each database, it is simpler to operate, configure, and scale than the native MongoDB Ops Manager.


Pitfall #4 – Costly backup storage

Ops Manager uses its own optional block-based deduplication of full backup snapshots, and only if backup snapshots are stored on block storage. While this results in some storage efficiency, block-based deduplication slows the backup and recovery process over time as the number of blocks in use increases. In addition, using block storage, even with some reduction from deduplication, is expensive, especially in the cloud.
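For readers unfamiliar with the technique, block-based deduplication in general can be sketched as follows. This is a generic illustration of fixed-size block deduplication, not Ops Manager's actual implementation; the function names and the tiny block size are assumptions made for readability.

```python
import hashlib


def dedup_store(snapshots, block_size=4):
    """Split each snapshot into fixed-size blocks and store each unique block once."""
    store = {}       # digest -> block bytes (the deduplicated block store)
    manifests = []   # one ordered list of digests per snapshot
    for data in snapshots:
        manifest = []
        for i in range(0, len(data), block_size):
            block = data[i:i + block_size]
            digest = hashlib.sha256(block).hexdigest()
            store.setdefault(digest, block)  # keep only the first copy
            manifest.append(digest)
        manifests.append(manifest)
    return store, manifests


def restore(store, manifest):
    """Rebuild a snapshot by looking up every block in its manifest."""
    return b"".join(store[digest] for digest in manifest)


snaps = [b"AAAABBBBCCCC", b"AAAABBBBDDDD"]
store, manifests = dedup_store(snaps)
print(len(store), restore(store, manifests[1]) == snaps[1])
```

Every backup and restore has to consult the block index, so the index lookups grow along with the number of stored blocks, which is the overhead referred to above.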

RecoverX uses industry-first, application-aware semantic deduplication in combination with storing backups on economical secondary or object storage, including Amazon S3. Additionally, backups subsequent to the initial full sync transfer only true incremental data. The result is maximum storage efficiency and cost-effective backup, with customers achieving up to 90% reduction in backup storage requirements.


Pitfall #5 – No support for copy data management use cases

Ops Manager only restores data to MongoDB clusters with the same topology (shards/replica sets) as the original. The result is limited options for restoring data to QA/Test/Dev clusters, because they are typically configured with a different topology or data capacity.

RecoverX enables data to be restored to the same or different clusters regardless of topology/configuration. As a result, customers can simplify and automate copy data management use cases including test/dev refresh and upgrade/migration in the same location, across different locations, or across public or hybrid clouds.  


Pitfall #6 – No extensibility to multiple data sources

When delivering enterprise data protection, organizations prefer a single pane of glass for protecting a variety of databases, filesystems, and enterprise data sources. However, native MongoDB backup solutions cannot be extended to data sources other than MongoDB. This results in more silos, more operational overhead, and more point backup solutions, all of which add cost, complexity, and risk to your backup strategy.

Datos IO RecoverX supports multiple non-relational databases (e.g. MongoDB, Cassandra, Couchbase) and Big Data filesystems (e.g. Cloudera, Hortonworks) enabling customers to leverage a common solution for all their non-relational and Big Data backup and recovery requirements.

MongoDB is an excellent database used by leading organizations everywhere.  However, if you are deploying MongoDB for production applications with business critical data, you need Datos IO RecoverX, the industry’s first cloud-scale, application-centric, software-only data protection solution that delivers scalable and fully featured point-in-time backup and recovery for next-generation applications built on non-relational databases and big data filesystems.