VMware Datos IO
Enterprise Caliber Data Consistent Backup
Minimize Application Downtime
- Near-zero recovery point and recovery time objectives (RPOs and RTOs)
- Single-click and orchestrated recovery
- Granular collection-level recovery
Increase Operational Efficiency
- Flexible policy-based management
- Recovery to different topology database clusters (testing and development)
- Automated failure handling
Increase Storage Efficiency
- Industry-first semantic deduplication
- Advanced support for compacted database tables
- On-premises or public cloud use
- Compatibility with NetApp® ONTAP® 9 software
- Performance and availability benefits of NetApp arrays
The Data Protection Challenge
The rapid proliferation of social, mobile, cloud, and Internet-of-Things technologies is driving enterprises to deploy hyper-scale, distributed, data-centric applications. Applications and use cases such as customer analytics, e-commerce, security, surveillance, and business intelligence are driving increasing requirements for scale of data and speed of transaction processing. Enterprises are rapidly adopting massively scalable, non-relational databases, such as Apache Cassandra and MongoDB, to service the data requirements of these high-volume, high-ingestion-rate, real-time applications. At the same time, they are using VMware-based virtual infrastructures to increase their agility in deploying scalable, non-relational databases. Database nodes can be installed, configured, and deployed far more rapidly in a VMware environment than on individual bare-metal machines. Installation and configuration can even be orchestrated using VMware orchestration tools, resulting in “turn on a dime” deployment and expansion agility in private and hybrid clouds.
Because they are used for business-critical applications, these databases require the robust data protection capabilities that are standard for enterprise data but not built into new-generation scale-out database environments. These include application-consistent backup, near-zero RPO and RTO, granular and repair-free recovery, failure handling, backup storage efficiency, and software-only deployment for cloud-first environments. Because of the hyper-scale, distributed nature of these databases, traditional backup and recovery products don’t support these requirements, leaving a critical data protection gap. Likewise, VMware data protection methods don’t deliver the tools to meet these protection requirements for scale-out databases.
The combination of VMware, scale-out databases, and Datos IO RecoverX delivers an agile environment for deploying resilient, elastic scale-out database transaction processing with enterprise-caliber, consistent data protection.
VMware is the industry-leading private cloud solution for virtualizing computing, from the data center to the cloud, enabling enterprises to be more agile, responsive, and profitable. The Datos IO RecoverX solution is certified with VMware Ready for Application Software™ status and is supported on VMware vSphere® 6 for production environments.
Apache Cassandra and MongoDB are modern scale-out databases that deliver high rates of transaction processing with robust resilience and easy scalability.
The Solution: Datos IO RecoverX
RecoverX is an industry-first scale-out, software-only data protection solution that is purpose-built for next-generation applications deployed on scale-out databases such as DataStax Enterprise, Apache Cassandra, or MongoDB.
Datos IO RecoverX is built on the foundation of Consistent Orchestrated Distributed Recovery (CODR), Datos IO’s cloud-first, scale-out data management architecture that enables customers to harness the cloud for both next-generation data protection and management. With 16 patents approved or pending, CODR uses elastic compute services that can be auto-scaled with load, and it removes the dependency on silos of media servers found in legacy data protection solutions. CODR also transfers data in parallel to and from file-based and object-based secondary storage for multiple workloads, including data protection and testing and development. This powerful architecture delivers:
- High availability: Deploying RecoverX in a cluster (3-node) configuration prevents any software process or external hardware (node) failures from compromising backup and recovery operations.
- Enhanced backup performance: With the scale-out architecture of DataStax Enterprise (DSE), Apache Cassandra, and MongoDB databases, users can easily scale their database according to application growth. Similarly, the scale-out capability of RecoverX, with no reliance on media servers, gives customers increased backup and recovery throughput, leading to lower RPO.
By using native application intelligence, RecoverX creates an application-consistent point-in-time backup of Apache Cassandra or MongoDB databases at user-specified intervals, a concept that we call cluster-consistent versioning. A cluster-consistent version contains all the records that have achieved user-specified consistency. As a result, no repairs are needed when a version is restored back to the cluster, significantly reducing RTO. The backup operation is also highly parallel in nature; RecoverX acts only as a control plane that orchestrates data movement from the data source cluster to version (secondary) storage. This approach allows RecoverX to handle large scale-out database and application workloads.
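To make the idea of "records that have achieved user-specified consistency" concrete, here is a minimal Python sketch. It is not RecoverX's implementation: the function name `consistent_version`, the per-replica dictionary layout, and the `required_acks` threshold are all assumptions made for illustration. Each replica view maps a key to a `(timestamp, value)` pair, and a record is kept only if at least `required_acks` replicas hold the identical pair.

```python
from collections import Counter


def consistent_version(replica_views, required_acks):
    """Keep, per key, the newest (timestamp, value) held by enough replicas.

    replica_views: list of dicts, one per replica, mapping key -> (timestamp, value).
    required_acks: hypothetical threshold standing in for the user-specified
                   consistency level (for example, 2 of 3 replicas).
    """
    # Count how many replicas hold each exact (key, (timestamp, value)) record.
    counts = Counter()
    for view in replica_views:
        for key, versioned in view.items():
            counts[(key, versioned)] += 1

    # Keep only records that reached the threshold, preferring the newest one.
    result = {}
    for (key, (timestamp, value)), holders in counts.items():
        if holders < required_acks:
            continue
        if key not in result or timestamp > result[key][0]:
            result[key] = (timestamp, value)
    return result


replicas = [
    {"a": (1, "x"), "b": (2, "y")},
    {"a": (1, "x")},
    {"a": (1, "x"), "b": (2, "y")},
]
quorum_version = consistent_version(replicas, required_acks=2)
strict_version = consistent_version(replicas, required_acks=3)
```

In this toy data, record `b` is present on two of three replicas, so it appears in the version built with a threshold of 2 but not in the one built with a threshold of 3.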
By allowing administrators to generate database backups at any user-specified time interval and at any granularity (table-level or entire database), RecoverX also simplifies operational use. Overall, versioning helps reduce data loss risk and minimizes capital and operating expenditures for an enterprise.
Fully orchestrated and granular recovery
Datos IO RecoverX provides single-click, fully orchestrated, any-point-in-time recovery. With RecoverX, customers can restore data directly into the same database cluster (operational recovery) or to a different cluster (testing-and-development refresh) with a different topology (number of database nodes). This flexibility reduces the operational burden of refreshing testing-and-development clusters for continuous-development DevOps environments. Further, the recovery process deals only with the logical data, making it 3 times faster than traditional approaches. During recovery, the data is transferred directly (with no reliance on media servers) from secondary storage into target databases, resulting in a very low RTO.
Industry-first semantic deduplication
Semantic deduplication is an industry-first capability that Datos IO has developed to reduce the cost of storing backups of scale-out non-relational databases over their retention period. Most scale-out non-relational databases keep multiple copies of the primary data, called replicas. As part of versioning, Datos IO RecoverX deduplicates all the replicas of a primary dataset, thus greatly increasing storage savings without losing native formats. For example, if the database uses a replication factor of 3 (RF = 3), RecoverX saves up to 70% in secondary storage costs. By using its application awareness, RecoverX also optimizes the backup of Cassandra compacted SSTables and MongoDB oplogs, resulting in significant additional secondary storage savings.
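The arithmetic behind the replication-factor example can be sketched in a few lines of Python. This is a simplified model, not a Datos IO formula: it assumes every replica is a full copy of the primary dataset and that deduplication keeps exactly one copy. The function name `replica_dedup_savings` is hypothetical.

```python
def replica_dedup_savings(replication_factor):
    """Fraction of backup storage saved by keeping one copy out of RF replicas.

    Simplified model (an assumption): all replicas are identical full copies,
    and no other savings (compression, compaction handling) are counted.
    """
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return 1 - 1 / replication_factor


savings_rf3 = replica_dedup_savings(3)
print(f"RF=3 saves about {savings_rf3:.0%} of replica storage")
```

Under this model, RF = 3 yields a saving of roughly two-thirds from replica removal alone, which is in the same range as the figure quoted above.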
Datos IO RecoverX is a software-only data protection product that can be deployed on virtual machines.
It communicates with Apache Cassandra and MongoDB scale-out databases through a Secure Shell (SSH) connection that forms a control plane to orchestrate data movement. Data is backed up locally to an NFS target. In addition to CLIs and RESTful APIs, customers can use the RecoverX UI to manage their data protection environment.
Hosting RecoverX on VMware to protect scale-out databases hosted on VMware enables rapid and flexible deployment.
There is no need to skimp on the suggested Memory, CPU and disk capacity requirements.
Datos IO RecoverX is a software solution with specific machine prerequisites for memory, CPU, and provisioned disk. In actual use it frequently doesn’t consume all of the specified memory or CPU resources and may not consume all of the recommended disk capacity.
Memory, CPU, and disk resources are virtual for VMs running on VMware. Customers deploying RecoverX in VMware environments don’t have to dedicate these provisioned resources:
- Datos IO specifies a minimum of 4 CPUs as a prerequisite. Provision the suggested minimum number of CPUs per VM. The vCPU resources are virtual, not dedicated. They are consumed only during periods of backup and restore activity, and even then RecoverX may not use 100% of the provisioned CPU.
- Use thin-provisioned disk volumes: a RecoverX VM consumes only the space it actually needs, which may be less than the provisioned amount.
- Actual CPU consumption will frequently be less than the provisioned amount, especially after the initial full (sync) backup and during periods when no backups or restores are in process. All backups after the initial full backup are incremental and can require fewer CPU resources. The sustained memory used during operations will also frequently be less than the amount of virtual memory provisioned.
RecoverX VM Configured Per Datos Implementation Specification
Virtual Resources provisioned for RecoverX can use a fraction of the actual resources
VMware simplifies standing up and configuring Datos IO RecoverX
RecoverX can be deployed as a cluster for resiliency and for increased backup orchestration and processing resources. Use clones to simplify deploying a clustered RecoverX data protection solution. Create a VM (CentOS or RHEL) for the first RecoverX node and configure it with the prerequisites (SSH configuration settings, specific Linux utilities, specific users and assigned GIDs, etc.). Then create 2 clones of that VM, giving the 3 VMs on which to install RecoverX for a 3-node cluster. The RecoverX installer, run on the first VM, will automatically install RecoverX on all 3 VMs.
Use vSwitch-based VLANs to simplify networking
RecoverX and the database clusters it protects need to be located on the same network. vSwitch VLANs make it easy to place Datos IO RecoverX and the clustered databases it protects on the same logical network.
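As a quick pre-deployment sanity check for the same-network requirement, a small Python sketch using the standard `ipaddress` module can flag hosts that fall outside a given subnet. Treating "same network" as "same IP subnet" is an assumption here, and the host names, addresses, and function name `hosts_outside_network` are purely illustrative.

```python
import ipaddress


def hosts_outside_network(cidr, hosts):
    """Return the names of hosts whose address is not inside the given subnet.

    cidr:  subnet in CIDR notation, e.g. "10.0.0.0/24" (illustrative value).
    hosts: dict mapping a host name to its IP address string.
    """
    network = ipaddress.ip_network(cidr)
    return [
        name
        for name, address in hosts.items()
        if ipaddress.ip_address(address) not in network
    ]


# Hypothetical inventory: three RecoverX nodes and two database nodes.
inventory = {
    "recoverx-1": "10.0.0.11",
    "recoverx-2": "10.0.0.12",
    "recoverx-3": "10.0.0.13",
    "cassandra-1": "10.0.0.21",
    "cassandra-2": "10.0.1.22",
}
misplaced = hosts_outside_network("10.0.0.0/24", inventory)
print("Hosts outside the RecoverX network:", misplaced)
```

An empty result suggests that every listed node sits in the chosen subnet; any name returned is a candidate for moving onto the shared VLAN.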