Our customer is a Fortune 100 home improvement retailer with thousands of brick-and-mortar stores in the U.S., Canada, and Mexico. They have a hyper-scale eCommerce application that is deployed natively on Google Cloud Platform (GCP) and is central to their digital transformation journey.

Below is a question-and-answer session we conducted with our customer. You will find insights as they describe their microservices-based application environment and explain why they selected RecoverX for backup and recovery of their Apache Cassandra database.

Please describe your application and the type of backup challenges you were experiencing.

I’ll summarize our challenges into three broad categories. The first is Backup and Recovery for Cassandra in Google Cloud. We are in the midst of our digital transformation journey, and as our online business continues to expand, we have had trouble scaling our existing relational databases. We re-architected the online application using a microservices-based, cloud-native application architecture. This new application is deployed in GCP and uses underlying Cassandra (DataStax) databases to populate information for different customer-facing platforms (cart order, inventory management, etc.). Given that our core online business is based on this web-scale eCommerce application, any data loss is detrimental to our business. Accordingly, our company requires the ability to back up multiple Cassandra databases and recover from any data loss with an SLA of 1 hour.

The second category is Backup Software Elasticity. Being in the e-commerce industry, we see huge spikes in data volume during specific periods, such as Thanksgiving and Christmas. We cannot rely on a static infrastructure footprint; instead, we need to scale resources up only during peak seasons. This ultimately allows us to optimize the cost of operating a data protection solution without sacrificing high availability.

Finally, the third category is Test Cluster Refresh Using Production Data. We have multiple production and test clusters, and we implement continuous integration and continuous delivery (CI/CD) methodologies. Initially, we had to spend time refreshing our test clusters manually because their topologies differ from those of our production clusters. We needed a data management solution that would automate the refresh of test clusters using production data at regular intervals. So, in summary, the key challenges are:

  1. Cloud-Native Backups of Cassandra (DataStax Enterprise)
  2. A data protection SLA of 1 hour, with minimal application downtime due to logical data errors
  3. Automate refresh of test clusters using production data at regular intervals

Why Datos IO:

  • The only cloud-native data protection software-only solution for Apache Cassandra and DataStax Enterprise in Google Cloud
  • Application-consistent backups that don’t require repairs on recovery, leading to low RTO (low application downtime)
  • 80% reduction in secondary storage costs – yes, storage costs matter at our scale!
  • High performance to guarantee the 1-hour SLA using elastic compute/storage resources

How are you solving the problem today?

Our DevOps team used the database-native tools from OpsCenter, but they did not give us enterprise-level data protection features, such as the required SLA of one hour, storage savings, and test/dev support for our application development teams. To quantify the impact: any outage is estimated to cost hundreds of thousands of dollars in lost business, on top of the negative impact on our brand if our site goes down. Finally, we cannot afford to have our DevOps team spend hours writing and maintaining scripts. This process does not scale for our enterprise.

What is it about Datos IO RecoverX that excites you the most?

Datos IO RecoverX is a highly scalable solution whose performance scales with the infrastructure resources available. Using this capability, we increased our compute and memory resources during peak seasons and got the backup performance we required. After peak seasons ended, we scaled resources back to optimize for cost.

Additionally, because RecoverX offers fully orchestrated recovery back to any cluster with any topology, we can restore production data into test clusters with ease. For example, customers can take the entire backup dataset from a 12-node production Cassandra cluster and use RecoverX to restore it to a 3-node test cluster running natively in the cloud.

Finally, and most important, the Datos IO team listens. Over the last 9 months, we have asked them for a number of product features that we need, and the team keeps delivering at a record pace, from Google Cloud Storage support to test/dev support for unlike topologies.