Datos IO - RecoverX
Two years ago, when it all started, we had one slide, lots of scribbling on napkins, a bold vision, and a passion to build innovative products for the data-centric era ahead of us. Many ridiculed us (they still do!) for tackling an emerging market instead of pursuing an established one, but we remain steadfast in our vision: to help enterprises manage and recover their data at scale.



The journey thus far has been a rollercoaster ride with highs and lows. However, nothing has deterred us on our path to today's milestone: Boötes, the GA release of RecoverX. Just as Boötes is home to many other bright stars, including eight above the fourth magnitude, I too am fortunate to lead my team (the “Datonians”), who have been hard at work designing for the long haul, developing with an eye towards perfection, validating the product against real customer environments, and ensuring the industry has a clear sense of the product. Congratulations, Datonians!


During the past two years, we’ve traveled from several early alpha versions, to Andromeda (our DA release), to Boötes (the GA release of RecoverX). Along the way:

  • We have completed over twenty proof-of-concept deployments with enterprise customers.
    Approximately half of these deployments were on-premise, with the remainder in cloud-native environments. Needless to say, we’ve learned a lot along the way, which has solidified our value proposition and hardened RecoverX for enterprise customers.
  • We’ve identified our best customers. They come from key industry verticals, including financial services, e-commerce, security, and retail, and run SaaS, IoT, analytics, and real-time operational intelligence applications. Their databases are 10 TB or larger, running on clusters of 9 or more nodes. We have shared our learnings here and here.

These experiences have also helped us debunk two myths:

    1. “Database-native triple-replica architecture negates the need for backup.”
      Yes, next-generation applications are deployed on scale-out, replica-based database architectures. But while replication provides availability for always-on needs, the same replica-based architecture makes these databases extremely vulnerable to logical errors, because a bad write or an accidental delete propagates to every replica.
    2. “Native tools or traditional products are good enough.”
      Yes, engineers who build database products are extremely smart, and they know that a database without change data capture tools (via streaming or node-level snapshots) is an incomplete product. So, while these native tools do exist, they are only checkbox solutions. They fall far short of enterprise needs: operational ease of use; multi-platform support for enterprise-wide scale; resiliency to infrastructure failures; space efficiency without compromising the native format of the source data (my JSON file should remain in JSON format even on secondary storage!); capture of topology and configuration metadata; and, finally, advanced use cases such as test/dev and live migrations/upgrades.
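To make the first myth concrete, here is a toy sketch in Python. It is purely illustrative: the `ReplicatedTable` class is hypothetical and is not how Cassandra, MongoDB, or RecoverX work internally. It only shows the core point that replication copies a mistaken delete to every replica just as faithfully as a valid write, while a point-in-time copy kept outside the cluster still holds the lost record.

```python
import copy


class ReplicatedTable:
    """Hypothetical toy model of a table replicated to N nodes (illustration only)."""

    def __init__(self, replicas: int = 3):
        self.nodes = [dict() for _ in range(replicas)]

    def write(self, key, value):
        # Every write, correct or not, is applied to every replica.
        for node in self.nodes:
            node[key] = value

    def delete(self, key):
        # A mistaken delete is replicated just as faithfully as a valid one.
        for node in self.nodes:
            node.pop(key, None)

    def snapshot(self):
        # Point-in-time copy kept outside the cluster (a stand-in for a backup).
        return copy.deepcopy(self.nodes[0])


table = ReplicatedTable(replicas=3)
table.write("order:42", {"status": "paid"})
backup = table.snapshot()

table.delete("order:42")  # logical error, e.g. a buggy application delete
print(all("order:42" not in node for node in table.nodes))  # True: every replica lost it
print(backup["order:42"])  # {'status': 'paid'}: only the backup still has it
```

Three replicas protect against losing a node, but they provide no protection against this kind of error; only a copy taken before the mistake can undo it.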

Our most encouraging learning to date has been that RecoverX requires no rip-and-replace for customers using existing data protection infrastructure. Traditional backup and recovery products (backup software, backup appliances, and others) serve customer needs for on-premise applications. However, they fall short of next-generation data protection requirements for 3rd Platform applications that are web-scale and cloud-native in nature.

In addition to these learnings from the market and customers, here are a few other things I want to share from our journey thus far:

    1. “Software truly is eating the world”
      We are staunch believers in this, and the cloud era only proves it further. To address flexible deployments across on-premise, PaaS, or IaaS environments (Amazon AWS, Google GCP, Microsoft Azure, and others), RecoverX is a software-only solution that supports multiple secondary storage options, including Network Attached Storage (NAS) and cloud-native object storage (Amazon S3 and Google Cloud Storage).
    2. Innovative products are a result of hard engineering work
      A lot of work goes on behind the scenes from some of the smartest engineers, system designers, and coders in the world.

In the midst of our GA launch today, I am proudest of Prasenjit, my co-founder and CTO, and the engineering Datonians whose efforts led to delivering CODR™ (Consistent Distributed Recovery Engine), the secret sauce of RecoverX. With CODR, Datos IO has developed a seminal data protection architecture that no longer depends on media servers and transfers data in parallel to and from file- and object-based secondary storage. CODR delivers space-efficient, cluster-consistent backups stored in native formats, along with application-ready, repair-free recovery, all at scale. CODR allows RecoverX to support not one but two data stores from the get-go, and it enables a highly available (3-node) data protection infrastructure for cloud-scale needs. Hats off to Prasenjit and to the entire engineering team.

With the power of CODR, RecoverX has helped customers achieve recovery times measured in minutes, address multiple use cases for operational recovery and test/dev, and realize storage savings of up to 70% with industry-first semantic deduplication technology that applies to both the first backup and subsequent backups. Prasenjit and I are confident that CODR will set an industry standard that puts Datos IO years ahead of any perceived competition, existing data protection vendors, or any new vendors who claim to support next-generation databases.

But our work is not done. Looking forward, we remain focused on our vision. While we cannot speak to specific future plans, we will leave you with some hints. Our work cuts across three key areas: A) helping customers use RecoverX for additional non-relational and cloud-native databases across their applications; B) helping customers use RecoverX for relational databases that are cloud-native, as early adopters reinforced that existing relational data stores continue to serve key enterprise applications; and C) helping customers extract value with advanced data management services, such as “reified” databases, that enhance operational readiness. All of this will be available in any deployment model: IaaS (Amazon AWS, Google GCP, Microsoft Azure), PaaS (Pivotal Cloud Foundry), and on-premise (VMware, OpenStack, Bare Metal).

I invite you all to check out our new website to learn about RecoverX and its industry-leading features. There you will find information about protecting 3rd Platform applications with solutions for Apache Cassandra and MongoDB; our GitHub community page, where we are starting by open-sourcing Gepetto, our distributed automation framework; and, last but not least, our free trial program. In closing, we always knew we were in the right place at the right time, and today we complete the trifecta with a brilliant product, RecoverX!