The Apache Cassandra database is the right choice of database if you are looking for scalability and high availability without compromising performance for your mission-critical applications. Additionally, Cassandra’s support for replicating across multiple datacenters is best-in-class, providing lower latency for users and the peace of mind of knowing that you can survive regional outages. To that end, we’ve compiled a list of the Top 10 reasons why enterprises onboarding and deploying mission-critical applications should use Cassandra (Apache Cassandra, DataStax Enterprise).
1) Requirement for fast writes: Easily deals with data velocity, data variety, and data complexity issues
Many of the challenges associated with next-generation cloud applications center around data volume and data velocity. Is Cassandra able to handle the speed of data coming into the system? The answer is “yes,” provided the cluster is sized for the amount of data coming in. Not only does Cassandra come with this ability out of the box, but entire data pipeline architectures are being built around its ingestion speed. And to top it off, Cassandra scales linearly, making it easy to determine the right amount of capacity based on data flow.
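To make the capacity point concrete, here is a rough sizing sketch. It assumes throughput grows in proportion to node count, as described above; the per-node write rate, replication factor, and headroom are all hypothetical inputs that you would measure or choose for your own hardware and workload.

```python
import math


def nodes_needed(target_writes_per_sec, per_node_writes_per_sec,
                 replication_factor=3, headroom=0.25):
    """Estimate node count for a target write rate under linear scaling.

    All inputs are assumptions supplied by the caller, not Cassandra defaults.
    """
    if per_node_writes_per_sec <= 0:
        raise ValueError("per_node_writes_per_sec must be positive")
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    # Every client write is stored on replication_factor nodes.
    replica_writes = target_writes_per_sec * replication_factor
    # Reserve a fraction of each node's capacity for compaction, repairs, spikes.
    usable_per_node = per_node_writes_per_sec * (1 - headroom)
    # A cluster needs at least as many nodes as it keeps replicas.
    return max(replication_factor, math.ceil(replica_writes / usable_per_node))


if __name__ == "__main__":
    print(nodes_needed(100_000, 10_000))  # doubling the target doubles the result
    print(nodes_needed(200_000, 10_000))
```

Under this simple model, doubling the incoming write rate doubles the estimated node count, which is what makes planning straightforward.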
There are also two often-overlooked challenges that Cassandra handles: 1) data variety and 2) data complexity. Data variety is another way of saying that data coming into one database can arrive in different forms. An example of this would be sensor input from a heart monitor and sensor input from an IV, both for the same patient. The second challenge, data complexity, extends the previous example. The heart monitor might report 100 metrics twice per second under normal operating circumstances and up to 125 metrics once per second while the wearer is sleeping. This means the write patterns, locations, and frequencies can vary. Cassandra handles these situations gracefully.
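A quick back-of-the-envelope calculation shows how much the write pattern shifts between the two modes in the heart monitor example. The function name is just for illustration.

```python
def values_per_second(metrics_per_report, reports_per_second):
    """Number of individual metric values written each second."""
    return metrics_per_report * reports_per_second


# Figures taken from the heart monitor example above.
normal = values_per_second(100, 2)    # 100 metrics, twice per second
sleeping = values_per_second(125, 1)  # up to 125 metrics, once per second

if __name__ == "__main__":
    print(f"normal: {normal} values/s, sleeping: up to {sleeping} values/s")
```

The same device produces fewer but wider writes while the wearer sleeps, so both the shape and the rate of the data change over the course of a day.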
2) Can handle massive datasets
If there were any questions about whether or not Cassandra is capable of handling large data sets, there is no need to look any further than the companies using it. They operate at massive scale. Netflix, Hulu, Instagram, eBay, Apple, and Spotify all have Cassandra working in interesting ways as part of their offerings.
The other way you know Cassandra is up to the challenge is in use case examples. Many organizations use Cassandra for applications where data grows quickly and without bound. These include Twitter clones, web log analytics data warehouses, and telemetry or sensor data stores.
3) Homogeneous environment
Unlike some of the legacy distributed systems, Cassandra does not require outside support for synchronization. All of the required components for basic operation are built directly into Cassandra. Since Cassandra also operates in a peer-to-peer fashion, this means that there is no master-slave or sharding setup and that all nodes in the ring are equal. Additionally, there is only one machine type that an administrator needs to worry about.
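The peer-to-peer ring can be sketched in a few lines. This is a simplified illustration, not Cassandra's actual code: it uses MD5 as a stand-in hash (Cassandra's default partitioner uses Murmur3), gives each node a single token (real clusters typically use many virtual nodes per machine), and places replicas on the next nodes clockwise, which is how the simple replication strategy behaves. The node names are hypothetical.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a position on the ring (stand-in for the real partitioner)."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class Ring:
    def __init__(self, nodes, replication_factor=3):
        if replication_factor > len(nodes):
            raise ValueError("replication_factor cannot exceed the node count")
        self.replication_factor = replication_factor
        # Every node is treated identically: it just owns a token on the ring.
        self._ring = sorted((token(name), name) for name in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, partition_key: str):
        """Return the nodes holding a key: the owner plus the next nodes clockwise."""
        # The owner is the first node whose token is >= the key's token,
        # wrapping back to the start of the ring if the key hashes past the last node.
        start = bisect.bisect_left(self._tokens, token(partition_key)) % len(self._ring)
        return [self._ring[(start + i) % len(self._ring)][1]
                for i in range(self.replication_factor)]


if __name__ == "__main__":
    ring = Ring(["node-a", "node-b", "node-c", "node-d"])
    print(ring.replicas("patient-42"))
```

No node in this sketch plays a special role. Any of them can work out which peers hold a given key, which is why there is no master to configure or lose.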
4) Highly fault tolerant
Cassandra employs many mechanisms for fault tolerance. Since Cassandra is masterless, there is no single point of failure. There is also the potential for zero downtime rolling upgrades. This is because Cassandra can support the temporary loss of multiple nodes (depending on cluster size) with negligible impact to the overall performance of the cluster.
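How many node losses a cluster can absorb depends on how many copies of each piece of data it keeps and how many of those copies must respond. The sketch below goes slightly beyond the text above by assuming requests are issued at a quorum consistency level, where a majority of replicas must acknowledge each operation. The function names are illustrative.

```python
def quorum(replication_factor: int) -> int:
    """Majority of replicas: floor(RF / 2) + 1."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return replication_factor // 2 + 1


def tolerated_replica_failures(replication_factor: int) -> int:
    """Replicas of one partition that can be down while quorum requests still succeed."""
    return replication_factor - quorum(replication_factor)


if __name__ == "__main__":
    for rf in (1, 3, 5):
        print(f"RF={rf}: quorum={quorum(rf)}, "
              f"tolerates {tolerated_replica_failures(rf)} replica(s) down")
```

With three copies of the data, one replica can be offline for an upgrade or a hardware swap while quorum reads and writes keep succeeding. With five copies, two can be.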
The safety net Cassandra offers extends outside of your datacenter as well. Cassandra allows you to replicate your data to other datacenters and keep multiple copies in multiple locations. This helps satisfy many regulatory requirements in addition to being part of a strong disaster recovery and business continuity plan.
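Multi-datacenter replication is declared per keyspace using the `NetworkTopologyStrategy` replication class. As a minimal sketch, the helper below only builds the CQL statement as a string. The keyspace name and datacenter names are hypothetical and must match what your own cluster reports, and the helper does no escaping beyond a basic check on the keyspace name.

```python
def create_keyspace_cql(keyspace: str, replicas_per_dc: dict) -> str:
    """Build a CREATE KEYSPACE statement with a replica count per datacenter."""
    if not keyspace.isidentifier():
        raise ValueError("keyspace must be a simple identifier")
    if not replicas_per_dc:
        raise ValueError("at least one datacenter is required")
    parts = ["'class': 'NetworkTopologyStrategy'"]
    for dc_name, count in replicas_per_dc.items():
        # Each datacenter gets its own number of copies of the data.
        parts.append(f"'{dc_name}': {int(count)}")
    options = ", ".join(parts)
    return (f"CREATE KEYSPACE IF NOT EXISTS {keyspace} "
            f"WITH replication = {{{options}}};")


if __name__ == "__main__":
    # Hypothetical keyspace and datacenter names, three copies in each location.
    print(create_keyspace_cql("patient_data", {"us_east": 3, "eu_west": 3}))
```

You would run the resulting statement through `cqlsh` or a driver. Losing an entire region then still leaves a full set of replicas in the other one.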
5) Proven success across enterprise applications and in many use cases already
There are already many examples where Cassandra is being used effectively. Banks and other financial institutions are storing large quantities of financial data in Cassandra. Analytics companies are using Cassandra to store web analytics data. Medical companies are using Cassandra to store sensor data and other time series inputs. There are also many companies making use of Cassandra for storing IoT data.
6) Ease of administration
Cassandra is a straightforward system to administer. Because Cassandra is masterless, all nodes in the ring are the same, making for a homogeneous system. It’s fault tolerant and can support the temporary loss of nodes with minimal impact to production performance. This means that nodes are easy to replace, and the requirement to replace downed nodes immediately isn’t as strict.
7) Custom tuning
There are a lot of knobs and levers that can be turned to get Cassandra to perform optimally for your workload and environment. You can set up Cassandra to operate in a way that is consistent with your workload. For example, if you write lots of log data and read infrequently, then there are configuration tweaks to be made for write-heavy systems. If you write heavily to one datacenter and then do all your reading from another, you can adjust the settings on a per-datacenter basis. Tuning isn’t limited to the Cassandra application level; you can also tune the JVM and Java settings, including things like garbage collection (GC) and logging levels. Changes can even be made by the drivers at connection time to aid in the performance of your system.
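As one example of a workload-specific tweak, the compaction strategy can be set per table. The sketch below builds an `ALTER TABLE` statement that switches a table to `TimeWindowCompactionStrategy`, a commonly cited choice for time-ordered, write-heavy data such as logs. Whether it suits your workload is something to test. The keyspace and table names are hypothetical, and the helper only produces the statement text.

```python
def twcs_alter_table_cql(keyspace: str, table: str,
                         window_size: int = 1, window_unit: str = "DAYS") -> str:
    """Build an ALTER TABLE statement that enables time-window compaction."""
    allowed_units = {"MINUTES", "HOURS", "DAYS"}
    if window_unit not in allowed_units:
        raise ValueError(f"window_unit must be one of {sorted(allowed_units)}")
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    if not (keyspace.isidentifier() and table.isidentifier()):
        raise ValueError("keyspace and table must be simple identifiers")
    options = (
        "'class': 'TimeWindowCompactionStrategy', "
        f"'compaction_window_unit': '{window_unit}', "
        f"'compaction_window_size': {int(window_size)}"
    )
    return f"ALTER TABLE {keyspace}.{table} WITH compaction = {{{options}}};"


if __name__ == "__main__":
    # Hypothetical names: group a log table's data into one-day windows.
    print(twcs_alter_table_cql("logs", "app_events"))
```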
8) Easy to integrate core applications
A lot of work has been done to integrate data manipulation and parsing systems with Cassandra. For instance, the full-text search engine Apache Solr has been packaged to work with Cassandra to provide full-featured search capabilities on an existing Cassandra database. Apache Spark, a big data analytics engine, has also been plugged in to work on an existing Cassandra database. There are entire suites of tools that can be integrated or bolted on to Cassandra to increase its capabilities. These include Apache Mahout, Kafka, and Zipkin, just to name a few. This is important because the more tools you have available to you, the more powerful your data becomes. You also gain more insight into your data without having to build and maintain the application systems that were previously required.
9) Excellent Monitoring Options
Included in the system of tools referenced in #8 are monitoring packages. If you are a user of automated monitoring platforms like Datadog or Netuitive, you’ll find examples of prepackaged agents to monitor the important parts of Cassandra. You can then tack on your own additions of other metrics that are important to you. This is made possible by Cassandra taking advantage of Java MBeans and exposing them to clients. You can use these to get at much of the internal information Cassandra uses to make its own decisions and assess its own health. DataStax also offers its own monitoring and control application called OpsCenter.
10) Amazing community
One of the best things about any piece of software is having a great community of developers and experts available to you for help or guidance. There is a huge yearly database summit put on by DataStax, the primary backer and largest contributor of code to Cassandra. They also sponsor community events all over the world so you can meet and interact with other Cassandra developers around you.
For open source software to really be successful, there needs to be an ecosystem that develops around it. In the case of a database like Cassandra, there are consultancies, monitoring and troubleshooting systems, plugins, instrumentation systems, and backup systems. That is a whole set of competencies that your organization no longer needs to own, because it can use what the greater community has already created. There are even PaaS companies that will completely take over the management of Cassandra for you, leaving you free to focus on developing your application.
Given the sizable number of organizations and people who are a part of the ecosystem and the Cassandra community as a whole, there is no shortage of articles, documentation and people willing to help. A welcoming and helpful community isn’t always a given, but in the case of Cassandra, it’s alive and thriving. This is important because software is always about people. The more of them that you can interact with that have shared your experiences, the better. It will also be easier to find solutions to your problems having a network of people who might have faced them before.
There are many reasons that Cassandra could be the right tool for your application. Knowing your system’s requirements, workloads, and future growth will help you make the right choice. As you can see, if you do choose Cassandra, you are bound to be in good company.