Frequently Asked Questions

When moving from Apache Cassandra to Elassandra

Is Elassandra compatible with Apache Cassandra?

Yes. Elassandra is the latest 3.11 release of open-source Apache Cassandra with a tightly integrated Elasticsearch engine. Cassandra SSTable files, administration tasks, and tools remain the same: it is the Apache Cassandra code with additional features that you can enable if needed.

Can I run Cassandra and Elassandra datacenters in the same cluster?

Yes, you can run Apache Cassandra datacenters and Elassandra datacenters in the same cluster. You only need to add a dummy Java class to the Cassandra classpath to avoid a ClassNotFoundException when Cassandra creates an Elasticsearch secondary index referenced in the CQL schema.

How can I migrate from Cassandra to Elassandra?

There are several ways to migrate from Cassandra to Elassandra. You can replace the Cassandra binaries with the Elassandra ones (or even switch the Docker image, as explained in this blog post from Pythian) and perform a rolling restart of the nodes. You can also add an Elassandra datacenter to an existing Cassandra cluster and stream the tables to it. Finally, you can create a new Elassandra cluster and restore SSTables from your existing Cassandra cluster.

How is Elassandra supported?

Elassandra is developed and maintained by Strapdata from the open-source code of Cassandra and Elasticsearch. Strapdata provides various support contracts, training, and consulting services to assist you in using Elassandra.

Where are Elasticsearch _source documents stored?

To avoid duplicating data and wasting disk space, Elassandra stores data only in Cassandra tables. Elasticsearch manages only the Lucene indices, and the _source document is no longer stored in Elasticsearch but fetched from the underlying Cassandra table.

How does Elassandra perform?

On the write path, Elassandra synchronously updates in-memory Lucene segments with the defined Elasticsearch fields. Write overhead depends on what you index (numbers, text, full text, etc.), but keep in mind that write throughput scales linearly with the number of nodes.

On the search path, there are two ways to query Elasticsearch. If the partition key is known, the search (or aggregation) request is directed to one node hosting the targeted data (like routing in Elasticsearch), so search throughput scales linearly with the number of nodes. Without the partition key, all nodes in the datacenter are queried, as with a Cassandra secondary index. If your Cassandra replication factor is 2, for example, you can use an optimized search strategy that queries only half of the nodes in the datacenter; increasing the replication factor therefore increases search throughput.
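The two search modes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `search_url` relies on the standard Elasticsearch `routing` query parameter of the `_search` endpoint, while `nodes_queried` is a hypothetical helper giving an idealized estimate (one node when the partition key is known, otherwise the node count divided by the replication factor). Real clusters may query more nodes depending on token distribution and the search strategy in use.

```python
import math
from urllib.parse import urlencode


def search_url(base_url, index, partition_key=None):
    """Build an Elasticsearch _search URL for the given index.

    When the partition key is known, it is passed as the standard
    Elasticsearch `routing` parameter so the request can target one node.
    """
    url = f"{base_url.rstrip('/')}/{index}/_search"
    if partition_key is not None:
        url += "?" + urlencode({"routing": partition_key})
    return url


def nodes_queried(node_count, replication_factor, partition_key_known=False):
    """Idealized estimate (an assumption, not a measured value) of how many
    nodes a single search request touches in one datacenter."""
    if partition_key_known:
        return 1
    # With replication factor RF, each row lives on RF nodes, so in the
    # best case about node_count / RF nodes cover the whole dataset.
    return math.ceil(node_count / replication_factor)
```

For example, under this estimate a 6-node datacenter with a replication factor of 2 would answer a search without a partition key from 3 nodes, and a routed search from a single node.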

Finally, comparing Elassandra's performance to that of Cassandra or Elasticsearch alone is not really meaningful. You should instead compare the TCO of an equivalent architecture delivering the same services, throughput, and resiliency. By synchronously writing to Elasticsearch indices without duplicating the data, Elassandra drastically reduces the total volume of disk and network I/O compared to more sophisticated architectures.

How does Elassandra eliminate the need for the Elasticsearch master node?

All Elassandra nodes are Elasticsearch data, primary, and master nodes. Elasticsearch mapping updates are managed through a Paxos transaction to avoid concurrent mapping updates. Consequently, Elassandra has no single point of failure and no single point of write: it is a multi-master search engine. See here why it's easier to operate in the cloud.

Does Elassandra work with Kibana, Logstash, Beats, etc.?

Yes. By keeping the Elasticsearch REST API unchanged, Elassandra works like Elasticsearch with Kibana, Logstash, Beats, Fluentd, Fluent Bit, and many other tools. However, you cannot use the Elasticsearch X-Pack features, because X-Pack is proprietary code.

Can I execute Elasticsearch requests through the CQL driver?

Yes, you can run Elasticsearch queries through your favorite CQL driver. Search and aggregation queries are supported, as described in the Elassandra documentation. Search results are returned as Cassandra rows, so you can use the same Data Access Objects for both Cassandra and Elasticsearch queries.
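As a rough sketch of what such a request could look like, the Python helper below builds a CQL statement that carries an Elasticsearch query as a JSON string. It assumes the query is passed through a dummy `es_query` column in the WHERE clause, which should be checked against the Elassandra documentation for your version, along with any table setup it requires. The helper name `es_query_cql` and the keyspace, table, and field names are hypothetical.

```python
import json


def es_query_cql(keyspace, table, es_query, limit=None):
    """Build a CQL SELECT that embeds an Elasticsearch query as JSON.

    Assumption: the search is passed via an `es_query` column, as the
    Elassandra documentation is understood to describe it.
    """
    # Compact JSON keeps the statement short and predictable.
    payload = json.dumps(es_query, separators=(",", ":"))
    # Escape single quotes so the JSON is a valid CQL string literal.
    payload = payload.replace("'", "''")
    cql = f"SELECT * FROM {keyspace}.{table} WHERE es_query='{payload}'"
    if limit is not None:
        cql += f" LIMIT {int(limit)}"
    return cql


# Hypothetical keyspace, table, and field names.
statement = es_query_cql(
    "twitter", "tweet", {"query": {"term": {"user": "bob"}}}, limit=10
)
```

The resulting string can then be handed to whichever CQL driver you use, and the rows it returns can be mapped like those of any other CQL query.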

How does Elassandra support Elasticsearch dynamic mapping?

Elasticsearch dynamic mapping updates the mapping when a new field is detected in an ingested document. Elassandra automatically translates the Elasticsearch mapping change into an update of the underlying CQL schema. It batches Cassandra DDL statements to reduce the number of broadcast schema mutations, and validates all changes before applying them. Elassandra can therefore ingest logs just as Elasticsearch does.

Can I back up Elasticsearch indices?

Yes. As with Cassandra SSTables, you can snapshot Elasticsearch indices on disk (Lucene files) when snapshotting a Cassandra table. As with Cassandra snapshots, you can then restore these files on a node that owns the same Cassandra token ranges. Alternatively, you can snapshot only the Cassandra SSTables, and the Elasticsearch indices will be rebuilt automatically when the SSTables are restored (the Elasticsearch mapping is stored in the Cassandra snapshots).

Can I use Elasticsearch ingest processors?

Yes. Through its REST API, Elassandra supports Elasticsearch ingest processors, which let you transform the original document before it is written to the underlying Cassandra table.
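To make this concrete, here is a minimal Python sketch that prepares the two REST requests involved: one defining an ingest pipeline and one indexing a document through it. It uses the standard Elasticsearch `_ingest/pipeline` endpoint, the `pipeline` query parameter, and the built-in `lowercase` and `set` processors. The helper names, pipeline id, index, type, and field names are hypothetical, and the typed index path assumes an Elasticsearch version that still uses mapping types.

```python
import json


def ingest_pipeline_request(pipeline_id, description, processors):
    """Return (method, path, body) for creating an ingest pipeline."""
    body = {"description": description, "processors": processors}
    return "PUT", f"/_ingest/pipeline/{pipeline_id}", json.dumps(body)


def index_request(index, doc_type, document, pipeline_id):
    """Return (method, path, body) for indexing a document through a pipeline."""
    path = f"/{index}/{doc_type}?pipeline={pipeline_id}"
    return "POST", path, json.dumps(document)


# Hypothetical pipeline: lowercase the user name and tag the document.
pipeline = ingest_pipeline_request(
    "normalize-user",
    "Lowercase the user field and tag the source",
    [
        {"lowercase": {"field": "user"}},
        {"set": {"field": "source", "value": "faq-example"}},
    ],
)

# Hypothetical document sent through that pipeline.
doc = index_request("twitter", "tweet", {"user": "Bob", "message": "hello"}, "normalize-user")
```

Each tuple can be sent with any HTTP client. According to the answer above, the transformed document is what ends up in the underlying Cassandra table.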