Apache Cassandra Performance Tuning


What is Apache Cassandra?

▪ NoSQL database

▪ Schema-free

▪ Very fast writes

▪ Cassandra is designed to handle big data workloads across multiple nodes with no single point of failure

▪ Cassandra addresses the problem of failures by employing a peer-to-peer distributed system across homogeneous nodes where data is distributed among all nodes in the cluster

High Level Architecture of Apache Cassandra

Performance tips – 1 (DB Modeling)

▪ Model your DB carefully

▪ Need to understand your application and access patterns

▪ The DB is modelled around access patterns, which is very different from the RDBMS world

▪ Use denormalization, super column families and similar features to avoid the costly joins of the RDBMS world
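
As a rough illustration of modelling around access patterns, the sketch below keeps one denormalized structure per query instead of joining at read time. The names (users_by_id, users_by_city, add_user) are hypothetical, and plain Python dictionaries stand in for column families; this is not Cassandra client code.

```python
from collections import defaultdict

# One structure per access pattern, each keyed by what the query looks up.
users_by_id = {}                    # query: "get user by id"
users_by_city = defaultdict(dict)   # query: "list users in a city"


def add_user(user_id, name, city):
    """Write the same data twice so that each read is a single key lookup."""
    row = {"name": name, "city": city}
    users_by_id[user_id] = row
    users_by_city[city][user_id] = dict(row)  # duplicated copy, no join needed


def get_user(user_id):
    return users_by_id.get(user_id)


def list_users_in_city(city):
    # Sorted by user id so the order is deterministic.
    rows = users_by_city[city]
    return [rows[uid]["name"] for uid in sorted(rows)]


add_user(1, "Asha", "Pune")
add_user(2, "Ben", "Pune")
add_user(3, "Chen", "Oslo")
print(list_users_in_city("Pune"))
```

The price of this layout is extra write work and duplicated storage, which suits a store where writes are cheap.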

Performance tips – 2 – 4 (CF settings)

▪ Column family parameters impacting performance

▪ Keys_cached – Number of keys to be cached

▪ Row_cached – Number of rows to be cached (needs more memory, but can improve performance drastically)

▪ Preload_row_cache – whether to prepopulate row cache on startup

▪ Large memtables – can improve read performance (there are 4-5 settings related to memtables, including memtable_flush_writers, the number of flush threads)

▪ Each CF is stored on disk in its own separate files, so keep related columns in the same CF; super column families (SCFs) can come in handy here

▪ Gc_grace_seconds – Time to wait before removing tombstones.
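
To make the gc_grace_seconds bullet concrete: a tombstone only becomes eligible for removal during compaction once it is older than the grace period. The helper below is a hypothetical illustration of that check, not Cassandra's implementation; 864000 seconds (10 days) is the commonly documented default.

```python
DEFAULT_GC_GRACE_SECONDS = 864000  # 10 days


def tombstone_is_purgeable(deleted_at, now, gc_grace_seconds=DEFAULT_GC_GRACE_SECONDS):
    """Return True once the tombstone has outlived the grace period.

    deleted_at and now are Unix timestamps in seconds.
    """
    return now - deleted_at > gc_grace_seconds
```

A lower value reclaims disk space sooner, but repairs then need to run more often than the grace period so that deleted data does not reappear.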

Performance tips – 5-6 (concurrent read writes)

▪ Concurrent_reads (default: 32): a good value is 4 concurrent_reads per processor core. Increase this value for systems with fast I/O storage.

▪ Concurrent_writes (default: 32): if needed, increase this value for systems with many cores, but the out-of-the-box setting works fine for most requirements, as writes are usually very fast.
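
The 4-per-core rule of thumb for concurrent_reads can be sketched as a small calculation. The function name and its parameters are assumptions made for illustration; treat the result as a starting point to validate under load, not as a prescribed value.

```python
import os


def suggested_concurrent_reads(cores=None, reads_per_core=4):
    """Apply the 4-per-core rule of thumb for concurrent_reads."""
    if cores is None:
        # os.cpu_count() may return None on some platforms.
        cores = os.cpu_count() or 1
    return cores * reads_per_core
```

For an 8-core machine this gives 32, which matches the default.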

Performance tips – 7 – 9 (commitlog)

▪ Commitlog_rotation_threshold_in_mb – how large a commit log file can grow before a new file is created

▪ Commitlog_sync – Can be periodic or batch. Using batch can reduce performance, as it blocks until the write operation is synced to disk.

▪ Separate disk for commit logs, to reduce IO contention during write.
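
The difference between the two commitlog_sync modes can be illustrated with a toy model that counts fsync calls. This is a simplification under stated assumptions: the function write_entries and its parameters are hypothetical, a temporary file stands in for the commit log, and the batch branch syncs on every write rather than grouping writes as a real server may do.

```python
import os
import tempfile
import time


def write_entries(entries, mode, period_s=0.01):
    """Append entries to a temporary log file and return the number of fsync calls.

    mode="batch": fsync before acknowledging every write.
    mode="periodic": acknowledge immediately, fsync at most once per period_s.
    """
    if mode not in ("batch", "periodic"):
        raise ValueError("mode must be 'batch' or 'periodic'")
    fsyncs = 0
    last_sync = time.monotonic()
    with tempfile.TemporaryFile() as log:
        for entry in entries:
            log.write(entry)
            if mode == "batch":
                # The writer waits here until the data is on disk.
                log.flush()
                os.fsync(log.fileno())
                fsyncs += 1
            elif time.monotonic() - last_sync >= period_s:
                # Periodic mode only pays the sync cost once per period.
                log.flush()
                os.fsync(log.fileno())
                fsyncs += 1
                last_sync = time.monotonic()
    return fsyncs
```

Every fsync is a point where the writer waits on the disk, which is why the batch mode trades throughput for durability and why a dedicated commit log disk helps.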

Performance tips – 9-10 (compression)

▪ Use compression – Depending on the data characteristics of the table, compressing its data can result in:
- 2x-4x reduction in data size
- 25-35% performance improvement on reads
- 5-10% performance improvement on writes

▪ Concurrent_compactors – Sets the number of concurrent compaction processes allowed to run simultaneously on a node
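
How much compression helps depends on the data, which a quick measurement can show. The sketch below uses Python's zlib (DEFLATE) as a stand-in compressor; the sample data and the function name compression_ratio are assumptions for illustration, and ratios on real tables will differ.

```python
import os
import zlib


def compression_ratio(data):
    """Return original size divided by compressed size."""
    return len(data) / len(zlib.compress(data))


repetitive = b"sensor=42;status=OK;" * 500   # highly redundant rows
random_like = os.urandom(10000)              # high-entropy data

print(round(compression_ratio(repetitive), 1))
print(round(compression_ratio(random_like), 2))
```

Redundant, text-like rows shrink a great deal, while data that is already compressed or random gains nothing and only costs CPU.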

Performance tips – 11-13 (timeouts)

▪ Range_request_timeout_in_ms – The time that the coordinator waits for sequential or index scans to complete

▪ Read_request_timeout_in_ms – The time that the coordinator waits for read operations to complete

▪ Write_request_timeout_in_ms – The time that the coordinator waits for writes to complete
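
The idea behind these timeouts, a coordinator that waits a bounded time for an operation and then gives up, can be sketched with a worker thread. The function coordinate and its return shape are hypothetical; this illustrates the waiting behaviour only and is not how Cassandra is implemented.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError


def coordinate(request, timeout_ms):
    """Run request() on a worker thread and wait at most timeout_ms for it.

    Returns ("ok", result) on success or ("timeout", None) if the wait expired.
    """
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        future = pool.submit(request)
        return ("ok", future.result(timeout=timeout_ms / 1000.0))
    except TimeoutError:
        return ("timeout", None)
    finally:
        # Do not block the caller on a slow request that has already timed out.
        pool.shutdown(wait=False)


def slow_read():
    time.sleep(0.3)  # simulated slow replica
    return "row"
```

Setting a timeout too low turns slow but healthy operations into errors, while setting it too high keeps clients waiting on nodes that will not answer.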

Performance tips – 14-15 (RPC settings for clients)

▪ Rpc_keepalive – Enable or disable keepalive on client connections

▪ Rpc_max_threads – Maximum number of requests in the RPC thread pool, which dictates how many concurrent requests are possible

Performance tips – Misc.

▪ Dynamic snitching – the dynamic snitch monitors read latency and, when possible, routes requests away from poorly performing nodes

▪ Bloom filter (bloom_filter_fp_chance – desired false-positive probability for SSTable Bloom filters): higher settings use less memory, but will result in more disk I/O

▪ Keep an eye on compaction (compaction_throughput_mb_per_sec, it is used for throttling compaction overheads)
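
The memory side of the bloom_filter_fp_chance trade-off can be estimated with the textbook sizing formula for an optimally configured Bloom filter, bits per key = -ln(p) / (ln 2)^2. The function names are hypothetical, and the formula is the general one, so the sizes Cassandra actually allocates may differ.

```python
import math


def bloom_bits_per_key(fp_chance):
    """Bits per key for an optimally sized Bloom filter: -ln(p) / (ln 2)^2."""
    if not 0 < fp_chance < 1:
        raise ValueError("fp_chance must be between 0 and 1 (exclusive)")
    return -math.log(fp_chance) / (math.log(2) ** 2)


def bloom_megabytes(keys, fp_chance):
    """Approximate filter size in MiB for the given number of keys."""
    return keys * bloom_bits_per_key(fp_chance) / 8 / (1024 * 1024)


for p in (0.01, 0.1):
    print(p, round(bloom_bits_per_key(p), 2), round(bloom_megabytes(100_000_000, p), 1))
```

Raising the false-positive chance from 0.01 to 0.1 roughly halves the filter size, at the cost of more SSTables being read unnecessarily.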

Performance tips – JVM

▪ JVM tuning is applicable

▪ Heap min and max size – set both as same

▪ Assertions – disable assertions while launching the JVM

▪ Survivor Ratio, MaxTenuringThreshold and GC Algo
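
The JVM points above map onto standard HotSpot options. The helper below assembles such a list; the function name and the default numbers are placeholders chosen for illustration, not recommended values, and the GC algorithm flag is left out because the right choice depends on the JVM version.

```python
def jvm_options(heap_mb, survivor_ratio=8, max_tenuring_threshold=1):
    """Build a list of JVM options reflecting the tips above (values are illustrative)."""
    return [
        f"-Xms{heap_mb}M",   # minimum heap
        f"-Xmx{heap_mb}M",   # maximum heap, same as the minimum
        "-da",               # disable assertions
        f"-XX:SurvivorRatio={survivor_ratio}",
        f"-XX:MaxTenuringThreshold={max_tenuring_threshold}",
    ]


print(" ".join(jvm_options(4096)))
```

Keeping the minimum and maximum heap equal avoids heap resizing at runtime.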

Hope it worked for you !! 🙂