Apache Cassandra Performance Tuning
What is Apache Cassandra?
▪ NoSQL database
▪ Schema-free
▪ Very fast writes
▪ Cassandra is designed to handle big data workloads across multiple nodes with no single point of failure
▪ Cassandra addresses the problem of failures by employing a peer-to-peer distributed system across homogeneous nodes, where data is distributed among all nodes in the cluster
High Level Architecture of Apache Cassandra
Performance tips – 1 (DB Modeling)
▪ Model your DB carefully
▪ Understand your application and its access patterns
▪ The DB is modeled around access patterns, which is very different from the RDBMS world
▪ Use denormalization, super column families, and similar features to avoid the costly joins of the RDBMS world
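As a rough illustration of modeling around access patterns, here is a minimal Python sketch. The two dictionaries are hypothetical in-memory stand-ins for column families, and the names `posts_by_author`, `posts_by_tag`, and `save_post` are invented for this example. The point is only that the same data is written once per query, so each read becomes a single key lookup.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for two column families.
# Each one is keyed by the value that one specific query looks up.
posts_by_author = defaultdict(list)   # query: "all posts by author X"
posts_by_tag = defaultdict(list)      # query: "all posts tagged Y"


def save_post(author, title, tags):
    """Write the same post into every query-specific structure (denormalization)."""
    post = {"author": author, "title": title, "tags": list(tags)}
    posts_by_author[author].append(post)
    for tag in tags:
        posts_by_tag[tag].append(post)


save_post("alice", "Tuning compaction", ["cassandra", "ops"])
save_post("bob", "Modeling time series", ["cassandra"])

# Each read is a single key lookup; no join is needed.
titles_for_tag = [p["title"] for p in posts_by_tag["cassandra"]]
titles_for_author = [p["title"] for p in posts_by_author["alice"]]
```

The trade-off is more work and more storage on writes, which fits a database where writes are the cheap operation.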
Performance tips – 2 – 4 (CF settings)
▪ Column family (CF) parameters that impact performance:
▪ Keys_cached – number of keys to cache
▪ Rows_cached – number of rows to cache (needs more memory, but can improve performance drastically)
▪ Preload_row_cache – whether to prepopulate the row cache on startup
▪ Large memtables – can improve read performance (there are 4-5 memtable-related settings, including memtable_flush_writers, the number of flush threads)
▪ Each CF is stored on disk in its own separate files, so keep related columns in the same CF; super column families (SCF) can come in handy here
▪ Gc_grace_seconds – time to wait before removing tombstones
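To make the Gc_grace_seconds rule concrete, here is a small Python sketch of the eligibility check. It is a simplified model, not Cassandra's code: the function name is hypothetical, timestamps are plain seconds, and in a real cluster the tombstone is only dropped when a compaction touches it after the grace period.

```python
DEFAULT_GC_GRACE_SECONDS = 864000  # 10 days


def tombstone_is_purgeable(deleted_at, now, gc_grace_seconds=DEFAULT_GC_GRACE_SECONDS):
    """Return True once more than gc_grace_seconds have passed since the delete."""
    return now - deleted_at > gc_grace_seconds


deleted_at = 1_000_000                   # hypothetical deletion time, in seconds
one_day_later = deleted_at + 86_400
eleven_days_later = deleted_at + 11 * 86_400

still_kept = tombstone_is_purgeable(deleted_at, one_day_later)
can_drop = tombstone_is_purgeable(deleted_at, eleven_days_later)
```

Lowering the value frees space sooner, but it also shortens the window in which a node that was down can still learn about the delete.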
Performance tips – 5-6 (concurrent reads and writes)
▪ Concurrent_reads (default: 32) – a good value is 4 concurrent_reads per processor core. Increase this value for systems with fast I/O storage.
▪ Concurrent_writes (default: 32) – if needed, increase this value for systems with many cores, but the out-of-the-box setting works fine for most requirements, as writes are usually very fast.
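The "4 per core" rule of thumb is simple arithmetic; a minimal Python sketch follows. The helper name `suggested_concurrent_reads` is made up for this example, and the result is only a starting point to validate under real load.

```python
import os


def suggested_concurrent_reads(cores=None, per_core=4):
    """Apply the rule of thumb: per_core concurrent reads for each processor core."""
    if cores is None:
        # os.cpu_count() may return None on some platforms; fall back to 1.
        cores = os.cpu_count() or 1
    return per_core * cores


for_eight_cores = suggested_concurrent_reads(cores=8)
```

On an 8-core machine this lands on 32, which matches the default.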
Performance tips – 7 – 9 (commitlog)
▪ Commitlog_rotation_threshold_in_mb – how large the commit log can grow before a new file is created
▪ Commitlog_sync – can be periodic or batch. Using batch can reduce performance, as it blocks until the write operation is synced to disk.
▪ Use a separate disk for commit logs to reduce I/O contention during writes.
Performance tips – 9-10 (compression)
▪ Use compression – depending on the data characteristics of the table, compressing its data can result in:
- 2x-4x reduction in data size
- 25-35% performance improvement on reads
- 5-10% performance improvement on writes
▪ Concurrent_compactors – sets the number of compaction processes allowed to run simultaneously on a node
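How much compression helps depends heavily on the data, which a quick Python experiment can show. This sketch uses the standard-library `zlib` module (Deflate) as a stand-in for whichever compressor the table is configured with. The sample payloads are invented, and the ratios say nothing about the read and write numbers quoted above.

```python
import os
import zlib


def compression_ratio(raw, level=6):
    """Return original size divided by compressed size (higher is better)."""
    compressed = zlib.compress(raw, level)
    return len(raw) / len(compressed)


# Repetitive, structured data compresses very well.
repetitive = b"sensor=42;status=OK;" * 1000
repetitive_ratio = compression_ratio(repetitive)

# Random bytes are effectively incompressible.
random_ratio = compression_ratio(os.urandom(20000))
```

Running the same check on a sample of your own rows is a cheap way to guess whether a table is a good candidate for compression.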
Performance tips – 11-13 (timeouts)
▪ Range_request_timeout_in_ms – the time that the coordinator waits for sequential or index scans to complete
▪ Read_request_timeout_in_ms – the time that the coordinator waits for read operations to complete
▪ Write_request_timeout_in_ms – the time that the coordinator waits for writes to complete
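The idea behind these timeouts can be sketched in a few lines of Python: the coordinator waits for a replica, but only up to the configured limit. This is a toy model under stated assumptions. `replica_read` and `coordinator_read` are hypothetical names, a thread stands in for the network call, and the string results are placeholders.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def replica_read(delay_s):
    """Hypothetical replica call that takes delay_s seconds to answer."""
    time.sleep(delay_s)
    return "row-data"


def coordinator_read(delay_s, read_request_timeout_in_ms):
    """Wait for the replica, but no longer than the configured timeout."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(replica_read, delay_s)
        try:
            return future.result(timeout=read_request_timeout_in_ms / 1000.0)
        except FutureTimeout:
            return "timed out"


fast = coordinator_read(delay_s=0.0, read_request_timeout_in_ms=500)
slow = coordinator_read(delay_s=0.3, read_request_timeout_in_ms=50)
```

A timeout that is too low turns slow-but-successful requests into errors, while one that is too high lets stuck requests pile up on the coordinator.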
Performance tips – 14-15 (RPC settings for clients)
▪ Rpc_keepalive – enable or disable keepalive on client connections
▪ Rpc_max_threads – maximum number of threads in the RPC thread pool, which dictates how many concurrent requests are possible
Performance tips – Misc.
▪ Dynamic snitching – the dynamic snitch monitors read latency and, when possible, routes requests away from poorly performing nodes
▪ Bloom filter (bloom_filter_fp_chance – desired false-positive probability for SSTable Bloom filters) – higher settings use less memory, but result in more disk I/O, since false positives trigger unnecessary SSTable reads
▪ Keep an eye on compaction (compaction_throughput_mb_per_sec is used to throttle compaction overhead)
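The memory side of the Bloom filter trade-off can be estimated with the textbook sizing formula, bits per key = -ln(p) / (ln 2)^2. The Python sketch below applies it. The function name is hypothetical, and Cassandra's actual filter sizing may differ from this idealized formula.

```python
import math


def bloom_bits_per_key(fp_chance):
    """Bits per key for an optimally sized Bloom filter: -ln(p) / (ln 2)^2."""
    if not 0.0 < fp_chance < 1.0:
        raise ValueError("fp_chance must be strictly between 0 and 1")
    return -math.log(fp_chance) / (math.log(2) ** 2)


bits_at_1_percent = bloom_bits_per_key(0.01)
bits_at_10_percent = bloom_bits_per_key(0.1)
```

Under this formula, going from a 1% to a 10% false-positive chance roughly halves the memory per key. The price is that about one in ten negative lookups still goes to disk.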
Performance tips – JVM
▪ JVM tuning is applicable
▪ Heap min and max size – set both to the same value
▪ Assertions – disable assertions when launching the JVM
▪ Survivor ratio, MaxTenuringThreshold, and GC algorithm
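To show how these settings map onto JVM options, here is a small Python helper that assembles a flag list. The helper name and the default values (8 and 1) are placeholders for illustration, not recommendations. The GC algorithm is left as an optional argument, because the right choice depends on the JVM version in use.

```python
def jvm_flags(heap_mb, survivor_ratio=8, max_tenuring_threshold=1, gc_flag=None):
    """Build a list of JVM options matching the tips above."""
    flags = [
        f"-Xms{heap_mb}M",  # min heap
        f"-Xmx{heap_mb}M",  # max heap, same value as min
        "-da",              # disable assertions
        f"-XX:SurvivorRatio={survivor_ratio}",
        f"-XX:MaxTenuringThreshold={max_tenuring_threshold}",
    ]
    if gc_flag is not None:
        flags.append(gc_flag)  # caller picks the collector, if any
    return flags


flags = jvm_flags(heap_mb=8192)
command_line_part = " ".join(flags)
```

Any change here is worth checking against GC logs before and after, since the best values depend on the workload.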
Hope it worked for you !! 🙂