Understanding Cassandra's Core Architecture

Distributed, Decentralized Design

Cassandra uses a peer-to-peer architecture in which every node plays the same role: there is no primary node, and any node can coordinate reads and writes. Data is partitioned across the ring using consistent hashing, replicated according to the keyspace's replication factor, and served under a tunable, per-request consistency model.

Write and Read Paths

Writes are appended to the commit log and applied to an in-memory memtable, which is later flushed to immutable SSTables on disk. Reads must merge data from the memtable and potentially several SSTables, and may involve coordinator and replica lookups depending on the requested consistency level.

Common Cassandra Issues in Production

1. High Read Latency

High read latency can be caused by several factors (a diagnostic sketch follows the list):

  • Excessive SSTables due to poor compaction
  • Tombstone scanning (too many deleted rows)
  • Hot partitions or skewed data distribution
  • Read repair or consistency level mismatches
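
To see which of these is in play, nodetool tablehistograms and nodetool tablestats report SSTables touched per read, latency percentiles, and tombstones scanned per slice for a given table (the keyspace and table names below are placeholders):

nodetool tablehistograms my_keyspace my_table    # SSTables per read, partition size, latency percentiles
nodetool tablestats my_keyspace.my_table         # SSTable count, tombstones per slice, local read latency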

2. Hinted Handoff Overload

During node downtime, hints are stored and replayed upon recovery. Excessive hints can overload the cluster or delay recovery.

3. Compaction Backlog

Improper compaction strategy or disk saturation leads to thousands of SSTables, increasing read amplification and degrading performance.

4. JVM and GC Issues

GC pauses in Cassandra (especially with CMS or G1GC) can cause dropped messages, node timeouts, and failures in gossip or replication.

5. Inconsistent Data Across Replicas

Often caused by write timeouts, dropped mutations, or nodes missing writes while down. Because Cassandra is eventually consistent, regular anti-entropy repair is required to bring replicas back into sync.

Diagnostic Tools and Techniques

1. Nodetool Commands

nodetool status             # node state, load, and token ownership
nodetool tpstats            # thread pool statistics: pending, blocked, and dropped tasks
nodetool compactionstats    # pending and active compactions
nodetool netstats           # streaming activity between nodes
nodetool repair             # anti-entropy repair across replicas
nodetool cfstats            # per-table statistics (tablestats in newer releases)

These provide metrics on thread pools, compaction, hints, read/write latency, and streaming activity.

2. System Logs

Monitor system.log and debug.log for dropped mutations, large GC pauses, or failed repairs.
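
A quick way to spot these is to search the logs directly (the path below assumes a package install; adjust for your deployment):

grep -i "GCInspector" /var/log/cassandra/system.log    # long GC pauses reported by Cassandra
grep -i "dropped" /var/log/cassandra/system.log        # dropped mutations or reads under load
grep -i "repair" /var/log/cassandra/system.log         # repair session progress and failures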

3. JVM Metrics

  • Use JMX exporters or tools like Prometheus + Grafana (an example agent setup follows this list)
  • Track heap usage, GC time, thread pools, and native memory consumption
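
As one hedged example, the Prometheus JMX exporter can be attached as a Java agent by adding a line like the following to cassandra-env.sh (the jar path, port, and config file are placeholders):

JVM_OPTS="$JVM_OPTS -javaagent:/opt/jmx_prometheus_javaagent.jar=7070:/etc/cassandra/jmx_exporter.yaml"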

4. Read Path Profiling

Enable tracing for slow queries in cqlsh (the keyspace, table, and key below are placeholders):

CONSISTENCY QUORUM;
TRACING ON;
SELECT * FROM keyspace.table WHERE id = '123';

Step-by-Step Troubleshooting Guide

1. Address Read Latency

  • Check nodetool cfstats for tombstone count and SSTable read metrics
  • Reduce tombstone pressure via gc_grace_seconds tuning and TTL cleanup (see the sketch after this list)
  • Run nodetool repair and nodetool cleanup on nodes with frequent issues
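
A hedged sketch of the tombstone step, with placeholder keyspace/table names; shortening gc_grace_seconds is only safe if repairs reliably complete within the new window:

nodetool tablestats my_keyspace.my_table    # check "Average tombstones per slice" and SSTable count
cqlsh -e "ALTER TABLE my_keyspace.my_table WITH gc_grace_seconds = 86400;"    # example: 1 day instead of the 10-day default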

2. Fix Compaction Issues

  • Switch to LeveledCompactionStrategy for read-heavy workloads (see the example after this list)
  • Ensure enough disk space (at least 50% free) for compaction
  • Throttle compaction with compaction_throughput_mb_per_sec
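
A hedged example of both changes (the table name is a placeholder and 64 MB/s is illustrative, not a recommendation):

cqlsh -e "ALTER TABLE my_keyspace.my_table WITH compaction = {'class': 'LeveledCompactionStrategy'};"
nodetool setcompactionthroughput 64    # runtime equivalent of compaction_throughput_mb_per_sec
nodetool compactionstats -H            # confirm the backlog is draining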

3. Resolve Hint and Gossip Failures

  • Clear stale hints with nodetool truncatehints when replay is no longer useful (see the sketch after this list)
  • Review hinted_handoff_enabled and max_hint_window_in_ms in cassandra.yaml
  • Restart nodes cleanly with nodetool drain so memtables are flushed and shutdowns do not leave excess hints behind
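
A minimal sketch of the hint-related commands; whether truncating hints is acceptable depends on your consistency requirements, since discarded hints must later be reconciled by repair:

nodetool statushandoff    # is hinted handoff currently enabled?
nodetool truncatehints    # drop stored hints for all endpoints
nodetool drain            # flush memtables and stop accepting writes before a clean restart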

4. Manage JVM and GC Behavior

  • Use G1GC for better pause time predictability in Cassandra 3+
  • Tune -Xmx, -Xms, and new generation sizing (an illustrative jvm.options fragment follows this list)
  • Monitor and rotate logs to catch GC-related outages
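
An illustrative jvm.options fragment (jvm-server.options in 4.x); the 8 GB heap is a placeholder to be sized against available RAM, and equal -Xms/-Xmx avoids resize pauses:

-Xms8G
-Xmx8G
-XX:+UseG1GC
-XX:MaxGCPauseMillis=300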

5. Validate Data Consistency

  • Schedule regular repair and nodetool scrub (see the repair sketch after this list)
  • Use rebuild when re-adding nodes to a cluster
  • Use QUORUM or LOCAL_QUORUM for strong consistency use cases
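
A hedged sketch of routine repair (the keyspace name is a placeholder); -pr repairs only each node's primary token ranges, so the work is not duplicated when the command is run on every node in turn:

nodetool repair -pr my_keyspace     # primary-range repair, scheduled per node
nodetool repair -full my_keyspace   # occasional full repair if incremental repair is not in use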

Best Practices for Cassandra Stability

  • Avoid large partitions (>100MB)
  • Use TimeWindowCompactionStrategy for time-series data (example schema after this list)
  • Set up proper replication (RF=3) and data locality (DC-aware)
  • Automate repair with tools like Reaper or OpsCenter
  • Use client-side load balancing with token awareness
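
A hedged example schema for the time-series recommendation (keyspace, table, and window settings are placeholders, not tuned values):

CREATE TABLE metrics.sensor_readings (
    sensor_id text,
    ts timestamp,
    value double,
    PRIMARY KEY ((sensor_id), ts)
) WITH compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_unit': 'DAYS',
    'compaction_window_size': '1'
}
AND default_time_to_live = 2592000;   -- 30-day TTL so expired data ages out with its compaction window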

Conclusion

Cassandra delivers massive scalability and high write throughput, but maintaining a healthy cluster requires constant vigilance. Read latency, GC tuning, and compaction require deep knowledge of both the JVM and Cassandra internals. By using the correct diagnostics, implementing strict compaction and repair policies, and understanding anti-patterns like wide rows and tombstone overload, you can ensure Cassandra performs reliably at scale.

FAQs

1. Why are reads slower than writes in Cassandra?

Writes are append-only and fast; reads must merge data from memtables and potentially several SSTables, and may also wait for replica responses or digest checks depending on the consistency level, which adds latency.

2. How can I reduce tombstone-related read latency?

Set lower TTLs, clean up expired data, reduce gc_grace_seconds, and monitor with nodetool cfstats.

3. Should I use QUORUM or ONE for consistency?

QUORUM ensures stronger consistency but higher latency. Use ONE for availability-critical, eventually consistent operations.

4. What is the best compaction strategy for time-series data?

Use TimeWindowCompactionStrategy to group SSTables by time, improving reads and reducing compaction overhead.

5. How do I scale Cassandra horizontally?

Add nodes and let them bootstrap, keep the token ring balanced, run nodetool cleanup on existing nodes once new nodes have joined (nodetool rebuild is for standing up a new datacenter), and monitor streaming with nodetool netstats during topology changes.