Understanding Cassandra's Core Architecture
Distributed, Decentralized Design
Cassandra uses a peer-to-peer architecture in which every node plays the same role; there is no primary or leader. Data is partitioned across nodes with consistent hashing, replicated according to each keyspace's replication factor, and read or written under a tunable, per-request consistency level.
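A minimal sketch of these two ideas, assuming cqlsh access; the keyspace, table, and data center names ("app", "users", "dc1") are placeholders for illustration:
# Replication is declared per keyspace: three replicas in a data center named "dc1".
cqlsh -e "CREATE KEYSPACE IF NOT EXISTS app WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3};"
cqlsh -e "CREATE TABLE IF NOT EXISTS app.users (id int PRIMARY KEY, name text);"
# Consistency is chosen per request/session: QUORUM asks 2 of the 3 replicas to acknowledge.
cqlsh -e "CONSISTENCY QUORUM; INSERT INTO app.users (id, name) VALUES (1, 'alice');"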
Write and Read Paths
Writes are appended to the commit log and written to an in-memory memtable, which is periodically flushed to immutable SSTables on disk. Reads may merge data from the memtable and several SSTables, and the coordinator may contact multiple replicas, depending on the consistency level.
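One quick way to see how the read path behaves on a live node is the per-table histogram, which reports read/write latency percentiles and how many SSTables a typical read touches (the keyspace and table names below are placeholders):
# "SSTables per read" here is a direct indicator of read amplification.
nodetool tablehistograms app users
# Older releases expose the same data as:
nodetool cfhistograms app users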
Common Cassandra Issues in Production
1. High Read Latency
Can be caused by:
- Excessive SSTables due to poor compaction
- Tombstone scanning (too many deleted rows)
- Hot partitions or skewed data distribution
- Read repair or consistency level mismatches
2. Hinted Handoff Overload
While a node is down, other nodes store hints for it and replay them when it recovers. Excessive hints can overload the cluster or delay recovery.
3. Compaction Backlog
Improper compaction strategy or disk saturation leads to thousands of SSTables, increasing read amplification and degrading performance.
4. JVM and GC Issues
GC pauses in Cassandra (especially with CMS or G1GC) can cause dropped messages, node timeouts, and failures in gossip or replication.
5. Inconsistent Data Across Replicas
Often caused by write timeouts or infrequent, failed repairs. Cassandra's eventually consistent model requires regular anti-entropy repair to keep replicas in sync.
Diagnostic Tools and Techniques
1. Nodetool Commands
nodetool status
nodetool tpstats
nodetool compactionstats
nodetool netstats
nodetool repair
nodetool cfstats
These provide metrics on thread pools, compaction, hints, read/write latency, and streaming activity.
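For a routine health sweep, the read-only commands above can be wrapped in a small script; a minimal sketch, assuming it runs on the node itself with nodetool's default local JMX connection:
#!/usr/bin/env bash
# Minimal local health sweep; nodetool repair is intentionally omitted here and
# should be run as a scheduled maintenance task instead.
set -euo pipefail
for cmd in status tpstats compactionstats netstats cfstats; do
  echo "=== nodetool $cmd ==="
  nodetool "$cmd"
done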
2. System Logs
Monitor system.log and debug.log for dropped mutations, large GC pauses, or failed repairs.
3. JVM Metrics
- Use JMX exporters or tools like Prometheus + Grafana (see the sketch below)
- Track heap usage, GC time, thread pools, and native memory consumption
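One common way to feed these metrics into Prometheus is attaching a JMX exporter java agent at startup; a minimal sketch appended to conf/cassandra-env.sh, where the jar path, port, and config file location are assumptions for illustration:
# Start Cassandra with the Prometheus JMX exporter attached; metrics are then
# scraped from http://<node>:7070/metrics using the rules in the YAML config.
JVM_OPTS="$JVM_OPTS -javaagent:/opt/jmx_prometheus_javaagent.jar=7070:/etc/cassandra/jmx_exporter.yaml"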
4. Read Path Profiling
Enable tracing for slow queries:
CONSISTENCY QUORUM;
TRACING ON;
SELECT * FROM keyspace.table WHERE id = '123';
Step-by-Step Troubleshooting Guide
1. Address Read Latency
- Check nodetool cfstats for tombstone counts and SSTable read metrics
- Reduce tombstone pressure via gc_grace_seconds tuning and TTL cleanup
- Run nodetool repair and nodetool cleanup on nodes with frequent issues (see the sketch below)
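A minimal remediation sketch, assuming a hypothetical table app.events and that a shorter gc_grace_seconds still comfortably fits your repair schedule:
# Inspect tombstones and SSTables-per-read for the table.
nodetool cfstats app.events
# Lower gc_grace_seconds only if repairs complete well within the new window (here: 1 day).
cqlsh -e "ALTER TABLE app.events WITH gc_grace_seconds = 86400;"
# Repair this node's primary ranges, then drop data it no longer owns.
nodetool repair -pr app
nodetool cleanup app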
2. Fix Compaction Issues
- Switch to LeveledCompactionStrategy for read-heavy workloads (see the sketch below)
- Ensure enough disk space (at least 50% free) for compaction
- Throttle compaction with compaction_throughput_mb_per_sec
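For example (hypothetical table app.events; the throughput value is illustrative and should match your disk capacity):
# Switch a read-heavy table to LCS.
cqlsh -e "ALTER TABLE app.events WITH compaction = {'class': 'LeveledCompactionStrategy'};"
# Check the backlog, then cap compaction I/O at runtime; persist the value via
# compaction_throughput_mb_per_sec in cassandra.yaml.
nodetool compactionstats
nodetool setcompactionthroughput 64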
3. Resolve Hint and Gossip Failures
- Clear accumulated hints with nodetool truncatehints once downed nodes are healthy and repaired (see the sketch below)
- Review hinted_handoff_enabled and max_hint_window_in_ms in cassandra.yaml
- Restart nodes cleanly with nodetool drain to avoid hint spikes
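A sketch of the hint-handling commands referenced above, all standard nodetool subcommands, run only after the previously down nodes are back and repaired:
# Is hinted handoff currently enabled/running on this node?
nodetool statushandoff
# Drop accumulated hints on this node; rely on repair to fix the replicas instead.
nodetool truncatehints
# Clean shutdown: stop accepting writes and flush memtables before restarting the process.
nodetool drain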
4. Manage JVM and GC Behavior
- Use G1GC for better pause-time predictability in Cassandra 3+
- Tune -Xms, -Xmx, and young-generation sizing (see the sketch below)
- Monitor and rotate GC logs to catch GC-related outages
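A minimal heap/GC sketch, assuming Cassandra 3.x installed from packages (so jvm.options lives under /etc/cassandra) and a host where an 8 GB heap is appropriate; the default CMS flags should be removed or commented out first, and the node restarted afterwards:
# Append a fixed heap size and G1 settings with a pause-time target.
cat >> /etc/cassandra/jvm.options <<'EOF'
-Xms8G
-Xmx8G
-XX:+UseG1GC
-XX:MaxGCPauseMillis=500
EOF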
5. Validate Data Consistency
- Schedule regular nodetool repair and nodetool scrub
- Use nodetool rebuild when re-adding nodes to a cluster (see the sketch below)
- Use QUORUM or LOCAL_QUORUM for strong-consistency use cases
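For example (the keyspace, table, and data-center arguments are placeholders):
# Anti-entropy repair of this node's primary ranges; schedule it regularly or automate with Reaper.
nodetool repair -pr app
# Rewrite corrupted or legacy SSTables for one table if needed.
nodetool scrub app events
# Stream existing data from another data center when rebuilding a node or adding a new DC.
nodetool rebuild dc1
# Require strong consistency for critical reads.
cqlsh -e "CONSISTENCY LOCAL_QUORUM; SELECT * FROM app.users WHERE id = 1;"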
Best Practices for Cassandra Stability
- Avoid large partitions (>100MB)
- Use TimeWindowCompactionStrategy for time-series data (see the sketch below)
- Set up proper replication (RF=3) and data locality (DC-aware)
- Automate repair with tools like Reaper or OpsCenter
- Use client-side load balancing with token awareness
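As a sketch of the time-series and replication recommendations above; the keyspace, table, data-center name, window size, and TTL are all illustrative:
# DC-aware replication with RF=3.
cqlsh -e "CREATE KEYSPACE IF NOT EXISTS metrics WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3};"
# TWCS groups SSTables into 1-day windows; paired with a TTL, whole windows expire together.
cqlsh -e "CREATE TABLE IF NOT EXISTS metrics.readings (sensor_id int, ts timestamp, value double, PRIMARY KEY (sensor_id, ts)) WITH compaction = {'class': 'TimeWindowCompactionStrategy', 'compaction_window_unit': 'DAYS', 'compaction_window_size': 1} AND default_time_to_live = 2592000;"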
Conclusion
Cassandra delivers massive scalability and high write throughput, but maintaining a healthy cluster requires constant vigilance. Read latency, GC tuning, and compaction require deep knowledge of both the JVM and Cassandra internals. By using the correct diagnostics, implementing strict compaction and repair policies, and understanding anti-patterns like wide rows and tombstone overload, you can ensure Cassandra performs reliably at scale.
FAQs
1. Why are reads slower than writes in Cassandra?
Writes are append-only and fast; reads must merge data from SSTables, memtables, and perform consistency checks, which adds latency.
2. How can I reduce tombstone-related read latency?
Set lower TTLs, clean up expired data, reduce gc_grace_seconds, and monitor tombstone counts with nodetool cfstats.
3. Should I use QUORUM or ONE for consistency?
QUORUM ensures stronger consistency but higher latency. Use ONE for availability-critical, eventually consistent operations.
4. What is the best compaction strategy for time-series data?
Use TimeWindowCompactionStrategy to group SSTables by time window, improving reads and reducing compaction overhead.
5. How do I scale Cassandra horizontally?
Add nodes, ensure token ring balance, run nodetool rebuild if replacing nodes, and monitor replication lag during topology changes.