Understanding Compaction Backlog in ScyllaDB
What Is Compaction?
Compaction is the process of merging SSTables on disk to reduce read amplification and reclaim disk space. It also removes tombstones and duplicate entries. ScyllaDB, being a write-optimized LSM-tree database, relies heavily on regular compactions to maintain performance.
Symptoms of a Backlog
- Increased pending_compactions metric
- Growing number of SSTables per table
- High read latencies despite low CPU usage
- Disk space usage that keeps growing unexpectedly
Root Causes of Compaction Backlog
1. Misconfigured Compaction Strategy
Using SizeTieredCompactionStrategy (STCS) in high-write workloads without tuning can create excessive SSTables and poor merge efficiency. LeveledCompactionStrategy (LCS) may be more appropriate for read-heavy workloads, but it trades higher write amplification for lower read amplification and requires careful sizing.
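Before changing a strategy, confirm what a table currently uses. As a minimal sketch (the ks and events names are hypothetical placeholders), the active compaction options can be read from the schema tables:
SELECT compaction FROM system_schema.tables WHERE keyspace_name = 'ks' AND table_name = 'events';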
2. IO Saturation
If disk IOPS or throughput are insufficient, compaction jobs are throttled, allowing backlog to accumulate. This is common with slower disks or noisy neighbors in shared environments.
3. CPU/Memory Contention
ScyllaDB schedules compaction cooperatively on each shard, alongside queries and streaming. Under high query or streaming load, the scheduler may deprioritize compaction to preserve query latency, worsening the backlog over time.
4. Tombstone Overhead
Excessive deletes (especially wide rows with many tombstones) increase compaction cost and slow merging, particularly when GC grace period is high and tombstones can’t yet be dropped.
Diagnostic Techniques
1. Monitor Prometheus Metrics
Key metrics to observe:
- scylla_compaction_manager_pending_tasks
- scylla_storage_proxy_coordinator_read_latency
- scylla_sstable_per_level (LCS only)
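As a starting point for alerting, a hedged PromQL expression over the pending-tasks metric (the 100-task threshold is an assumption to tune per cluster) might look like:
max by (instance) (scylla_compaction_manager_pending_tasks) > 100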
2. Use nodetool compactionstats
$ nodetool compactionstats
Shows ongoing and pending compactions. If the pending count stays high and does not shrink over time, a backlog exists.
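To see whether the queue is actually draining, one simple approach is to sample the command periodically, for example:
$ watch -n 30 nodetool compactionstats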
3. Analyze Disk Usage
Check whether disk usage grows disproportionately to the data volume. Use du -sh /var/lib/scylla/data and correlate the result with SSTable counts.
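A rough way to track SSTable counts (assuming the default data directory; SSTable data components end in -Data.db) is:
$ find /var/lib/scylla/data -name '*-Data.db' | wc -l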
4. Profile SSTables
Use sstablemetadata to check TTLs, tombstones, and row sizes. Large numbers of tombstones indicate future compaction pressure.
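As a sketch, point the tool at a data component; the keyspace, table directory, and file name below are placeholders, and actual names vary with the SSTable format version:
$ sstablemetadata /var/lib/scylla/data/ks/events-*/mc-1-big-Data.db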
Step-by-Step Fixes
1. Tune Compaction Parameters
For STCS:
compaction = { 'class': 'SizeTieredCompactionStrategy', 'min_threshold': 2, 'max_threshold': 8 }
For LCS:
compaction = { 'class': 'LeveledCompactionStrategy', 'sstable_size_in_mb': 160 }
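These fragments are applied via ALTER TABLE; a minimal sketch, assuming a hypothetical ks.events table:
ALTER TABLE ks.events WITH compaction = { 'class': 'SizeTieredCompactionStrategy', 'min_threshold': 2, 'max_threshold': 8 };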
2. Increase IO Capacity
Use faster NVMe SSDs or increase provisioned IOPS in cloud environments. Monitor iostat and vmstat for disk bottlenecks.
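For example, extended device statistics at a five-second interval make saturation visible (watch the %util and r_await/w_await columns):
$ iostat -x 5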
3. Isolate Background Tasks
Use ScyllaDB’s scheduler_group configuration to isolate compaction work from query handling.
4. Lower GC Grace Period (With Caution)
ALTER TABLE users WITH gc_grace_seconds = 3600;
This reduces how long tombstones are retained, letting compaction purge them sooner. Lower it only if full repairs complete within the new window; otherwise deleted data can resurrect. Do not do this in multi-DC setups unless all replicas are fully synced.
5. Force Manual Compaction
Use sparingly:
$ nodetool compact keyspace table
Run it during maintenance windows to purge tombstones or temporarily reduce the SSTable count. Note that under STCS a major compaction produces a single large SSTable, which can make later size-tiered merges less efficient.
Best Practices
- Benchmark compaction strategy per table using realistic workload simulations
- Monitor compaction metrics continuously (Prometheus + Grafana)
- Avoid wide partitions that accumulate tombstones excessively
- Prefer row-level TTL to explicit deletes where applicable (see the sketch after this list)
- Regularly review schema for unused fields or bloated data models
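As a minimal sketch of the TTL approach (ks.events and its columns are hypothetical), expiry can be set per write or as a table default:
INSERT INTO ks.events (id, payload) VALUES (uuid(), 'payload') USING TTL 86400;
ALTER TABLE ks.events WITH default_time_to_live = 86400;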
Conclusion
Pending compaction backlog is a silent but severe issue in ScyllaDB clusters, especially at scale. By understanding the interplay between storage engine mechanics, compaction strategies, and infrastructure bottlenecks, teams can proactively maintain database health. Proactive tuning, architectural decisions, and continuous observability are essential to keep ScyllaDB performant under evolving data loads.
FAQs
1. Is SizeTieredCompactionStrategy bad for all workloads?
No, STCS works well for write-heavy workloads with short-lived data. But without tuning, it causes excessive SSTables in long-lived datasets.
2. Can I safely lower gc_grace_seconds?
Yes, but only if you're confident deleted data is replicated and you're not relying on hinted handoff. In multi-DC setups, exercise extreme caution.
3. What are signs I should change to LeveledCompactionStrategy?
If your workload is read-heavy with low write volume and you're seeing high read latencies due to SSTable scanning, LCS is likely a better fit.
4. Does adding more nodes help with compaction backlog?
Only if the existing nodes are IO/CPU-bound. Otherwise, the problem is architectural and must be addressed via tuning.
5. How can I test compaction performance before production?
Use the cassandra-stress tool or scylla-bench to simulate your workload and observe compaction behavior under load in a staging environment.
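For example, a cassandra-stress invocation that drives enough writes to exercise compaction (the node address and operation count are placeholders):
$ cassandra-stress write n=10000000 -rate threads=200 -node 10.0.0.1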