Understanding Compaction Backlog in ScyllaDB

What Is Compaction?

Compaction is the process of merging SSTables on disk to reduce read amplification and reclaim disk space. It also removes tombstones and duplicate entries. ScyllaDB, being a write-optimized LSM-tree database, relies heavily on regular compactions to maintain performance.
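The core merge step can be sketched in a few lines of Python. The `compact` function and `TOMBSTONE` marker below are illustrative stand-ins, not ScyllaDB internals: each "SSTable" is a plain dict, ordered oldest to newest.

```python
# Illustrative sketch of an LSM compaction merge (not ScyllaDB internals).
# Each "SSTable" is a dict of key -> value; newer tables override older ones.
TOMBSTONE = object()  # stand-in for a deletion marker

def compact(sstables):
    """Merge SSTables oldest-to-newest, keeping only the latest version of
    each key and dropping keys whose latest version is a tombstone."""
    merged = {}
    for table in sstables:      # oldest first
        merged.update(table)    # newer values overwrite older ones
    # Purge tombstones: once merged, the deleted key disappears entirely.
    return {k: v for k, v in merged.items() if v is not TOMBSTONE}

old = {"a": 1, "b": 2, "c": 3}
new = {"b": 20, "c": TOMBSTONE}
print(compact([old, new]))  # {'a': 1, 'b': 20}
```

This is why compaction reduces both read amplification (fewer tables to consult per key) and disk usage (stale versions and deleted keys are gone).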

Symptoms of a Backlog

  • Increased pending_compactions metric
  • Growing number of SSTables per table
  • High read latencies despite low CPU usage
  • Unexpectedly growing disk space usage

Root Causes of Compaction Backlog

1. Misconfigured Compaction Strategy

Using SizeTieredCompactionStrategy (STCS) under a heavy write load without tuning can leave excessive SSTables on disk and merge them inefficiently. LeveledCompactionStrategy (LCS) is often a better fit for read-heavy workloads, but it carries higher write amplification and needs a well-chosen SSTable size.

2. IO Saturation

If disk IOPS or throughput are insufficient, compaction jobs are throttled, allowing backlog to accumulate. This is common with slower disks or noisy neighbors in shared environments.

3. CPU/Memory Contention

ScyllaDB runs a shard-per-core architecture, and compaction runs cooperatively within each shard under the Seastar scheduler. Under high query or streaming load, the scheduler deprioritizes compaction to protect query latencies, so backlog can worsen over time.

4. Tombstone Overhead

Excessive deletes (especially wide rows with many tombstones) increase compaction cost and slow merging, particularly when GC grace period is high and tombstones can’t yet be dropped.
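Whether a tombstone can be dropped during compaction comes down to whether the grace window has elapsed. A minimal sketch of that check (illustrative; the real logic also requires that no overlapping older SSTable still holds shadowed data):

```python
def tombstone_droppable(deletion_time, gc_grace_seconds, now):
    """A tombstone may be purged during compaction only after the GC grace
    window has elapsed since the delete was written. Illustrative sketch;
    the real check also considers overlapping SSTables."""
    return now >= deletion_time + gc_grace_seconds

# With the default 10-day grace period, a 3-day-old tombstone must be kept,
# so it keeps adding compaction and read-path cost until day 10:
DAY = 86_400
print(tombstone_droppable(deletion_time=0, gc_grace_seconds=10 * DAY, now=3 * DAY))  # False
```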

Diagnostic Techniques

1. Monitor Prometheus Metrics

Key metrics to observe:

  • scylla_compaction_manager_pending_tasks
  • scylla_storage_proxy_coordinator_read_latency
  • scylla_sstable_per_level (LCS only)

2. Use nodetool compactionstats

$ nodetool compactionstats

Shows ongoing and pending compactions. If the pending count stays high and does not fall over time, a backlog is building.
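If you script health checks, the pending count can be scraped from the command's output. The `pending tasks: N` line below is the format printed by Cassandra-compatible nodetool; treat it as an assumption and verify it against your ScyllaDB version.

```python
import re

def pending_tasks(compactionstats_output):
    """Extract the pending-task count from `nodetool compactionstats` output.
    Assumes a 'pending tasks: N' line, as printed by Cassandra-compatible
    nodetool; verify the format on your ScyllaDB version."""
    match = re.search(r"pending tasks:\s*(\d+)", compactionstats_output)
    return int(match.group(1)) if match else None

sample = "pending tasks: 42\n- ks.users: 42\n"
print(pending_tasks(sample))  # 42
```

Sampling this value periodically and alerting when it trends upward is more useful than any single reading.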

3. Analyze Disk Usage

Check whether disk space usage grows disproportionately to the logical data volume. Use du -sh /var/lib/scylla/data and correlate the result with SSTable counts.
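Per-table SSTable counts can be tallied by counting `*-Data.db` files under the data directory. The `data_dir/keyspace/table/​*-Data.db` layout assumed below is the conventional one; verify it on your installation.

```python
from collections import Counter
from pathlib import Path

def sstable_counts(data_dir="/var/lib/scylla/data"):
    """Count Data.db files per keyspace/table directory. Assumes the
    conventional data_dir/keyspace/table/*-Data.db layout; verify the
    path and layout on your installation."""
    counts = Counter()
    for data_file in Path(data_dir).glob("*/*/*-Data.db"):
        keyspace, table = data_file.parts[-3], data_file.parts[-2]
        counts[f"{keyspace}.{table}"] += 1
    return counts

# Tables with hundreds of SSTables are prime backlog suspects:
# for name, n in sstable_counts().most_common(10):
#     print(n, name)
```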

4. Profile SSTables

Use sstablemetadata to check TTLs, tombstones, and row sizes. Large numbers of tombstones indicate future compaction pressure.

Step-by-Step Fixes

1. Tune Compaction Parameters

For STCS (shown as a complete ALTER TABLE statement; substitute your own keyspace and table name):

ALTER TABLE my_keyspace.my_table WITH compaction = {
  'class': 'SizeTieredCompactionStrategy',
  'min_threshold': 2,
  'max_threshold': 8
};
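The min_threshold/max_threshold knobs control when a size bucket becomes eligible for compaction. The bucketing idea can be sketched as follows (a simplification of the real STCS algorithm; the bucket_low/bucket_high ratios here are illustrative):

```python
def stcs_buckets(sizes, bucket_low=0.5, bucket_high=1.5):
    """Group SSTable sizes into buckets of 'similar' size: a table joins a
    bucket if its size is within [bucket_low, bucket_high] of the bucket's
    running average. Simplified sketch of SizeTieredCompactionStrategy."""
    buckets = []  # each bucket: [running_average, [sizes...]]
    for size in sorted(sizes):
        for bucket in buckets:
            avg = bucket[0]
            if bucket_low * avg <= size <= bucket_high * avg:
                bucket[1].append(size)
                bucket[0] = sum(bucket[1]) / len(bucket[1])
                break
        else:
            buckets.append([size, [size]])
    return [b[1] for b in buckets]

def compaction_candidates(sizes, min_threshold=4):
    """Buckets holding at least min_threshold tables are eligible to merge."""
    return [b for b in stcs_buckets(sizes) if len(b) >= min_threshold]

# Four ~100 MB tables form one bucket and trigger a merge; the 1000 MB table waits:
print(compaction_candidates([100, 105, 95, 110, 1000]))  # [[95, 100, 105, 110]]
```

Lowering min_threshold makes merges fire sooner (less backlog, more write amplification); raising max_threshold lets each merge absorb more tables at once.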

For LCS:

ALTER TABLE my_keyspace.my_table WITH compaction = {
  'class': 'LeveledCompactionStrategy',
  'sstable_size_in_mb': 160
};

2. Increase IO Capacity

Use faster NVMe SSDs or increase dedicated IOPS in cloud environments. Monitor iostat and vmstat for disk bottlenecks.

3. Isolate Background Tasks

ScyllaDB isolates compaction into a dedicated Seastar scheduling group rather than separate threads (each shard owns one core). Verify via the per-group scheduler metrics that the compaction group is actually receiving CPU and IO shares under load, rather than trying to tune thread counts.

4. Lower GC Grace Period (With Caution)

ALTER TABLE users WITH gc_grace_seconds = 3600;

This shortens how long tombstones are retained, letting compaction purge them sooner. Only lower it if repairs reliably complete within the new window; otherwise a replica that missed the delete can resurrect the data. In multi-DC setups, do not lower it unless all replicas are fully repaired.
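A rule of thumb: gc_grace_seconds must stay comfortably above your full-repair cadence, or tombstones may be purged before every replica has seen the delete. A quick sanity check (illustrative; the 2x margin is an assumed default, not a ScyllaDB setting):

```python
def gc_grace_is_safe(gc_grace_seconds, repair_interval_seconds, margin=2.0):
    """gc_grace_seconds should exceed the repair cadence by a healthy margin;
    otherwise tombstones may be purged before all replicas saw the delete,
    resurrecting deleted rows. The 2x margin is an illustrative default."""
    return gc_grace_seconds >= margin * repair_interval_seconds

# A 1-hour grace period with daily repairs is unsafe:
DAY = 86_400
print(gc_grace_is_safe(gc_grace_seconds=3_600, repair_interval_seconds=DAY))  # False
```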

5. Force Manual Compaction

Use sparingly:

$ nodetool compact keyspace table

Run it during maintenance windows to purge tombstones or temporarily reduce the SSTable count. Note that a major compaction rewrites the table into a single large SSTable and can temporarily require roughly double the disk space.

Best Practices

  • Benchmark compaction strategy per table using realistic workload simulations
  • Monitor compaction metrics continuously (Prometheus + Grafana)
  • Avoid wide partitions that accumulate tombstones excessively
  • Enable row-level TTL where applicable instead of explicit deletes
  • Regularly review schema for unused fields or bloated data models

Conclusion

Pending compaction backlog is a silent but severe issue in ScyllaDB clusters, especially at scale. By understanding the interplay between storage engine mechanics, compaction strategies, and infrastructure bottlenecks, teams can proactively maintain database health. Proactive tuning, architectural decisions, and continuous observability are essential to keep ScyllaDB performant under evolving data loads.

FAQs

1. Is SizeTieredCompactionStrategy bad for all workloads?

No. STCS works well for write-heavy workloads with short-lived data, but without tuning it can accumulate excessive SSTables in long-lived datasets.

2. Can I safely lower gc_grace_seconds?

Yes, but only if you're confident deleted data is replicated and you're not relying on hinted handoff. In multi-DC setups, exercise extreme caution.

3. What are signs I should change to LeveledCompactionStrategy?

If your workload is read-heavy with low write volume and you're seeing high read latencies due to SSTable scanning, LCS is likely a better fit.

4. Does adding more nodes help with compaction backlog?

Only if the existing nodes are IO/CPU-bound. Otherwise, the problem is architectural and must be addressed via tuning.

5. How can I test compaction performance before production?

Use cassandra-stress or scylla-bench to simulate the workload and observe compaction behavior under load in a staging environment.