Understanding Compaction Backlog in ScyllaDB

What Is Compaction?

Compaction is the process of merging SSTables on disk to reduce read amplification and reclaim disk space. It also removes tombstones and duplicate entries. ScyllaDB, being a write-optimized LSM-tree database, relies heavily on regular compactions to maintain performance.
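The core merge step can be sketched in a few lines of Python. The `compact` function and `TOMBSTONE` marker below are illustrative stand-ins, not ScyllaDB internals: each "SSTable" is a plain dict, ordered oldest to newest.

```python
# Illustrative sketch of an LSM compaction merge (not ScyllaDB internals).
# Each "SSTable" is a dict of key -> value; newer tables override older ones.
TOMBSTONE = object()  # stand-in for a deletion marker

def compact(sstables):
    """Merge SSTables oldest-to-newest, keeping only the latest version of
    each key and dropping keys whose latest version is a tombstone."""
    merged = {}
    for table in sstables:      # oldest first
        merged.update(table)    # newer values overwrite older ones
    # Purge tombstones: once merged, the deleted key disappears entirely.
    return {k: v for k, v in merged.items() if v is not TOMBSTONE}

old = {"a": 1, "b": 2, "c": 3}
new = {"b": 20, "c": TOMBSTONE}
print(compact([old, new]))  # {'a': 1, 'b': 20}
```

This is why compaction reduces both read amplification (fewer tables to consult per key) and disk usage (stale versions and deleted keys are gone).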

Symptoms of a Backlog

  • Increased pending_compactions metric
  • Growing number of SSTables per table
  • High read latencies despite low CPU usage
  • Unexpectedly growing disk space usage

Root Causes of Compaction Backlog

1. Misconfigured Compaction Strategy

Using SizeTieredCompactionStrategy (STCS) under a heavy write load without tuning can leave excessive SSTables on disk and merge them inefficiently. LeveledCompactionStrategy (LCS) is often a better fit for read-heavy workloads, but it carries higher write amplification and needs a well-chosen SSTable size.

2. IO Saturation

If disk IOPS or throughput are insufficient, compaction jobs are throttled, allowing backlog to accumulate. This is common with slower disks or noisy neighbors in shared environments.

3. CPU/Memory Contention

ScyllaDB runs a shard-per-core architecture, and compaction runs cooperatively within each shard under the Seastar scheduler. Under high query or streaming load, the scheduler deprioritizes compaction to protect query latencies, so backlog can worsen over time.

4. Tombstone Overhead

Excessive deletes (especially wide rows with many tombstones) increase compaction cost and slow merging, particularly when GC grace period is high and tombstones can’t yet be dropped.
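Whether a tombstone can be dropped during compaction comes down to whether the grace window has elapsed. A minimal sketch of that check (illustrative; the real logic also requires that no overlapping older SSTable still holds shadowed data):

```python
def tombstone_droppable(deletion_time, gc_grace_seconds, now):
    """A tombstone may be purged during compaction only after the GC grace
    window has elapsed since the delete was written. Illustrative sketch;
    the real check also considers overlapping SSTables."""
    return now >= deletion_time + gc_grace_seconds

# With the default 10-day grace period, a 3-day-old tombstone must be kept,
# so it keeps adding compaction and read-path cost until day 10:
DAY = 86_400
print(tombstone_droppable(deletion_time=0, gc_grace_seconds=10 * DAY, now=3 * DAY))  # False
```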

Diagnostic Techniques

1. Monitor Prometheus Metrics

Key metrics to observe:

  • scylla_compaction_manager_pending_tasks
  • scylla_storage_proxy_coordinator_read_latency
  • scylla_sstable_per_level (LCS only)

2. Use nodetool compactionstats

$ nodetool compactionstats

Shows ongoing and pending compactions. If the pending count stays high and does not fall over time, a backlog is building.
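If you script health checks, the pending count can be scraped from the command's output. The `pending tasks: N` line below is the format printed by Cassandra-compatible nodetool; treat it as an assumption and verify it against your ScyllaDB version.

```python
import re

def pending_tasks(compactionstats_output):
    """Extract the pending-task count from `nodetool compactionstats` output.
    Assumes a 'pending tasks: N' line, as printed by Cassandra-compatible
    nodetool; verify the format on your ScyllaDB version."""
    match = re.search(r"pending tasks:\s*(\d+)", compactionstats_output)
    return int(match.group(1)) if match else None

sample = "pending tasks: 42\n- ks.users: 42\n"
print(pending_tasks(sample))  # 42
```

Sampling this value periodically and alerting when it trends upward is more useful than any single reading.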

3. Analyze Disk Usage

Check whether disk space usage grows disproportionately to the logical data volume. Use du -sh /var/lib/scylla/data and correlate the result with SSTable counts.
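Per-table SSTable counts can be tallied by counting `*-Data.db` files under the data directory. The `data_dir/keyspace/table/​*-Data.db` layout assumed below is the conventional one; verify it on your installation.

```python
from collections import Counter
from pathlib import Path

def sstable_counts(data_dir="/var/lib/scylla/data"):
    """Count Data.db files per keyspace/table directory. Assumes the
    conventional data_dir/keyspace/table/*-Data.db layout; verify the
    path and layout on your installation."""
    counts = Counter()
    for data_file in Path(data_dir).glob("*/*/*-Data.db"):
        keyspace, table = data_file.parts[-3], data_file.parts[-2]
        counts[f"{keyspace}.{table}"] += 1
    return counts

# Tables with hundreds of SSTables are prime backlog suspects:
# for name, n in sstable_counts().most_common(10):
#     print(n, name)
```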

4. Profile SSTables

Use sstablemetadata to check TTLs, tombstones, and row sizes. Large numbers of tombstones indicate future compaction pressure.

Step-by-Step Fixes

1. Tune Compaction Parameters

For STCS (shown as a complete ALTER TABLE statement; substitute your own keyspace and table name):

ALTER TABLE my_keyspace.my_table WITH compaction = {
  'class': 'SizeTieredCompactionStrategy',
  'min_threshold': 2,
  'max_threshold': 8
};
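The min_threshold/max_threshold knobs control when a size bucket becomes eligible for compaction. The bucketing idea can be sketched as follows (a simplification of the real STCS algorithm; the bucket_low/bucket_high ratios here are illustrative):

```python
def stcs_buckets(sizes, bucket_low=0.5, bucket_high=1.5):
    """Group SSTable sizes into buckets of 'similar' size: a table joins a
    bucket if its size is within [bucket_low, bucket_high] of the bucket's
    running average. Simplified sketch of SizeTieredCompactionStrategy."""
    buckets = []  # each bucket: [running_average, [sizes...]]
    for size in sorted(sizes):
        for bucket in buckets:
            avg = bucket[0]
            if bucket_low * avg <= size <= bucket_high * avg:
                bucket[1].append(size)
                bucket[0] = sum(bucket[1]) / len(bucket[1])
                break
        else:
            buckets.append([size, [size]])
    return [b[1] for b in buckets]

def compaction_candidates(sizes, min_threshold=4):
    """Buckets holding at least min_threshold tables are eligible to merge."""
    return [b for b in stcs_buckets(sizes) if len(b) >= min_threshold]

# Four ~100 MB tables form one bucket and trigger a merge; the 1000 MB table waits:
print(compaction_candidates([100, 105, 95, 110, 1000]))  # [[95, 100, 105, 110]]
```

Lowering min_threshold makes merges fire sooner (less backlog, more write amplification); raising max_threshold lets each merge absorb more tables at once.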

For LCS:

ALTER TABLE my_keyspace.my_table WITH compaction = {
  'class': 'LeveledCompactionStrategy',
  'sstable_size_in_mb': 160
};

2. Increase IO Capacity

Use faster NVMe SSDs or increase dedicated IOPS in cloud environments. Monitor iostat and vmstat for disk bottlenecks.

3. Isolate Background Tasks

ScyllaDB isolates compaction into a dedicated Seastar scheduling group rather than separate threads (each shard owns one core). Verify via the per-group scheduler metrics that the compaction group is actually receiving CPU and IO shares under load, rather than trying to tune thread counts.

4. Lower GC Grace Period (With Caution)

ALTER TABLE users WITH gc_grace_seconds = 3600;

This shortens how long tombstones are retained, letting compaction purge them sooner. Only lower it if repairs reliably complete within the new window; otherwise a replica that missed the delete can resurrect the data. In multi-DC setups, do not lower it unless all replicas are fully repaired.
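A rule of thumb: gc_grace_seconds must stay comfortably above your full-repair cadence, or tombstones may be purged before every replica has seen the delete. A quick sanity check (illustrative; the 2x margin is an assumed default, not a ScyllaDB setting):

```python
def gc_grace_is_safe(gc_grace_seconds, repair_interval_seconds, margin=2.0):
    """gc_grace_seconds should exceed the repair cadence by a healthy margin;
    otherwise tombstones may be purged before all replicas saw the delete,
    resurrecting deleted rows. The 2x margin is an illustrative default."""
    return gc_grace_seconds >= margin * repair_interval_seconds

# A 1-hour grace period with daily repairs is unsafe:
DAY = 86_400
print(gc_grace_is_safe(gc_grace_seconds=3_600, repair_interval_seconds=DAY))  # False
```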

5. Force Manual Compaction

Use sparingly:

$ nodetool compact keyspace table

Run it during maintenance windows to purge tombstones or temporarily reduce the SSTable count. Note that a major compaction rewrites the table into a single large SSTable and can temporarily require roughly double the disk space.

Best Practices

  • Benchmark compaction strategy per table using realistic workload simulations
  • Monitor compaction metrics continuously (Prometheus + Grafana)
  • Avoid wide partitions that accumulate tombstones excessively
  • Enable row-level TTL where applicable instead of explicit deletes
  • Regularly review schema for unused fields or bloated data models

Conclusion

Pending compaction backlog is a silent but severe issue in ScyllaDB clusters, especially at scale. By understanding the interplay between storage engine mechanics, compaction strategies, and infrastructure bottlenecks, teams can proactively maintain database health. Proactive tuning, architectural decisions, and continuous observability are essential to keep ScyllaDB performant under evolving data loads.

FAQs

1. Is SizeTieredCompactionStrategy bad for all workloads?

No. STCS works well for write-heavy workloads with short-lived data, but without tuning it can accumulate excessive SSTables in long-lived datasets.

2. Can I safely lower gc_grace_seconds?

Yes, but only if you're confident deleted data is replicated and you're not relying on hinted handoff. In multi-DC setups, exercise extreme caution.

3. What are signs I should change to LeveledCompactionStrategy?

If your workload is read-heavy with low write volume and you're seeing high read latencies due to SSTable scanning, LCS is likely a better fit.

4. Does adding more nodes help with compaction backlog?

Only if the existing nodes are IO/CPU-bound. Otherwise, the problem is architectural and must be addressed via tuning.

5. How can I test compaction performance before production?

Use cassandra-stress or scylla-bench to simulate the workload and observe compaction behavior under load in a staging environment.