Understanding ScyllaDB's Internal Architecture

Shard-per-Core Model

ScyllaDB uses a shared-nothing, shard-per-core model: each CPU core (shard) owns a subset of the node's data and runs its own event loop on the Seastar framework. This maximizes throughput, but it also means a single overloaded shard can become a bottleneck if requests aren't evenly distributed.

Coordinator Nodes and Load Distribution

Unlike traditional primary/replica databases, every ScyllaDB node can act as a coordinator for queries. A poorly tuned client or misconfigured driver may pin requests to a single node, turning it into a hotspot and degrading performance.

Root Causes of Common Issues

High Latency on Specific Nodes

Latency spikes are most often caused by imbalanced traffic, reactor stalls, or overloaded shards. Issues confined to a single node usually point to either hardware irregularities or that node handling a disproportionate share of requests.

Large Partitions and Tombstone Overhead

ScyllaDB handles large partitions poorly: reading, repairing, or compacting them causes memory spikes and long stalls on the shard that owns them. Tombstones from frequent deletes also accumulate and increase read amplification.

-- Avoid hot partitions by designing better primary keys
SELECT key, value FROM my_table WHERE id = ?;
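
If a single logical key can grow without bound (time series, event logs), a common mitigation is to fold a bucket into the partition key so each physical partition stays small. The sketch below uses the Python driver; the keyspace, table, and bucketing scheme are illustrative, not taken from this article's schema.

from cassandra.cluster import Cluster

# Hypothetical bucketed time-series table: the (sensor_id, bucket) pair spreads what
# would otherwise be one huge partition across many small ones.
session = Cluster(["10.0.0.5"]).connect()   # contact point is illustrative
session.execute("""
    CREATE TABLE IF NOT EXISTS metrics.readings (   -- assumes the keyspace already exists
        sensor_id text,
        bucket    int,          -- e.g. day number, or hash(sensor_id) mod N
        ts        timestamp,
        value     double,
        PRIMARY KEY ((sensor_id, bucket), ts)
    )
""")

Reads then target one (sensor_id, bucket) pair at a time, which keeps point lookups like the SELECT above cheap.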

Compaction and IO Bottlenecks

Background compactions can saturate disk I/O, slowing down both reads and writes. Monitor the compaction manager with nodetool compactionstats, adjust compaction throughput in scylla.yaml if it is starving foreground work, and trigger a major compaction with nodetool compact only during quiet windows.

Diagnostic Workflow

Step 1: Analyze Prometheus Metrics

Use the Scylla Monitoring Stack to evaluate shard-level metrics such as reactor utilization (scylla_reactor_utilization), read/write latency, and cache hit ratio. Identify which shards or nodes are misbehaving.

scylla_reactor_utilization{instance="10.0.0.5:9180"}
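
To compare shards programmatically rather than eyeballing dashboards, you can query the Prometheus HTTP API directly. A minimal sketch, assuming the monitoring stack's Prometheus is reachable at localhost:9090 and the instance label matches the example above:

import requests

# Per-shard reactor utilization for one node; each result carries a "shard" label.
resp = requests.get(
    "http://localhost:9090/api/v1/query",
    params={"query": 'scylla_reactor_utilization{instance="10.0.0.5:9180"}'},
    timeout=10,
)
for result in resp.json()["data"]["result"]:
    shard = result["metric"].get("shard", "?")
    value = result["value"][1]
    print(f"shard {shard}: {value}% reactor utilization")

A shard sitting near 100% while its siblings idle is the classic signature of a hot partition or a coordinator hotspot.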

Step 2: Investigate System Logs

Review the Scylla logs (via journalctl -u scylla-server, or under /var/log/scylla if file logging is configured) for messages about dropped mutations, timeouts, or compaction stalls. Look for warnings about tombstones or large partitions.

journalctl -u scylla-server | grep -i warn
grep WARN /var/log/scylla/scylla.log    # if file logging is configured

Step 3: Validate Client-Side Load Balancing

Check that the Scylla driver is configured for token-aware routing and round-robin load balancing to avoid overloading coordinators.

from cassandra.cluster import Cluster
from cassandra.policies import TokenAwarePolicy, RoundRobinPolicy

cluster = Cluster(contact_points, load_balancing_policy=TokenAwarePolicy(RoundRobinPolicy()))
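
Token awareness only pays off when the driver can compute the routing key, which in practice means prepared statements with bound partition-key values (or an explicitly set routing key). The keyspace name below is illustrative:

session = cluster.connect("my_keyspace")   # illustrative keyspace
stmt = session.prepare("SELECT key, value FROM my_table WHERE id = ?")
row = session.execute(stmt, ["some-id"]).one()   # bound key lets the policy route to a replica

For ScyllaDB specifically, the shard-aware scylla-driver fork of the Python driver goes a step further and routes each request to the owning shard, not just the owning node.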

Advanced Performance Pitfalls

Imbalanced Token Distribution

Even in virtual-node (vnode) mode, an incorrect num_tokens setting or non-uniform vnode assignments can lead to data skew. Use nodetool status to inspect token ownership.
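
For a quick client-side cross-check, the Python driver's cluster metadata exposes the token map, and counting tokens per host gives a rough picture of the distribution. This sketch reuses the cluster object from the driver example above; an even token count does not by itself prove even data ownership:

from collections import Counter

# Count how many tokens each host owns according to the driver's view of the ring.
token_map = cluster.metadata.token_map
tokens_per_host = Counter(host.address for host in token_map.token_to_host_owner.values())
for address, count in sorted(tokens_per_host.items()):
    print(address, count)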

Memory Fragmentation and Reactor Starvation

Memory fragmentation can stall large allocations during high-concurrency reads. Correlate the per-shard memory metrics ScyllaDB exports to Prometheus (the scylla_memory_* series) with top -H output to tie memory pressure to specific shards or reactor threads.

Step-by-Step Fixes

  • Ensure clients use token-aware drivers with shard-aware connections.
  • Split large partitions at the data-modeling stage (bucketing, composite keys) to reduce read amplification.
  • Run nodetool cleanup after adding or replacing nodes so each node drops data it no longer owns.
  • Adjust compaction throughput via scylla.yaml to control background I/O usage.
  • Keep caching enabled (the default) on tables with frequently accessed hot rows so they stay in the row cache.

Best Practices

  • Avoid hot partitions by distributing writes using composite keys or bucketing.
  • Monitor shard-level metrics, not just node-level metrics.
  • Set appropriate thresholds for tombstone warnings in scylla.yaml.
  • Upgrade regularly to benefit from ongoing performance fixes and scheduler improvements.
  • Run nodetool repair on a consistent cadence to ensure replica integrity.

Conclusion

ScyllaDB offers exceptional throughput but demands deep awareness of its architectural behavior. Issues that appear as simple latency problems often stem from coordination imbalance, partition design flaws, or background I/O pressure. By leveraging shard-aware diagnostics and enforcing best practices, senior teams can tune ScyllaDB to perform reliably at massive scale.

FAQs

1. Why does only one ScyllaDB node show high CPU usage?

This is often due to improper client load balancing or token distribution, causing that node to act disproportionately as a coordinator.

2. How do I know if I have a large partition problem?

Set compaction_large_partition_warning_threshold_mb in scylla.yaml and monitor the logs for large partition warnings. Slow queries and latency spikes on the shard that owns the partition are the usual symptoms.
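
Recent ScyllaDB versions also record offenders in the system.large_partitions table, which you can query directly; the sketch below assumes an existing session from the driver setup earlier, and column names may vary slightly between versions:

# List recorded large partitions with their keyspace, table, key, and size.
rows = session.execute(
    "SELECT keyspace_name, table_name, partition_key, partition_size "
    "FROM system.large_partitions"
)
for r in rows:
    print(r.keyspace_name, r.table_name, r.partition_key, r.partition_size)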

3. What is the best way to safely delete rows in ScyllaDB?

Avoid deleting many individual rows: batch deletes create tombstone floods. Prefer TTLs or whole-partition/range deletes where possible, and run repair within gc_grace_seconds so tombstones can be purged safely during compaction.
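
For example, attaching a TTL at write time lets data expire without an explicit delete; the table and column names below are illustrative:

from datetime import datetime, timezone

# 604800 seconds = 7 days; expired rows are reclaimed by compaction without manual deletes.
session.execute(
    "INSERT INTO events (id, ts, payload) VALUES (%s, %s, %s) USING TTL 604800",
    ("evt-1", datetime.now(timezone.utc), "..."),
)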

4. Can I throttle compactions in real time?

Partially. Use nodetool compactionstats to see what is running, and control how much I/O compactions consume via the compaction throughput settings in scylla.yaml; whether those can be changed without a restart depends on your ScyllaDB version. A manual major compaction can also be deferred to a quiet window with nodetool compact.

5. What causes timeouts under light load?

Shard starvation, disk I/O contention, or network queue delays can cause timeouts even with low client-side TPS. Always correlate timeouts with per-shard reactor utilization and I/O queue metrics rather than client-side numbers alone.