Understanding ScyllaDB's Internal Architecture
Shard-per-Core Model
ScyllaDB uses a shared-nothing model in which each CPU core (a shard) owns a subset of the data and operates independently via the Seastar framework. This maximizes throughput, but it can cause subtle issues such as shard overloading when queries are not evenly distributed.
Coordinator Nodes and Load Distribution
Unlike primary/replica databases, any ScyllaDB node can act as the coordinator for a query. A poorly tuned client or misconfigured driver may pin requests to a single node, leading to hotspots and degraded performance.
Root Causes of Common Issues
High Latency on Specific Nodes
Latency spikes are often caused by imbalanced traffic, thread starvation, or overloaded shards. Node-specific issues usually point to either hardware irregularities or disproportionate request handling.
Large Partitions and Tombstone Overhead
ScyllaDB performs poorly with very large partitions: reading or compacting them drives up memory usage and latency on the shard that owns them. Tombstones from frequent deletes also accumulate and increase read amplification.
SELECT key, value FROM my_table WHERE id = ?; -- Avoid hot partitions by designing better primary keys
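One common way to avoid both hot and oversized partitions is to add a time bucket to the partition key. The sketch below assumes a hypothetical sensor_readings table with a (sensor_id, day_bucket) partition key; the table, columns, and daily bucket size are illustrative, not taken from any real schema:
from datetime import datetime, timezone
from cassandra.cluster import Cluster
cluster = Cluster(["10.0.0.5"])          # placeholder contact point
session = cluster.connect("metrics")     # placeholder keyspace
# Partition key = (sensor_id, day_bucket): each sensor's data is split into
# one partition per day instead of one ever-growing partition per sensor.
insert = session.prepare(
    "INSERT INTO sensor_readings (sensor_id, day_bucket, ts, value) "
    "VALUES (?, ?, ?, ?)"
)
def write_reading(sensor_id, value):
    now = datetime.now(timezone.utc)
    day_bucket = now.strftime("%Y-%m-%d")  # bounds each partition to one day of data
    session.execute(insert, (sensor_id, day_bucket, now, value))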
Compaction and IO Bottlenecks
Background compactions can saturate disk I/O, slowing down reads and writes. Monitor the compaction manager and consider using nodetool compact to trigger a major compaction, or throttle background compaction via the compaction throughput settings in scylla.yaml.
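As a quick operational check, a small script can shell out to nodetool compactionstats and flag a growing backlog. This is only a sketch: the 20-task threshold is an arbitrary example and the exact output format may differ between versions.
import subprocess
# A persistently growing "pending tasks" count usually means compaction
# cannot keep up with the write rate, which eventually hurts read latency.
result = subprocess.run(
    ["nodetool", "compactionstats"],
    capture_output=True, text=True, check=True,
)
for line in result.stdout.splitlines():
    if "pending tasks" in line.lower():
        pending = int("".join(ch for ch in line if ch.isdigit()) or 0)
        if pending > 20:  # arbitrary example threshold
            print(f"WARNING: {pending} pending compactions - check disk I/O")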
Diagnostic Workflow
Step 1: Analyze Prometheus Metrics
Use the Scylla Monitoring Stack to evaluate shard-level metrics such as reactor_utilization, query_latency, and cache_hit_ratio, and identify which cores or nodes are misbehaving.
shard_query_latency{instance="10.0.0.5:9180"}
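To pull these numbers programmatically rather than through the dashboards, the Prometheus HTTP API can be queried directly. The sketch below assumes Prometheus is reachable at localhost:9090 and uses scylla_reactor_utilization as the metric name; verify both against your monitoring stack before relying on it.
import requests
PROMETHEUS = "http://localhost:9090"    # assumed address of the monitoring stack's Prometheus
QUERY = "scylla_reactor_utilization"    # assumed per-shard reactor busyness metric
resp = requests.get(f"{PROMETHEUS}/api/v1/query", params={"query": QUERY}, timeout=10)
resp.raise_for_status()
results = resp.json()["data"]["result"]
# Sort shards by utilization so the hottest ones surface first.
results.sort(key=lambda r: float(r["value"][1]), reverse=True)
for r in results[:10]:
    labels = r["metric"]
    value = float(r["value"][1])
    print(f'{labels.get("instance")} shard={labels.get("shard")}: {value:.1f}')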
Step 2: Investigate System Logs
Review the Scylla logs in /var/log/scylla for messages about dropped mutations, timeouts, or compaction stalls. Look for warnings about tombstones or large partitions.
grep WARN /var/log/scylla/scylla.log
Step 3: Validate Client-Side Load Balancing
Check that the Scylla driver is configured for token-aware routing and round-robin load balancing to avoid overloading coordinators.
cluster = Cluster(contact_points, load_balancing_policy=TokenAwarePolicy(RoundRobinPolicy()))
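Recent versions of the Python driver configure load balancing through execution profiles rather than the load_balancing_policy constructor argument. A minimal sketch, with placeholder contact points and datacenter name, is shown below; the ScyllaDB fork of the driver (scylla-driver) adds shard-aware connections on top of the same API.
from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
from cassandra.policies import TokenAwarePolicy, DCAwareRoundRobinPolicy
# Token awareness routes each request to a replica that owns the data,
# so no single node ends up coordinating most of the traffic.
profile = ExecutionProfile(
    load_balancing_policy=TokenAwarePolicy(
        DCAwareRoundRobinPolicy(local_dc="dc1")   # placeholder datacenter name
    )
)
cluster = Cluster(
    contact_points=["10.0.0.5", "10.0.0.6"],      # placeholder contact points
    execution_profiles={EXEC_PROFILE_DEFAULT: profile},
)
session = cluster.connect()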
Advanced Performance Pitfalls
Imbalanced Token Distribution
Even with virtual nodes (vnodes), an incorrect number of tokens or non-uniform vnode assignment can lead to data skew. Use nodetool status to inspect token ownership.
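To spot skew at a glance, the effective ownership column can be compared across nodes. The sketch below shells out to nodetool status for a placeholder keyspace and assumes the usual column layout in which ownership is printed as a percentage.
import subprocess
KEYSPACE = "metrics"   # placeholder: ownership is only meaningful per keyspace
out = subprocess.run(
    ["nodetool", "status", KEYSPACE],
    capture_output=True, text=True, check=True,
).stdout
ownership = {}
for line in out.splitlines():
    parts = line.split()
    # Data rows start with a two-letter status/state code such as "UN".
    if len(parts) >= 2 and parts[0] in ("UN", "UJ", "UL", "UM", "DN"):
        percent = next((p.rstrip("%") for p in parts if p.endswith("%")), None)
        if percent is not None:
            ownership[parts[1]] = float(percent)
for address, owns in sorted(ownership.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{address}: {owns:.1f}% effective ownership")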
Memory Fragmentation and Reactor Starvation
Memory fragmentation can block allocations during high-concurrency reads. Use the scylla_mem_usage metrics and top -H to correlate memory usage with specific shards or threads.
Step-by-Step Fixes
- Ensure clients use token-aware drivers with shard-aware connections.
- Split large partitions during data modeling to reduce read amplification.
- Run nodetool cleanup after topology changes (for example, after adding new nodes) to remove data for token ranges a node no longer owns.
- Adjust compaction throughput via scylla.yaml to control I/O usage.
- Enable row-level caching for frequently accessed hot rows.
Best Practices
- Avoid hot partitions by distributing writes using composite keys or bucketing.
- Monitor shard-level metrics, not just node-level metrics.
- Set appropriate thresholds for tombstone warnings in scylla.yaml.
- Upgrade regularly to benefit from ongoing performance fixes and scheduler improvements.
- Run nodetool repair on a consistent cadence to ensure replica integrity.
Conclusion
ScyllaDB offers exceptional throughput but demands deep awareness of its architectural behavior. Issues that appear as simple latency problems often stem from coordination imbalance, partition design flaws, or background I/O pressure. By leveraging shard-aware diagnostics and enforcing best practices, senior teams can tune ScyllaDB to perform reliably at massive scale.
FAQs
1. Why does only one ScyllaDB node show high CPU usage?
This is often due to improper client load balancing or token distribution, causing that node to act disproportionately as a coordinator.
2. How do I know if I have a large partition problem?
Set compaction_large_partition_warning_threshold_mb in scylla.yaml and monitor the logs for large partition warnings. Also watch for memory pressure or slow reads whenever that partition is accessed.
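ScyllaDB also records offenders in the system.large_partitions table (confirm it exists in your version); a quick way to list them from a script, assuming a reachable node at a placeholder address:
from cassandra.cluster import Cluster
cluster = Cluster(["10.0.0.5"])   # placeholder contact point
session = cluster.connect()
# Each row describes a partition that crossed the configured warning threshold.
for row in session.execute("SELECT * FROM system.large_partitions"):
    print(row)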
3. What is the best way to safely delete rows in ScyllaDB?
Bulk deletes can create tombstone floods. Use TTLs where possible, and run repairs within gc_grace_seconds so tombstones can be safely purged during compaction.
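As a small illustration of the TTL approach (the events table and the 7-day TTL are arbitrary examples), expiring rows at write time means the application never issues explicit DELETE statements for routine cleanup:
from cassandra.cluster import Cluster
cluster = Cluster(["10.0.0.5"])   # placeholder contact point
session = cluster.connect("app")  # placeholder keyspace
# Rows written with a TTL expire on their own and are purged during compaction.
session.execute(
    "INSERT INTO events (id, payload) VALUES (%s, %s) USING TTL 604800",  # 7 days
    ("evt-123", "example payload"),
)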
4. Can I throttle compactions in real time?
Yes. Use nodetool compactionstats to see which compactions are running, and throttle them by adjusting the compaction throughput settings based on current system load.
5. What causes timeouts under light load?
Shard starvation, disk I/O contention, or network queue delays can cause timeouts even with low client-side TPS. Always correlate timeouts with reactor_utilization metrics.