Understanding Distributed Query Execution in SingleStore
How Sharding and Partitioning Work
SingleStore distributes a table's rows across partitions on its leaf nodes by hashing the shard key. During query execution, operators such as joins, aggregations, and filters are pushed down to the nodes that own the relevant partitions. Ideally, work is balanced evenly across partitions, but when data is unevenly distributed, some partitions take significantly longer to process than others.
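The shard key is declared when the table is created. A minimal sketch, assuming a hypothetical events fact table keyed by user_id:

CREATE TABLE events (
    event_id BIGINT NOT NULL,
    user_id BIGINT NOT NULL,
    event_time DATETIME,
    payload JSON,
    SHARD KEY (user_id)  -- rows are hash-distributed across partitions by user_id
);

Every row with the same user_id lands in the same partition, which is what makes co-located joins possible but also what concentrates load when a few values dominate.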
Implications of Skewed Joins
- High CPU usage on a subset of leaf nodes
- Prolonged query execution times despite low overall load
- Timeouts on analytical queries
- Underutilization of the majority of cluster resources
Diagnostic Approach
Step 1: Use EXPLAIN and PROFILE
Run EXPLAIN and PROFILE to identify bottlenecks:
EXPLAIN SELECT ... FROM large_fact JOIN dim USING (key);
PROFILE SELECT ... FROM large_fact JOIN dim USING (key);
Look for operators with long execution time on specific partitions.
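PROFILE collects per-operator statistics but does not print them by itself; the timings are retrieved afterwards with SHOW PROFILE. A minimal sketch:

PROFILE SELECT key, COUNT(*) FROM large_fact GROUP BY key;
SHOW PROFILE;  -- per-operator timings; compare max and average time across partitions

An operator whose maximum partition time far exceeds its average is the usual signature of skew.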
Step 2: Analyze Shard Distribution
Run the following to inspect row counts by partition:
SELECT database_name, table_name, ordinal AS partition_id, rows
FROM information_schema.table_statistics
WHERE table_name = 'large_fact';
Large disparities in rows across partitions suggest shard-level skew.
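On clusters with many partitions, a summary that compares the spread of per-partition row counts to their average flags skewed tables directly. A sketch over the same view:

SELECT database_name, table_name,
       FLOOR(AVG(rows)) AS avg_rows,
       ROUND(STDDEV(rows) / AVG(rows), 3) AS row_skew  -- near 0 is balanced; larger values indicate skew
FROM information_schema.table_statistics
GROUP BY database_name, table_name
ORDER BY row_skew DESC;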
Step 3: Investigate Skewed Keys
Use aggregation queries to detect hot keys:
SELECT key, COUNT(*) FROM large_fact GROUP BY key ORDER BY COUNT(*) DESC LIMIT 10;
High concentration on a small number of keys can overwhelm specific partitions.
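To quantify the concentration, compare the hottest keys' row counts with the table total; a single key holding more than a few percent of all rows will dominate whichever partition owns it. A sketch:

SELECT key,
       COUNT(*) AS key_rows,
       COUNT(*) / (SELECT COUNT(*) FROM large_fact) AS share_of_table
FROM large_fact
GROUP BY key
ORDER BY key_rows DESC
LIMIT 10;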
Common Pitfalls and Architectural Causes
Improper Shard Key Selection
Using a column dominated by a few frequent values, or one that is often NULL, as the shard key causes data skew: every row with the same shard-key value (including all NULLs) hashes to the same partition. Choose a shard key that distributes data uniformly.
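Before committing to a shard key, check its cardinality and NULL fraction; a candidate with many distinct values and few NULLs is far more likely to spread rows evenly. A quick check, assuming user_id as the candidate:

SELECT COUNT(DISTINCT user_id) AS distinct_values,
       AVG(user_id IS NULL) AS null_fraction  -- the boolean averages to the share of NULL rows
FROM large_fact;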
Unreplicated Reference Tables
Dimension tables not created as REFERENCE tables force broadcast joins, which can overwhelm the network and individual nodes.
Joining on Non-Shard Keys
If join keys differ from shard keys, data reshuffling occurs at runtime. This undermines partition locality and introduces bottlenecks.
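Reshuffling disappears when both sides of a join are sharded on the join column. The sketch below pairs the hypothetical events table from earlier with an orders table sharded on the same key, so a join on user_id can run entirely partition-locally:

CREATE TABLE orders (
    order_id BIGINT NOT NULL,
    user_id BIGINT NOT NULL,
    amount DECIMAL(10, 2),
    SHARD KEY (user_id)  -- same shard key as events, so matching rows are co-located
);

-- Joins on the shared shard key need no repartition or broadcast step:
SELECT e.user_id, COUNT(*) AS events, SUM(o.amount) AS spend
FROM events e
JOIN orders o ON o.user_id = e.user_id
GROUP BY e.user_id;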
Fix Strategy: Step-by-Step Remediation
1. Reevaluate Shard Key Design
Use a cardinality analysis to select shard keys with even value distribution. Recreate the table with the optimized key:
CREATE TABLE optimized_fact (SHARD KEY (user_id)) AS SELECT * FROM large_fact;
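After the rebuild, re-run the distribution check from Step 2 against the new table to confirm the skew is gone:

SELECT ordinal AS partition_id, rows
FROM information_schema.table_statistics
WHERE table_name = 'optimized_fact';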
2. Use REFERENCE Tables for Dimensions
Reference tables are replicated in full to every node, so joins against them never need a broadcast. An existing table cannot be converted in place; recreate the dimension table as a reference table and repoint queries at it:
CREATE REFERENCE TABLE dim_table_ref AS SELECT * FROM dim_table;
3. Avoid Joins on Non-Shard-Key Columns
Always align join keys with shard keys where possible to ensure data-local joins.
4. Enable Resource Pools
Set resource pool limits to prevent skewed queries from monopolizing compute:
CREATE RESOURCE POOL skew_protection WITH MAX_CONCURRENCY = 5, MEMORY_PERCENTAGE = 20;
Assign heavy analytical workloads to this pool by setting the session's resource pool: SET resource_pool = skew_protection;
5. Use Histograms to Improve Optimizer Decisions
Update column histograms to help the query planner estimate cardinality more accurately:
ANALYZE TABLE large_fact COLUMNS key_column ENABLE;
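To confirm that histograms exist (and to re-check after large data loads), inspect the optimizer's column statistics. A sketch assuming the information_schema.optimizer_statistics view; availability and columns vary by SingleStore version:

SELECT *
FROM information_schema.optimizer_statistics
WHERE table_name = 'large_fact';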
Best Practices for Sustained Query Performance
- Design schemas with data distribution and locality in mind
- Use REFERENCE for small, frequently joined tables
- Continuously monitor table_statistics for skew
- Use PROFILE after every major query change
- Automate histogram updates for critical tables
Conclusion
Performance issues in SingleStore often stem from overlooked data skew and poor join alignment. These problems magnify as data grows and analytical queries become more complex. Through proactive monitoring, smarter schema design, and better join practices, teams can avoid CPU bottlenecks and ensure balanced execution across the cluster. The long-term reliability of any distributed database lies not just in its engine but in how predictably it handles edge-case data distributions at scale.
FAQs
1. What causes data skew in SingleStore?
Skew usually results from a poor shard key choice or from a few high-frequency values dominating specific partitions. Select keys with high cardinality and a uniform value distribution.
2. Why are my queries slow even with sufficient hardware?
Performance is often limited by unbalanced partitions or join reshuffling, not raw hardware. Check execution plans for bottlenecks.
3. How can I verify if joins are data-local?
Run EXPLAIN and check the plan for Repartition or Broadcast operators; their absence means the join runs against co-located partitions. Data-local joins are significantly faster.
4. When should I use REFERENCE tables?
Use REFERENCE for small dimension tables frequently joined with fact tables. This avoids broadcast overhead and improves query parallelism.
5. Can I automate histogram collection?
Yes. Use scheduled ANALYZE commands or script automation to update histograms periodically and keep optimizer decisions accurate.