Understanding ArangoDB's Query Engine
AQL and Execution Plans
ArangoDB’s AQL (ArangoDB Query Language) compiles into an execution plan consisting of operations like index scans, filters, joins, and traversals. Traversals (especially with GRAPH functions or FOR v, e, p IN ...) can be memory-intensive and lead to performance cliffs if not carefully bounded.
In-Memory Execution
AQL execution is largely in-memory: intermediate results are held in RAM, and unless a streaming cursor is requested, the full result is computed on the server before the first batch is returned. This makes memory planning critical, especially for queries returning thousands or millions of documents across collections.
Symptoms of Query and Memory Issues
- Queries that run fine on small datasets begin to time out or crash on large data
- High RAM usage on coordinator or DB servers
- Query execution plans include full collection scans despite indexes
- Graph traversals exceed memory limits or return incomplete paths
- Queries hang or degrade entire cluster responsiveness
Root Causes
1. Unbounded Graph Traversals
Using ANY or deep INBOUND/OUTBOUND traversal depth without LIMIT or FILTER leads to massive path exploration, which exhausts memory and CPU.
2. Poor Index Utilization
Filters placed after joins or within traversal subqueries may prevent index use, leading to full scans.
3. Non-Selective Joins or Filters
Using COLLECT or FILTER on large result sets after joins or traversal inflates intermediate result size and memory usage.
4. Hot Coordinators in Cluster Mode
In clustered deployments, coordinators process and buffer results. Large AQL queries or high concurrency on a single coordinator leads to memory pressure and timeouts on that node.
5. Absence of Query Timeouts or Limits
Long-running analytics or exploratory queries without timeouts run unchecked, impacting cluster availability.
Diagnostics and Tools
1. Use EXPLAIN to Inspect Execution Plans
db._explain("FOR doc IN collection FILTER doc.x == 'val' RETURN doc")
Reveals index usage, sort operations, and traversal strategies.
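To make the plan inspection repeatable, the check can be scripted. The following is a minimal sketch, assuming the plan object has the shape { nodes: [{ type, id, ... }] } that arangosh returns from a statement's explain() call; the helper name findFullScans and the sample plan are hypothetical and only illustrate the idea.

```javascript
// Sketch: flag full collection scans in an AQL explain result.
// Assumption: `plan` looks like { nodes: [{ type, id, collection? }, ...] }.

function findFullScans(plan) {
  // An EnumerateCollectionNode means the collection is scanned without an index.
  return plan.nodes
    .filter((node) => node.type === "EnumerateCollectionNode")
    .map((node) => ({ id: node.id, collection: node.collection }));
}

// Hand-written plan for illustration only; this is not real server output.
const samplePlan = {
  nodes: [
    { type: "SingletonNode", id: 1 },
    { type: "EnumerateCollectionNode", id: 2, collection: "collection" },
    { type: "CalculationNode", id: 3 },
    { type: "FilterNode", id: 4 },
    { type: "ReturnNode", id: 5 },
  ],
};

const fullScans = findFullScans(samplePlan);
console.log(fullScans);
```

In arangosh, a plan of this kind can be obtained with db._createStatement({ query: "..." }).explain().plan and passed to the helper.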
2. Monitor Memory via ArangoDB Metrics
Use Prometheus to scrape ArangoDB’s metrics endpoint (/_admin/metrics) and track memory usage on each coordinator and DB server over time.
3. Enable Query Profiling
AQL has no PROFILE keyword. In arangosh, db._profileQuery() runs the query and reports execution time and row counts per step:
db._profileQuery("FOR x IN myCollection FILTER x.status == 'open' RETURN x")
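Per-step profiling data can also be post-processed to find the most expensive step. The sketch below assumes the data has the shape that a cursor's getExtra() is documented to return when a query runs with the option profile: 2, namely stats.nodes entries of the form { id, items, runtime } and a matching plan.nodes list; the helper name slowestNodes and the sample numbers are made up for illustration.

```javascript
// Sketch: rank execution nodes by runtime from profiling output.
// Assumption: `extra` has stats.nodes ({ id, items, runtime }) and plan.nodes ({ id, type }).

function slowestNodes(extra, topN = 3) {
  // Map node ids to their node types so the report is readable.
  const typeById = new Map(extra.plan.nodes.map((n) => [n.id, n.type]));
  return extra.stats.nodes
    .map((s) => ({
      id: s.id,
      type: typeById.get(s.id) || "unknown",
      items: s.items,
      runtime: s.runtime,
    }))
    .sort((a, b) => b.runtime - a.runtime) // most expensive first
    .slice(0, topN);
}

// Hand-written sample for illustration only; the values are invented.
const sampleExtra = {
  plan: {
    nodes: [
      { id: 1, type: "SingletonNode" },
      { id: 2, type: "EnumerateCollectionNode" },
      { id: 3, type: "ReturnNode" },
    ],
  },
  stats: {
    nodes: [
      { id: 1, items: 1, runtime: 0.00001 },
      { id: 2, items: 50000, runtime: 0.42 },
      { id: 3, items: 50000, runtime: 0.03 },
    ],
  },
};

const ranked = slowestNodes(sampleExtra, 2);
console.log(ranked);
```

A step that processes many items and dominates the runtime is usually the first candidate for an index or an earlier filter.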
4. Identify Slow Queries
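In arangosh, use require("@arangodb/aql/queries").current() to list running queries and require("@arangodb/aql/queries").slow() to list queries that exceeded the slow-query threshold. The /_admin/metrics endpoint complements this with server-wide query statistics.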
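Once a list of running queries is available, picking out the long runners is a simple filter. This is a rough sketch that assumes each entry carries id, query, and runTime (in seconds) fields, as the entries returned by arangosh's require("@arangodb/aql/queries").current() do; the helper name longRunning and the threshold are hypothetical.

```javascript
// Sketch: select queries that have been running longer than a threshold.
// Assumption: each entry looks like { id, query, runTime } with runTime in seconds.

function longRunning(queries, minSeconds) {
  return queries
    .filter((q) => q.runTime >= minSeconds)
    .sort((a, b) => b.runTime - a.runTime) // longest running first
    .map((q) => ({ id: q.id, runTime: q.runTime, query: q.query }));
}

// Hand-written sample for illustration only.
const sampleQueries = [
  { id: "101", query: "FOR d IN a RETURN d", runTime: 0.2 },
  { id: "102", query: "FOR v IN 1..10 ANY 'x/1' GRAPH 'MyGraph' RETURN v", runTime: 42.5 },
  { id: "103", query: "FOR d IN b SORT d.x RETURN d", runTime: 12 },
];

const suspects = longRunning(sampleQueries, 10);
console.log(suspects);
```

The ids in the result are the values a kill call would need if a query has to be cancelled.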
5. Audit Graph Design with arangosh
Ensure edge definitions, directions, and indexing on edge collections are optimal for traversal patterns.
Step-by-Step Fix Strategy
1. Bound All Traversals
FOR v, e, p IN 1..3 OUTBOUND @start GRAPH 'MyGraph' FILTER v.type == 'device' LIMIT 100 RETURN p
Always specify min/max depth and LIMIT when using traversals to constrain resource usage.
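One way to enforce these bounds is to generate traversal queries through a small helper instead of writing them by hand. The following is a sketch only: the function name boundedTraversal, the MAX_ALLOWED_DEPTH cap, and the default values are assumptions chosen for illustration, while the query text mirrors the example above.

```javascript
// Sketch: build a traversal query whose depth and result size are always bounded.
// MAX_ALLOWED_DEPTH is a hypothetical project-level cap, not an ArangoDB setting.
const MAX_ALLOWED_DEPTH = 10;

function boundedTraversal(startVertex, { minDepth = 1, maxDepth = 3, limit = 100 } = {}) {
  for (const [name, value] of Object.entries({ minDepth, maxDepth, limit })) {
    if (!Number.isInteger(value) || value < 0) {
      throw new RangeError(`${name} must be a non-negative integer`);
    }
  }
  if (maxDepth < minDepth) throw new RangeError("maxDepth must be >= minDepth");
  if (maxDepth > MAX_ALLOWED_DEPTH) {
    throw new RangeError(`maxDepth must not exceed ${MAX_ALLOWED_DEPTH}`);
  }
  // Depths are validated integers, so interpolating them is safe here;
  // the start vertex and the limit are passed as bind variables.
  const query =
    `FOR v, e, p IN ${minDepth}..${maxDepth} OUTBOUND @start GRAPH 'MyGraph' ` +
    "FILTER v.type == 'device' LIMIT @limit RETURN p";
  return { query, bindVars: { start: startVertex, limit } };
}

const traversal = boundedTraversal("devices/123");
console.log(traversal.query);
```

The returned query and bindVars can be passed to db._query(traversal.query, traversal.bindVars). The vertex id "devices/123" is a placeholder.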
2. Refactor Filters to Enable Index Use
Move filters before joins and ensure they reference indexed attributes directly. Avoid nested FILTERs that the optimizer can’t hoist.
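3. Use OPTIONS { order: "dfs" } for Large Graphs
Depth-first is the default traversal order. In ArangoDB 3.8 and later it is selected with the order option ("bfs", "dfs", or "weighted"); older releases use bfs: true to opt into breadth-first. Breadth-first search keeps a whole frontier of paths in memory, so staying with or switching back to depth-first reduces memory usage when exploring long, sparse paths.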
4. Distribute Query Load Across Coordinators
Balance client traffic and AQL queries across coordinators using external load balancers or client-side balancing logic.
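Client-side balancing can be as simple as rotating through the coordinator endpoints. The sketch below shows plain round-robin selection; the helper name roundRobin and the endpoint URLs are hypothetical, and a production setup would also need health checks and failover, which are omitted here.

```javascript
// Sketch: rotate requests across coordinator endpoints (round-robin).
// The endpoint URLs below are placeholders, not real hosts.

function roundRobin(endpoints) {
  if (endpoints.length === 0) {
    throw new Error("at least one endpoint is required");
  }
  let next = 0;
  // Each call returns the next endpoint and wraps around at the end of the list.
  return () => {
    const endpoint = endpoints[next];
    next = (next + 1) % endpoints.length;
    return endpoint;
  };
}

const coordinators = [
  "http://coordinator1.example:8529",
  "http://coordinator2.example:8529",
  "http://coordinator3.example:8529",
];

const pickCoordinator = roundRobin(coordinators);
console.log(pickCoordinator(), pickCoordinator());
```

Each outgoing query would call pickCoordinator() to decide which coordinator to connect to.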
5. Set Query Timeouts and Memory Limits
db._query(query, bindVars, { maxRuntime: 10, memoryLimit: 51200000 })
Prevents runaway queries from overwhelming the database. Tune based on use case and expected result size.
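To avoid forgetting these options on individual calls, they can be applied through a thin wrapper. This is a minimal sketch: runGuarded and DEFAULT_LIMITS are hypothetical names, the default values are the ones from the example above, and db is whatever object exposes _query (the arangosh db object, for instance).

```javascript
// Sketch: run every query with a default runtime and memory ceiling.
// DEFAULT_LIMITS reuses the values from the example above; tune them per workload.
const DEFAULT_LIMITS = { maxRuntime: 10, memoryLimit: 51200000 }; // seconds, bytes

function runGuarded(db, query, bindVars = {}, overrides = {}) {
  // Caller-supplied overrides win over the defaults.
  const options = { ...DEFAULT_LIMITS, ...overrides };
  return db._query(query, bindVars, options);
}
```

A call such as runGuarded(db, "FOR d IN collection RETURN d") then runs with both limits, and an analytics job can raise them explicitly, for example with { maxRuntime: 120 }.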
Best Practices
- Use EXPLAIN before deploying any complex or recursive AQL
- Ensure indexes exist and are utilized via appropriate FILTER order
- Design graphs with predictable edge patterns and filtered traversals
- Limit intermediate result sets with early LIMIT or slicing
- Use ArangoSearch views for full-text or complex filter optimization
Conclusion
ArangoDB’s flexibility makes it ideal for rich, interconnected data, but complex AQL queries, especially traversals, demand careful tuning. Understanding how execution plans interact with memory and the optimizer is key to scaling effectively. By bounding queries, profiling performance, tuning filters, and balancing coordinator load, teams can avoid resource exhaustion and keep ArangoDB clusters stable and performant under real-world loads.
FAQs
1. Why does my graph traversal crash or return incomplete results?
Unbounded traversal depth or large result sets can exceed memory or runtime limits. Always use LIMIT and depth constraints.
2. How do I know if an index is used?
Use EXPLAIN and inspect the execution nodes. If you see IndexNode rather than EnumerateCollectionNode, the index is being used.
3. What’s the default memory limit for a query?
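It depends on the server version and the --query.memory-limit startup option: older releases impose no limit by default, while newer ones derive a default from the available RAM. You can set a limit per query with memoryLimit in the query options (in bytes).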
4. Can I cancel a running query?
Yes. In arangosh, retrieve the query ID from require("@arangodb/aql/queries").current(), then call require("@arangodb/aql/queries").kill(queryId).
5. Is ArangoSearch better for performance than traversal?
For search-heavy or filtered document retrieval, yes. ArangoSearch views use inverted indexes and are optimized for performance.