Background: Why Scalatra Apps Get Into Thread-Pool Trouble
Scalatra apps commonly run on Jetty or Tomcat, using the Servlet API. Handlers often compose blocking database calls, cross-service HTTP requests, and transformations wrapped in Scala's Future. It's tempting to reuse a single execution context (frequently ExecutionContext.global) and let the container thread service the request while the Future continues. In production, this pattern interacts poorly with servlet thread limits, JDBC pool limits, and retry semantics. A short burst of slow queries can fill the servlet threads; upstreams retry, backlogs grow, and the system enters a self-inflicted stall.
Key Contributing Factors
- Blocking work on servlet threads: JDBC, file I/O, and synchronous HTTP calls tie up container threads.
- Unbounded or inappropriate execution contexts: ExecutionContext.global is fork-join based and not tuned for blocking.
- Poorly sized pools: HikariCP/Slick pools smaller than concurrency demands, or container thread pool sized too low/high relative to DB capacity.
- Timeout mismatch: Upstream client timeouts shorter than downstream DB timeouts, leading to retries and thundering herds.
- Inadequate backpressure: Futures provide no built-in backpressure; everything tries to run, then starves.
Architecture: Where the Bottlenecks Hide
Consider a typical architecture:
- Scalatra on Jetty/Tomcat behind a reverse proxy (Nginx, Azure App Gateway, AWS ALB).
- Routes call Slick or plain JDBC through HikariCP.
- Cross-service calls via blocking clients or synchronous HTTP libraries.
- Execution contexts: default global for both CPU-bound and blocking tasks.
Under peak load, incoming connections map to a fixed number of servlet threads. If each request synchronously executes a 50–300 ms DB call and a 100–200 ms downstream HTTP call, servlet threads stay occupied. When the servlet pool saturates, new requests queue at the connector. Because Futures often attach continuations that also need threads on the same machine, local thread availability collapses. The result can resemble a deadlock even though no locks are held: nothing progresses, because every thread is waiting on an external resource while occupying a scarce execution slot.
Symptoms Observed in Production
- Periodic spikes of 502/504 at the proxy, while app logs show little new output.
- JVM appears healthy (low CPU), yet response times explode to seconds or minutes.
- Thread dumps show most servlet threads parked in JDBC calls or synchronous HTTP clients.
- HikariCP reports pool exhausted or slow acquisition warnings.
- GC looks normal; heap is not the primary bottleneck.
Diagnostics: A Repeatable Investigation Playbook
The goal is to confirm starvation, identify blocking hotspots, and measure pool alignment. Use multiple signals to avoid false positives.
1) Capture Thread Dumps at Peak
jstack -l <PID> > /tmp/scalatra-tdump-$(date +%s).txt
# Repeat 3x at 5s intervals to see if threads are stuck in the same frames
Look for dozens of qtp (Jetty) or http-nio/Catalina (Tomcat) threads blocked in JDBC driver methods or java.net.SocketInputStream.read. If a majority of servlet threads are blocked simultaneously, you're starved.
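If you prefer to check the same signal in-process (for example from an admin endpoint), ThreadMXBean exposes the thread dump programmatically. The sketch below is illustrative only: the StarvationCheck object is a hypothetical helper, and it assumes Jetty's default qtp worker-name prefix (Tomcat uses names like http-nio-...).

import java.lang.management.ManagementFactory

object StarvationCheck {
  // Counts Jetty worker threads currently parked in socket or JDBC frames.
  // Assumes the default "qtp" worker prefix; adjust for Tomcat ("http-nio-...").
  def blockedWorkerRatio(): (Int, Int) = {
    val infos = ManagementFactory.getThreadMXBean.dumpAllThreads(false, false)
    val workers = infos.filter(_.getThreadName.startsWith("qtp"))
    val blocked = workers.count(_.getStackTrace.exists { frame =>
      frame.getClassName.contains("SocketInputStream") ||
      frame.getClassName.toLowerCase.contains("jdbc")
    })
    (blocked, workers.length)
  }
}

If the first number approaches the second during an incident, the thread dumps and this counter are telling the same story.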
2) Enumerate Active and Queued Requests
# Jetty example (JMX via jcmd)
jcmd <PID> VM.system_properties | grep -i jmx
# Use a JMX client to read org.eclipse.jetty.server:type=threadpool metrics:
# threads, idleThreads, busyThreads, queueSize
Queue growth with busyThreads == maxThreads is a hallmark of container saturation.
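The same numbers can also be read from inside the JVM through the platform MBean server. A rough sketch follows; the JettyPoolStats name is hypothetical, and the MBean object and attribute names vary across Jetty versions (recent releases register the pool under org.eclipse.jetty.util.thread:type=queuedthreadpool), so verify them against your deployment.

import java.lang.management.ManagementFactory
import javax.management.ObjectName

object JettyPoolStats {
  // Query registered QueuedThreadPool MBeans and print their headline attributes.
  // Attribute names (busyThreads, maxThreads, queueSize) should be verified per Jetty version.
  def print(): Unit = {
    val mbs = ManagementFactory.getPlatformMBeanServer
    val names = mbs.queryNames(new ObjectName("org.eclipse.jetty.util.thread:type=queuedthreadpool,*"), null)
    names.forEach { name =>
      val busy   = mbs.getAttribute(name, "busyThreads")
      val max    = mbs.getAttribute(name, "maxThreads")
      val queued = mbs.getAttribute(name, "queueSize")
      println(s"$name busy=$busy max=$max queue=$queued")
    }
  }
}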
3) Inspect DB and HTTP Client Metrics
# HikariCP logs (enable leakDetectionThreshold)
# com.zaxxer.hikari.pool.HikariPool - After connection acquisition ...
# Slick / JDBC timings in logs or exported via Dropwizard or Micrometer
Long acquisition times signal DB pool contention. Combine with thread-dump frames to pinpoint hotspots.
4) Profile the JVM
# async-profiler (cpu & wall):
./profiler.sh -d 30 -e wall -f /tmp/wall.svg <PID>
If wall-time is spent in network I/O or JDBC reads, you're blocking excessively on servlet threads.
5) Examine Execution Context Usage
grep -R "ExecutionContext.global" src/main/scala # Inventory where blocking is dispatched; ensure dedicated contexts exist
Centralize execution context creation to audit sizing and isolate blocking work clearly.
Common Pitfalls That Trigger Starvation
- Blocking on the request thread: Direct JDBC or Await.result inside route handlers.
- One-size-fits-all EC: Using global for both CPU-bound JSON transforms and blocking I/O.
- Misaligned pool sizes: 200 servlet threads with a DB pool of 20 connections causes waves of waiting threads.
- Retry storms: Upstream gateways retry on 502/504 with zero jitter, amplifying load just as capacity disappears.
- Hidden synchronous calls: 'Async' wrappers around libraries that still block under the hood.
Step-by-Step Fixes: From Stabilization to Hardening
Apply the following in order. Stabilize first, then optimize sustainably.
1) Separate Execution Contexts for Blocking vs. CPU Work
Create a dedicated, bounded thread pool for blocking tasks and keep a small fork-join or fixed pool for CPU-bound transformations.
import scala.concurrent.{ExecutionContext, ExecutionContextExecutor}
import java.util.concurrent.Executors

object ECs {
  // CPU-bound work (JSON, small maps); keep modest size
  implicit val cpu: ExecutionContextExecutor = ExecutionContext.global

  // Blocking I/O: size to expected concurrency, not CPU cores
  private val blockingPool = Executors.newFixedThreadPool(64)
  val blocking: ExecutionContext = ExecutionContext.fromExecutor(blockingPool)
}
Route handlers should explicitly dispatch to ECs.blocking for JDBC and synchronous HTTP calls.
2) Use scala.concurrent.blocking for I/O Regions
Hint the scheduler that a task will block, allowing compensating threads where supported.
import scala.concurrent.{Future, blocking}

def loadUser(id: Long) = Future {
  blocking {
    // JDBC call runs on the dedicated blocking pool
    dao.findUser(id)
  }
}(ECs.blocking)
3) Offload Work From the Servlet Thread (Async Support)
Scalatra supports non-blocking servlet handling via asynchronous responses. Don't sit on the container thread while waiting on I/O.
import org.scalatra._
import scala.concurrent.Future

class UserController extends ScalatraServlet with FutureSupport {
  protected implicit def executor = ECs.cpu

  get("/users/:id") {
    new AsyncResult {
      val is = {
        val id = params("id").toLong
        loadUser(id).map(u => Ok(u.toJson))
      }
    }
  }
}
AsyncResult frees the servlet thread while the Future runs elsewhere. Ensure any blocking inside loadUser uses the blocking EC.
4) Calibrate Jetty/Tomcat Thread Pool to Real DB Capacity
Thread pool size must reflect downstream concurrency, not just traffic volume. A common rule: limit servlet threads to about 2x the effective DB connection count for routes that always hit the DB, and lower if you also rely on synchronous HTTP calls.
# Jetty example (jetty.xml)
<Set name="minThreads">32</Set>
<Set name="maxThreads">128</Set>

# HikariCP (application.conf)
db.hikari.maximumPoolSize = 48
db.hikari.minimumIdle = 16
Aim for stable low queue sizes with headroom; avoid huge servlet pools that simply hold more blocked requests.
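If the app embeds Jetty rather than configuring it through jetty.xml, the same limits can be set programmatically. A minimal sketch, using the illustrative sizes from above:

import org.eclipse.jetty.server.Server
import org.eclipse.jetty.util.thread.QueuedThreadPool

// Cap the connector pool explicitly; constructor arguments are (maxThreads, minThreads)
val servletPool = new QueuedThreadPool(128, 32)
val server = new Server(servletPool)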
5) Stabilize Timeouts and Retries Across Tiers
Shorter client timeouts than server timeouts cause duplicate work under load. Harmonize them, add jitter, and cap retries.
# Upstream reverse proxy (illustrative)
proxy_connect_timeout 2s;
proxy_read_timeout 5s;

# App-side HTTP client
client.requestTimeout = 4s
client.retries.max = 2
client.retries.backoff = exponential+jitter

# DB timeouts
db.hikari.connectionTimeout = 1500
db.statementTimeout = 3000
When the app is the caller, fail fast with bounded retries. When the app is the callee, ensure you don't exceed proxy timeouts during downstream waits.
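On the app side, a bounded retry helper with exponential backoff and jitter can be written directly against Future. This is a sketch only: the Retry object, withBackoff signature, and single-thread scheduler are assumptions, not an existing library API.

import java.util.concurrent.{Executors, TimeUnit}
import scala.concurrent.{ExecutionContext, Future, Promise}
import scala.util.Random

object Retry {
  // One shared timer thread is enough to schedule delayed retries
  private val scheduler = Executors.newSingleThreadScheduledExecutor()

  private def after[T](delayMs: Long)(body: => Future[T]): Future[T] = {
    val p = Promise[T]()
    scheduler.schedule(new Runnable { def run(): Unit = { p.completeWith(body); () } },
                       delayMs, TimeUnit.MILLISECONDS)
    p.future
  }

  // Retry op up to maxRetries times, doubling the delay and adding random jitter each attempt
  def withBackoff[T](maxRetries: Int, baseDelayMs: Long)(op: () => Future[T])(implicit ec: ExecutionContext): Future[T] =
    op().recoverWith {
      case _ if maxRetries > 0 =>
        val delay = baseDelayMs + Random.nextInt(math.max(baseDelayMs.toInt, 1))
        after(delay)(withBackoff(maxRetries - 1, baseDelayMs * 2)(op))
    }
}

For example, Retry.withBackoff(maxRetries = 2, baseDelayMs = 200)(() => loadDetails(id))(ECs.cpu) caps the duplicate work a flaky downstream can generate.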
6) Apply Bulkheads and Circuit Breakers
Prevent a slow dependency from consuming all threads. Partition by route or by external service.
// Pseudocode, Resilience4j-like: one bulkhead pool per dependency
import java.util.concurrent.{ArrayBlockingQueue, Executors, ThreadPoolExecutor, TimeUnit}
import scala.concurrent.ExecutionContext

val userDbPoolEC   = ExecutionContext.fromExecutor(Executors.newFixedThreadPool(32))
val ordersDbPoolEC = ExecutionContext.fromExecutor(Executors.newFixedThreadPool(16))

// Bounded queue + AbortPolicy: reject excess work instead of queueing it indefinitely
val queue     = new ArrayBlockingQueue[Runnable](256)
val tp        = new ThreadPoolExecutor(32, 64, 60L, TimeUnit.SECONDS, queue, new ThreadPoolExecutor.AbortPolicy)
val boundedEC = ExecutionContext.fromExecutor(tp)
An AbortPolicy fails fast, signaling overload to callers instead of creating latency balloons.
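A production circuit breaker usually comes from a library such as Resilience4j. As a rough illustration of the idea only (not that library's API), a hand-rolled breaker that opens after consecutive failures might look like the sketch below; SimpleBreaker and its parameters are hypothetical names.

import java.util.concurrent.atomic.{AtomicInteger, AtomicLong}
import scala.concurrent.{ExecutionContext, Future}

// Opens after failureThreshold consecutive failures, then fails fast until cooldownMillis elapses
class SimpleBreaker(failureThreshold: Int, cooldownMillis: Long) {
  private val consecutiveFailures = new AtomicInteger(0)
  private val openedAt = new AtomicLong(0L)

  def withBreaker[T](body: => Future[T])(implicit ec: ExecutionContext): Future[T] = {
    val open = consecutiveFailures.get() >= failureThreshold &&
      (System.currentTimeMillis() - openedAt.get()) < cooldownMillis
    if (open) Future.failed(new RuntimeException("circuit open: failing fast"))
    else body.transform(
      result => { consecutiveFailures.set(0); result },
      error => {
        if (consecutiveFailures.incrementAndGet() >= failureThreshold) openedAt.set(System.currentTimeMillis())
        error
      }
    )
  }
}

Wrapping each downstream call, e.g. ordersBreaker.withBreaker(loadDetails(id)), lets one slow dependency trip its own breaker instead of consuming every thread.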
Deep Dive: A Realistic Anti-Pattern and Refactor
Here's a condensed anti-pattern representative of many codebases:
import org.scalatra._
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

class ReportController extends ScalatraServlet {
  get("/report/:id") {
    val id = params("id").toLong
    // BAD: blocks on servlet thread; mixes compute+IO in global EC
    val data = Await.result(Future { reportDao.load(id) }, 5.seconds)
    val details = Await.result(Future { http.get(s"/details/$id") }, 3.seconds)
    Ok(merge(data, details))
  }
}
Under load, this saturates servlet threads, waits for DB and network, and blocks twice with Await.result. Now the refactor:
import org.scalatra._
import scala.concurrent.Future

class ReportController extends ScalatraServlet with FutureSupport {
  protected implicit def executor = ECs.cpu

  private def loadReport(id: Long) = Future {
    scala.concurrent.blocking { reportDao.load(id) }
  }(ECs.blocking)

  private def loadDetails(id: Long) = Future {
    scala.concurrent.blocking { http.get(s"/details/$id", timeout = 2000) }
  }(ECs.blocking)

  get("/report/:id") {
    new AsyncResult {
      val is = {
        val id = params("id").toLong
        for { r <- loadReport(id); d <- loadDetails(id) } yield Ok(merge(r, d))
      }
    }
  }
}
This version frees servlet threads during waits, isolates blocking on a dedicated pool, and eliminates synchronous blocking in the route.
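One refinement worth noting: in a for-comprehension the second Future is not created until the first completes, so the DB and HTTP calls above still run sequentially. Starting both eagerly lets them overlap; a small variant of the same route:

// Inside ReportController: start both Futures before sequencing so the DB and HTTP calls run concurrently
get("/report/:id") {
  new AsyncResult {
    val is = {
      val id = params("id").toLong
      val reportF  = loadReport(id)
      val detailsF = loadDetails(id)
      for { r <- reportF; d <- detailsF } yield Ok(merge(r, d))
    }
  }
}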
Sizing Strategy: Aligning Pools and Capacity
Pool sizing is less art than discipline. Start with measurable constraints, not guesswork.
- DB pool: Size to actual database concurrency budget (max active sessions without contention). Example: 48.
- Blocking EC: 1–2x the sum of expected concurrent I/O operations, capped to avoid oversubscription. Example: 64–96.
- Servlet threads: Enough to accept connections and dispatch but not so many that all block on the DB. Example: 96–128 when DB pool is 48.
- HTTP client pool: Independent from DB-oriented pools; apply its own limits and timeouts.
Observe queue sizes and latency percentiles under a realistic load test. Tune incrementally and document the rationale.
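A quick Little's-law sanity check ties these numbers together: expected concurrency is roughly arrival rate times latency. The figures below are purely illustrative and should be replaced with your own measurements.

// Little's law: in-flight requests ≈ arrival rate × mean latency (illustrative numbers)
val arrivalRatePerSec = 300.0   // peak requests per second
val meanLatencySec    = 0.25    // typical end-to-end latency under load
val expectedInFlight  = arrivalRatePerSec * meanLatencySec   // ≈ 75 concurrent requests

// Compare against configured capacity before committing to pool sizes
val servletThreads = 128
val dbConnections  = 48
println(f"in-flight ≈ $expectedInFlight%.0f, servlet threads = $servletThreads, DB pool = $dbConnections")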
Operational Diagnostics: What to Monitor Continuously
- Jetty/Tomcat pool: busy vs. max threads, queue length.
- Blocking EC queue depth: Pending tasks indicate overload (a gauge sketch follows this list).
- HikariCP: utilization, wait time, and saturated events.
- HTTP client: in-flight requests, timeouts, and error rates.
- Request latency SLOs: p50/p90/p99 plus error rate by route.
- Retries: count and backoff distribution to prevent storms.
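For the blocking EC queue depth item above, one approach is to keep a reference to the ThreadPoolExecutor behind the blocking EC (ioExec in the template later in this article) and export gauges. A sketch using Dropwizard Metrics; the metric names and the PoolMetrics helper are arbitrary choices, not a prescribed convention.

import com.codahale.metrics.{Gauge, MetricRegistry}
import java.util.concurrent.ThreadPoolExecutor

object PoolMetrics {
  // Expose blocking-pool saturation so dashboards can alert on queue growth before requests time out
  def register(registry: MetricRegistry, ioExec: ThreadPoolExecutor): Unit = {
    registry.register("io-pool.queue-depth", new Gauge[Int] {
      override def getValue: Int = ioExec.getQueue.size()
    })
    registry.register("io-pool.active-threads", new Gauge[Int] {
      override def getValue: Int = ioExec.getActiveCount
    })
  }
}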
Pitfalls During Migration and Upgrades
Teams modernizing older Scalatra apps often introduce regressions by changing default thread behavior or HTTP clients.
- Switching JDBC drivers: New defaults for socket timeouts or autocommit can change blocking characteristics.
- Moving to new JVM versions: Different ForkJoinPool scheduling may mask or accentuate starvation.
- Adopting non-blocking libraries partially: Mixed blocking/non-blocking code can still starve if blocking remains on servlet threads.
- Adding tracing: Synchronous exporters (spans flushed on the request thread) can exacerbate stalls. Use async exporters.
End-to-End Example: Golden Path Template
This template illustrates a stable layout: async routes, isolated pools, bounded queues, and defensive timeouts.
import org.scalatra._
import scala.concurrent.{ExecutionContext, Future}

object Pools {
  import java.util.concurrent._

  // Bounded queue; CallerRunsPolicy applies backpressure by running overflow on the submitting thread
  private val ioQ = new ArrayBlockingQueue[Runnable](512)
  private val ioExec = new ThreadPoolExecutor(48, 96, 60L, TimeUnit.SECONDS, ioQ, new ThreadPoolExecutor.CallerRunsPolicy)

  val ioEC: ExecutionContext = ExecutionContext.fromExecutor(ioExec)
  implicit val cpuEC: ExecutionContext = ExecutionContext.global
}

trait Services {
  import Pools._

  def getUser(id: Long): Future[User] =
    Future { scala.concurrent.blocking { userDao.load(id) } }(ioEC)

  def getOrders(id: Long): Future[List[Order]] =
    Future { scala.concurrent.blocking { http.getOrders(id, 2000) } }(ioEC)
}

class Api extends ScalatraServlet with FutureSupport with Services {
  protected implicit def executor = Pools.cpuEC

  get("/users/:id") {
    new AsyncResult {
      val is = getUser(params("id").toLong).map(u => Ok(u.toJson))
    }
  }

  get("/users/:id/orders") {
    new AsyncResult {
      val is = {
        val id = params("id").toLong
        for { u <- getUser(id); os <- getOrders(id) } yield Ok(render(u, os))
      }
    }
  }
}
Adjust pool sizes based on load testing and downstream budgets, not on CPU cores alone.
Security and Compliance Considerations
Operational fixes must respect security constraints. When adding thread pools or async handlers, ensure MDC logging and request-scoped context propagation remain intact for audit trails. Validate that timeouts and retries align with data consistency requirements; an aggressive timeout that triggers partial writes can violate invariants. Apply least-privilege principals to any new outbound HTTP clients or connection pools.
Testing Strategy: Proving You've Fixed It
Reproduce the starvation in a controlled environment and prove its elimination before promotion.
- Load-generation realism: Use recorded production traffic patterns, including slow queries and error spikes.
- Soak tests: 2–6 hour runs to surface gradual pool exhaustion.
- Chaos drills: Inject latency on DB and downstream services; verify circuit breakers and bulkheads engage.
- Regression gates: Fail builds if servlet busy threads sit at 100% for > N seconds, or if DB acquisition exceeds thresholds.
Best Practices: A Checklist for Long-Term Health
- Never block on servlet threads; use AsyncResult and dedicated blocking ECs.
- Instrument everything: thread pools, DB pools, HTTP clients, and retries.
- Set explicit timeouts everywhere; add jittered backoff for retries.
- Keep execution contexts small and specialized; avoid sharing globally.
- Prefer non-blocking clients for high-throughput cross-service calls.
- Continuously load test representative scenarios and automate SLO checks.
- Document pool sizing assumptions and revisit quarterly or after scale changes.
Conclusion
Thread-pool starvation in Scalatra rarely announces itself clearly; it masquerades as flaky infrastructure while starving servlet threads and exhausting DB pools. The cure is architectural discipline: isolate blocking work, free servlet threads using async handlers, right-size all pools to real downstream capacity, and coordinate timeouts and retries. With instrumentation, bulkheads, and bounded queues, teams turn random 5xx storms into predictable, graceful degradation. These practices not only stabilize today's system but also create durable guardrails for future features and traffic growth.
FAQs
1. Why isn't ExecutionContext.global good enough for my Scalatra app?
It's optimized for CPU-bound tasks and can under-provision for blocking I/O, leading to stalls. Use dedicated, bounded pools for blocking work and keep global for lightweight transformations.
2. Do I need non-blocking DB drivers to avoid starvation?
No, but you must isolate blocking JDBC calls in a dedicated EC and size pools to capacity. Non-blocking drivers can help, but isolation and timeouts deliver most of the benefit quickly.
3. How do I propagate logging context (MDC) across async boundaries?
Wrap Futures with MDC capture/restore or use libraries that maintain context. Verify in tests that request IDs and user IDs appear consistently in logs across async segments.
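One common approach, sketched here under the assumption that logging goes through SLF4J's MDC, is an ExecutionContext wrapper that captures the caller's context map and restores it around each task. The MdcPropagatingEC name is hypothetical.

import org.slf4j.MDC
import scala.concurrent.ExecutionContext

// Captures the submitting thread's MDC and reinstates it on the worker thread for the duration of the task
class MdcPropagatingEC(underlying: ExecutionContext) extends ExecutionContext {
  override def execute(task: Runnable): Unit = {
    val captured = MDC.getCopyOfContextMap  // may be null if nothing is set
    underlying.execute(new Runnable {
      override def run(): Unit = {
        val previous = MDC.getCopyOfContextMap
        if (captured != null) MDC.setContextMap(captured) else MDC.clear()
        try task.run()
        finally { if (previous != null) MDC.setContextMap(previous) else MDC.clear() }
      }
    })
  }
  override def reportFailure(cause: Throwable): Unit = underlying.reportFailure(cause)
}

Wrap the blocking and CPU contexts once at creation time (e.g. new MdcPropagatingEC(ECs.blocking)) so every Future continuation inherits the request ID.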
4. What's the quickest stabilization step during an incident?
Lower servlet thread count to match DB capacity, cap retries at the proxy, and move blocking hotspots to a dedicated EC. This reduces queueing and allows the system to drain.
5. How can I tell if I've over-provisioned threads?
Look for rising context switches, long GC pauses from excessive stacks, and growing queue latency despite high thread counts. Fewer, well-sized pools with bounded queues typically yield lower tail latencies and better predictability.