The problem in context: long lived Yii workers that bloat and stall
Typical symptoms
Operations dashboards show a steady upward memory trend for queue listeners, occasionally followed by OOM kills. Database pools reveal dozens or hundreds of idle yet open connections from the same worker hosts. Meanwhile, job latency increases over hours of uptime, then drops after a process restart. Error logs intermittently show transaction deadlocks, lock wait timeouts, or transient cache misses that cascade into thundering herds.
Why this matters at scale
In enterprise settings, one failed worker can back up Kafka topics, saturate SQS/Redis/AMQP queues, and ultimately impact SLAs for downstream services. Memory bloat reduces node density and increases cost. Leaked connections degrade database throughput and jeopardize failover. Repeated deadlocks force retry storms that amplify the original incident.
Background: how Yii behaves outside the web request lifecycle
Application lifecycle differences
In web requests, Yii’s application object and most components are created, used, and destroyed per request. In a long lived CLI worker, the same application instance and its components persist across many jobs. Anything you retain in static properties, singletons, or closures can therefore accumulate state across hours or days.
DI container and singleton scoping
Yii’s DI container supports transient definitions and singletons. In a daemon, singletons live as long as the process. Accidentally registering request scoped collaborators—for example a repository that holds references to ActiveRecord models or an external client with large buffers—as singletons produces slow leaks.
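A minimal sketch of that scoping distinction in Yii's container; the service class names are illustrative:

```php
// Transient: a fresh instance per resolution, safe for request scoped collaborators.
\Yii::$container->set(\app\services\InvoiceExporter::class);

// Singleton: one instance that lives as long as the daemon; reserve for stateless, process wide services.
\Yii::$container->setSingleton(\app\services\MetricsClient::class);
```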
ActiveRecord, schema metadata, and PDO
ActiveRecord caches schema metadata to avoid repeated introspection. That is good for performance, yet it also means AR retains class level state. If you couple that with long running queries and many instantiated models per job, object graphs linger in memory until the garbage collector can reclaim them. Unclosed transactions and PDO statements can also keep connections pinned.
Queues, isolation, and forks
The yii2-queue extension can run jobs in the master process or isolate each job in a forked PHP process. Isolation trades a small startup cost for deterministic memory reclamation and clean component state. For truly heavy jobs, isolation is often the single most effective containment strategy.
Reproduce and measure: establish a clean baseline
Minimal controlled environment
Before changing production, reproduce the issue in a staging cluster with the same PHP version, opcache settings, and queue backend. Use production like database sizes and representative payloads. Disable nonessential noise so you can observe clear cause and effect.
Collect the right telemetry
- Per process memory: RSS and heap via ps, smem, and periodic memory_get_usage(true) logging.
- Open connections: database server views and lsof on the worker PID.
- Per job timings: queue start/finish timestamps, downstream RPC/HTTP timings, and SQL duration.
- GC activity: count of collected cycles and roots after each job.
- Deadlocks/retries: error codes and retry counts per job type.
Synthetic load generator
Create a synthetic producer that enqueues representative jobs at a controlled rate. Vary payload size, batch sizes, and concurrency. The goal is to map how memory and latency respond to volume and to specific job types.
Diagnostics playbook
Step 1: visualize per job growth
Instrument the worker to print memory and connection counts at the end of every job. The simplest graph—memory delta per job—quickly reveals which job classes exhibit monotonic growth versus sawtooth patterns.
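One way to get that graph, assuming the yii2-queue cli driver and a component named queue, is to log memory deltas from the worker's after-exec event; a rough sketch:

```php
use yii\queue\ExecEvent;
use yii\queue\Queue;

// Log memory after every job; connection counts can be appended from a server side query if needed.
\Yii::$app->queue->on(Queue::EVENT_AFTER_EXEC, function (ExecEvent $event) {
    static $lastMem = 0;
    $mem = memory_get_usage(true);
    \Yii::info(sprintf(
        'job=%s attempt=%d mem=%dKB delta=%+dKB',
        get_class($event->job),
        $event->attempt,
        $mem >> 10,
        ($mem - $lastMem) >> 10
    ), 'queue.profile');
    $lastMem = $mem;
});
```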
Step 2: watch database connections
If your pool of server side connections grows with worker uptime, you likely have unclosed transactions or orphaned PDO handles. Verify that every transaction path leaves the connection in a clean state and that you are not keeping AR models or Command objects alive across jobs.
Step 3: isolate leaks with GC and snapshots
Force a garbage collection cycle after each job and snapshot heap size. If the heap still increases monotonically, then references are retained somewhere. Inspect global singletons, static caches, and listeners that capture closures with large variables.
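A rough sketch of that snapshot, run at the end of each job (gc_status requires PHP 7.3 or later):

```php
gc_collect_cycles();          // force a cycle collection between jobs
$gc = gc_status();            // runs, collected, threshold, roots
\Yii::info(sprintf(
    'gc runs=%d collected=%d roots=%d heap=%dKB',
    $gc['runs'],
    $gc['collected'],
    $gc['roots'],
    memory_get_usage(true) >> 10
), 'queue.profile');
```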
Step 4: examine queries and object churn
Profiles that show millions of created objects per job usually point to AR heavy loops loading rows one by one. Replace eager AR creation with batch/each iterators and streaming transformations to keep live object counts low.
Step 5: identify long tail retries and deadlocks
Deadlocks and lock wait timeouts appear sporadically and then cluster under load. Add structured logs that capture SQLSTATE codes and number of attempts. This tells you whether retries are the cause of latency spikes and whether backoff is effective.
Common root causes in the wild
AR model retention across jobs
Repositories or services that cache AR instances in static properties or singletons keep entire object graphs alive. A few hundred models per job across thousands of jobs is enough to exhaust memory.
Unreleased PDO statements and transactions
Leaving a transaction open pins server resources and prevents connection reuse. Likewise, keeping a Command object with a large result set in scope prolongs memory retention and socket lifetime.
Mis scoped singletons
Registering connectors as singletons when they should be transient leads to stale sockets, stuck TLS sessions, or giant buffered responses that never shrink. In a web request, this would not matter; in a daemon, it persists indefinitely.
Event listeners and global state that accumulate
Adding listeners to global events on every job run without removing them duplicates callbacks. After hours of uptime, each event triggers N handlers, every one of them holding references to prior jobs’ data. Detaching before attaching, as sketched below, prevents the pile up.
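A minimal sketch of that guard; the handler object and method are illustrative:

```php
use yii\base\Event;
use yii\db\ActiveRecord;

// Remove any previously attached copy of this handler before attaching it again,
// so repeated job runs do not stack duplicate callbacks.
Event::off(ActiveRecord::class, ActiveRecord::EVENT_AFTER_INSERT, [$auditLogger, 'onInsert']);
Event::on(ActiveRecord::class, ActiveRecord::EVENT_AFTER_INSERT, [$auditLogger, 'onInsert']);
```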
Retry storms from deadlocks
Workers that catch deadlock exceptions and immediately retry without jitter synchronize across replicas. What begins as a single conflict turns into a burst of retries that keeps the hot rows hot.
Cache stampedes and stale on miss
When a frequently used key expires, dozens of concurrent jobs rebuild it simultaneously, all hammering the database. Without mutexes or soft TTLs, cache is not a protective layer; it amplifies load.
Step by step fixes
1) Contain the blast radius with process isolation
Run job isolation so each job executes in a separate PHP process. The master listener becomes a lightweight supervisor; memory and file descriptors are reclaimed by the OS after every job.
```bash
$ php yii queue/listen --isolate=1 --verbose=1 --sleep=1
```
For cron style execution, prefer queue/run with a bounded number of jobs per invocation to enforce periodic process renewal.
```bash
$ php yii queue/run --verbose=1
```
2) Repair database and transaction lifecycles
Ensure every transaction, even under exceptions, closes promptly. Close and reopen the connection at safe boundaries to guarantee fresh state. Avoid keeping AR instances with lazy relations alive after commit.
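A minimal sketch of a clean transaction boundary, plus recycling the connection between jobs:

```php
$db = \Yii::$app->db;
$transaction = $db->beginTransaction();
try {
    // ... business writes ...
    $transaction->commit();
} catch (\Throwable $e) {
    $transaction->rollBack();   // never leave the connection inside an open transaction
    throw $e;                   // let the queue record the failure and decide on retry
}

// At a safe job boundary, drop the server side session and any lingering statement state.
$db->close();
```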
3) Switch to batching and streaming in hot paths
Replace find()->all() with batch() or each(). Operate on scalar projections when possible to avoid AR overhead. Defer relationship loading; fetch only what you need.
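A rough sketch, with an illustrative Order model and a plain-array scan as the AR-free alternative:

```php
// Stream models in chunks instead of hydrating the whole result set at once.
foreach (Order::find()->where(['status' => 'pending'])->batch(500) as $orders) {
    foreach ($orders as $order) {
        // process one model, then let it go out of scope
    }
}

// Or skip ActiveRecord for the hottest scans and work on scalar rows.
foreach ((new \yii\db\Query())->from('order')->select(['id', 'total'])->each(1000) as $row) {
    // $row is a plain array; far less object churn
}
```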
4) Implement deadlock safe transactions with bounded retries
Wrap transaction bodies with a helper that recognizes deadlock/serialization errors and retries with exponential backoff and jitter. Keep the maximum attempts small and emit metrics for visibility.
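A sketch of such a helper; the function name, retry budget, and error code list are assumptions to adapt to your drivers:

```php
// Retries the transaction body on deadlock/serialization failures
// (SQLSTATE 40001/40P01, MySQL driver codes 1213/1205).
function withDeadlockRetry(callable $body, int $maxAttempts = 3)
{
    $attempt = 0;
    while (true) {
        try {
            return \Yii::$app->db->transaction($body);
        } catch (\yii\db\Exception $e) {
            $retryable = in_array($e->errorInfo[0] ?? null, ['40001', '40P01'], true)
                || in_array($e->errorInfo[1] ?? null, [1213, 1205], true);
            if (!$retryable || ++$attempt >= $maxAttempts) {
                throw $e;
            }
            \Yii::warning("Retryable conflict, attempt $attempt", 'queue');
            // Exponential backoff with full jitter so replicas do not retry in lockstep.
            usleep(random_int(0, (2 ** $attempt) * 100_000));
        }
    }
}

// Usage: withDeadlockRetry(function ($db) { /* conflicting writes */ });
```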
5) Prevent cache stampedes
Use getOrSet with a per key mutex so that only one worker rebuilds an expired item. Add jitter to TTLs to spread expirations over time. Consider soft TTLs with background refresh for hot keys.
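A sketch assuming a configured mutex component (for example yii\mutex\MysqlMutex); the function and lock names are illustrative:

```php
function rebuildOnce(string $key, callable $builder, int $ttl)
{
    $cache = \Yii::$app->cache;
    $value = $cache->get($key);
    if ($value !== false) {
        return $value;
    }
    // Only one worker rebuilds; the rest wait briefly, then re-read or build locally.
    if (\Yii::$app->mutex->acquire("rebuild:$key", 5)) {
        try {
            // Jittered TTL spreads expirations so hot keys do not all expire together.
            return $cache->getOrSet($key, $builder, $ttl + random_int(0, 60));
        } finally {
            \Yii::$app->mutex->release("rebuild:$key");
        }
    }
    return $builder();
}
```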
6) Redesign pagination for high volume scans
Offset based pagination on large tables grows slower over time and exacerbates deadlocks. Prefer keyset pagination that uses a stable ordered cursor, which is trivial to implement with an indexed column like id or a timestamp.
```php
// Keyset pagination: fetch the next window strictly after the last seen id
// (the table name and the start of the query are reconstructed; treat them as illustrative).
$rows = (new \yii\db\Query())
    ->from('events')
    ->where(['>', 'id', $cursor])
    ->orderBy(['id' => SORT_ASC])
    ->limit(1000)
    ->all();
$lastId = end($rows)['id'] ?? $cursor;
```
7) Make state disposable
Do not register request scoped services as singletons. Avoid static caches inside repositories/services used by workers. Where you must keep a singleton, give it an explicit reset() method called after each job to release references and close clients.
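A sketch of that shape; the class, its fields, and the after-exec hook are illustrative:

```php
final class ReportBuilder
{
    /** Per job caches that must not outlive the job. */
    private array $loadedModels = [];
    private ?object $httpClient = null;

    public function reset(): void
    {
        $this->loadedModels = [];   // release AR object graphs
        $this->httpClient = null;   // drop idle sockets and large buffered responses
    }
}

// In the worker bootstrap, after every job (assuming the service is registered as a singleton):
// \Yii::$app->queue->on(\yii\queue\Queue::EVENT_AFTER_EXEC, function () {
//     \Yii::$container->get(ReportBuilder::class)->reset();
// });
```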
8) Operational guardrails: health, rotation, and back pressure
Add a watchdog that exits the worker when RSS exceeds a configured ceiling or after N jobs. Rely on systemd/Supervisor/Kubernetes to restart cleanly. Implement queue back pressure by lowering concurrency when downstream errors rise.
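A sketch of such a watchdog for the in process listener; the job and memory budgets are placeholders:

```php
use yii\queue\Queue;

\Yii::$app->queue->on(Queue::EVENT_AFTER_EXEC, function () {
    static $jobs = 0;
    $overJobBudget = ++$jobs >= 500;
    $overMemBudget = memory_get_usage(true) > 256 * 1024 * 1024;
    if ($overJobBudget || $overMemBudget) {
        \Yii::warning("Rotating worker after $jobs jobs", 'queue');
        exit(0);   // clean exit; Supervisor/systemd/Kubernetes restarts a fresh process
    }
});
```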
Configuration hardening
Enable schema cache and tune durations
Ensure enableSchemaCache is on in both web and console configs. Use a shared cache component. Set a sensible schemaCacheDuration so metadata is stable during a deploy window but refreshes after migrations.
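A minimal console config sketch; the DSN and durations are placeholders:

```php
'components' => [
    'cache' => [
        'class' => \yii\caching\FileCache::class,   // or a shared Redis/Memcached cache
    ],
    'db' => [
        'class' => \yii\db\Connection::class,
        'dsn' => 'mysql:host=db;dbname=app',
        'enableSchemaCache' => true,
        'schemaCacheDuration' => 3600,              // stable across a deploy window
        'schemaCache' => 'cache',                   // the shared cache component above
    ],
],
```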
Queue listener flags and supervisors
Prefer --isolate=1 for heavy jobs. Under Supervisor or systemd, configure immediate restarts on exit code 0 to support intentional rotation. Add a short sleep to avoid hot spin if the queue is empty.
```ini
[program:yii-queue]
command=php /app/yii queue/listen --isolate=1 --sleep=1 --verbose=1
numprocs=4
autorestart=true
redirect_stderr=true
stdout_logfile=/var/log/queue.log
```
Graceful deployments around migrations
Pause workers before applying database migrations that alter hot tables, then resume with the new code. This prevents shape mismatches that would otherwise throw exceptions and leak partially initialized objects.
Performance optimization checklist
Use this quick list after stabilizing the worker:
- Turn on opcache for CLI if startup time dominates and code is immutable on the node.
- Replace AR with Query builders for large scans; hydrate DTOs instead of full models.
- Use partial indexes and covering indexes for the worker’s critical paths.
- Compress payloads in the queue if network dominates, but measure CPU impact.
- Batch external API calls; prefer idempotent bulk endpoints.
- Keep concurrency modest; parallelism amplifies deadlocks on the same hot rows.
- Persist idempotency keys to avoid duplicate side effects during retries.
- Emit cardinality bounded metrics per job type and error code for SLOs.
Pitfalls and anti patterns
- Global state in helpers that caches the last processed job or request context.
- Static registries of models for convenience debugging that never get cleared.
- Retry on any exception without classification or backoff.
- Offset pagination on write heavy tables, causing table scans and lock contention.
- Using a single massive cache key as a cross service rendezvous point.
- Running workers during schema migrations that add/drop columns used by hot queries.
Putting it all together: a resilient Yii worker skeleton
This skeleton shows key ideas: isolation friendly design, explicit cleanup, deadlock aware transactions, metrics, and bounded resource usage.
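A hedged sketch of such a skeleton follows; the job class, table names, and limits are illustrative, and withDeadlockRetry refers to the helper sketched in step 4.

```php
use yii\base\BaseObject;
use yii\db\Query;
use yii\queue\JobInterface;

final class SyncInvoicesJob extends BaseObject implements JobInterface
{
    // All job state lives in small serializable properties, so --isolate=1 works cleanly.
    public int $customerId;

    public function execute($queue)
    {
        $started = microtime(true);

        // Deadlock aware transaction around the write path.
        withDeadlockRetry(function ($db) {
            // Stream rows to keep the live object count low.
            foreach ((new Query())
                ->from('invoice')
                ->where(['customer_id' => $this->customerId, 'status' => 'pending'])
                ->each(500, $db) as $row) {
                // ... transform the row, accumulate bulk DML ...
            }
        });

        // Explicit cleanup so nothing lingers when the job runs in the master process.
        \Yii::$app->db->close();
        gc_collect_cycles();

        // Metrics per job type for SLOs.
        \Yii::info(sprintf(
            'job=%s duration=%.3fs mem=%dMB',
            static::class,
            microtime(true) - $started,
            memory_get_usage(true) >> 20
        ), 'queue.metrics');

        // Bounded resource usage: rotation is enforced by the guardrail 8 watchdog,
        // which exits the worker when memory or job count budgets are exceeded.
    }
}
```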
Long term design choices
Choose isolation by default for heavy jobs
If a job touches many rows, calls multiple services, or builds large DTO/AR object graphs, treat isolation as the default. Use in process execution only for very small and frequent tasks where startup cost is the bottleneck.
Segment queue topics by contention domain
Place jobs that write to the same tables or rows into the same topic and run them with limited concurrency. This reduces cross topic lock contention and makes back pressure easier.
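One way to realize this in yii2-queue is one component per contention domain; the driver and channel names below are illustrative:

```php
'components' => [
    'ordersQueue' => [
        'class' => \yii\queue\redis\Queue::class,
        'channel' => 'orders',     // jobs that write to the same hot order rows, low concurrency
    ],
    'reportingQueue' => [
        'class' => \yii\queue\redis\Queue::class,
        'channel' => 'reporting',  // read heavy jobs, safe to run with more workers
    ],
],
```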
Design idempotence from the start
Make job handlers safe to run multiple times. Store idempotency keys with expiry, use natural keys where possible, and avoid non deterministic side effects like random coupon assignment without a stable seed.
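A sketch of the key check inside a job handler; the table, column, and key format are illustrative:

```php
try {
    \Yii::$app->db->createCommand()->insert('idempotency_keys', [
        'idem_key'   => "invoice-sync:{$this->customerId}:{$this->day}",
        'expires_at' => time() + 86400,   // purge expired keys with a periodic cleanup job
    ])->execute();
} catch (\yii\db\IntegrityException $e) {
    return;   // a previous attempt already performed the side effect
}
// ... perform the side effect exactly once ...
```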
Prefer query builders and raw SQL for bulk paths
AR is ideal for complex business rules on single aggregates but imposes object overhead on bulk processing. For the 20 percent of paths that process 80 percent of the volume, use Query and bulk DML.
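For example, a sketch using the command layer; table and column names are illustrative:

```php
// Bulk insert rows accumulated from a streaming Query instead of saving models one by one.
\Yii::$app->db->createCommand()->batchInsert(
    'daily_totals',
    ['customer_id', 'day', 'total'],
    $rows
)->execute();

// Bulk update with a single statement rather than per-model save().
\Yii::$app->db->createCommand()->update(
    'invoice',
    ['status' => 'archived'],
    ['<', 'created_at', $cutoff]
)->execute();
```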
Bounded caches and explicit invalidation
Favor many small keys with targeted TTLs over a few giant aggregates. Provide explicit invalidation paths triggered by writes to keep read repair simple and predictable.
Testing and verification
Load tests that mimic diurnal patterns
Generate load with the same peaks, valleys, and payload mix you see in production. Many leaks show only after specific job orderings or when a rare path is executed.
Chaos drills for downstream dependencies
Introduce fail slow and fail fast modes in downstream services and databases. Verify that backoff, idempotence, and circuit breakers behave correctly and that workers shed load gracefully.
Canary and dark launch workers
Run a small percentage of jobs through a new worker version and compare metrics side by side with the stable cohort. Promote only when memory slope and latency distributions match or improve.
Conclusion
Memory growth, connection leaks, deadlocks, and cache stampedes in long lived Yii workers are not random acts. They are predictable outcomes of process persistence, mis scoped singletons, AR heavy loops, and insufficient isolation. By measuring per job resource usage, closing transactional boundaries, adopting isolation, batching, and deadlock aware retries, you can make Yii workers boringly reliable. Wrap that with operational guardrails—bounded RSS, rotation, graceful deploys, and canaries—and your queue backed services will scale cleanly with traffic and complexity.
FAQs
1. Should I always run yii2-queue with --isolate=1?
No. Isolation is the safest default for heavy/complex jobs, but for very small CPU bound tasks the fork cost may dominate. Measure both modes under realistic load and consider a hybrid approach by topic.
2. How do I find which service or singleton is leaking?
Add a reset() method to suspect services and call it after each job while logging memory deltas. Binary search by disabling one reset at a time; the one that changes the slope is your prime suspect.
3. Why does offset pagination make deadlocks worse?
Offset pagination forces the database to scan and skip growing numbers of rows, extending lock lifetimes and amplifying contention. Keyset pagination touches only the next contiguous window and keeps locks localized.
4. Is enabling opcache for CLI a good idea?
Often yes if code changes are deployed via new containers or if you restart workers on each deploy. It reduces startup overhead per isolated job. Avoid it if you hot edit code on the node during debugging.
5. What is the simplest production safe stop/start strategy?
Use a supervisor (systemd/Supervisor/Kubernetes) with health probes that fail on high RSS or repeated errors. Exit voluntarily on those conditions and let the supervisor restart a clean process with exponential backoff.