Understanding the Problem

Issue Summary

A custom or third-party systemd service on Ubuntu fails to start during boot but starts successfully via systemctl start after login. The result can be an unavailable application, a late-starting monitoring agent, or a cluster node that is only partially in service.

Common Symptoms

  • Service logs show "dependency failed" or "network unavailable" errors
  • systemctl status shows "inactive (dead)" or "failed" after boot
  • journalctl -xe shows dependency or ordering errors, or "unit not found" messages
  • Manual start works fine post-login

Root Cause Analysis

Systemd Boot Semantics

Systemd starts units in parallel unless dependencies impose an order. Services that rely on network interfaces, mounts, or sockets can therefore start before those resources are ready, especially on fast-booting systems or cloud VMs.

Dependency Graph Problems

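Incorrect or missing After=, Requires=, or Wants= directives let systemd start a service too early. Requires= and Wants= only pull another unit into the boot transaction; After= is what sets the order. The unit below has an ordering directive, but against the wrong target: network.target only signals that the network management stack has been started, not that any interface is configured.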

[Unit]
Description=Custom Daemon
After=network.target
Requires=network.target
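
A rough heuristic for spotting the missing-ordering case is to list every unit named in Requires= or Wants= that has no matching After= entry. The sketch below is only an approximation: the function names are hypothetical, it reads a single unit file, and it assumes one directive per line with no line continuations or drop-in overrides. Not every Wants= needs an After=, so treat the output as a list of things to review rather than a list of bugs.

```shell
#!/bin/sh
# Sketch: report Requires=/Wants= entries that lack a matching After= entry.
# Assumes simple unit files: one directive per line, no continuations, no drop-ins.

# Print the values of one directive, one unit name per line.
# $1 = directive name (e.g. After), $2 = unit file path
unit_values() {
    sed -n "s/^$1=//p" "$2" | tr ' ' '\n' | sed '/^$/d' | sort -u
}

# Print every required or wanted unit that is not also listed in After=.
# $1 = unit file path
missing_ordering() {
    unit_file=$1
    after=$(unit_values After "$unit_file")
    { unit_values Requires "$unit_file"; unit_values Wants "$unit_file"; } | sort -u |
    while read -r dep; do
        # -x matches whole lines, -F treats the unit name as a literal string.
        printf '%s\n' "$after" | grep -qxF "$dep" || printf '%s\n' "$dep"
    done
}
```

For example, missing_ordering /etc/systemd/system/my-service.service prints nothing when every dependency is also ordered.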

Race Conditions and Timeouts

Services that depend on interfaces such as eth0 or docker0 may start before DHCP has assigned an address or before a virtual interface exists. Cloud-init or Netplan may not have finished configuring the network by the time the service starts.

Architectural Implications

System Reliability and Observability

Critical services failing at boot result in alert fatigue and brittle environments. These failures are hard to trace because post-boot state appears clean, masking underlying race conditions.

Cluster and Microservice Impact

In HA setups or service meshes, delayed service registration affects discovery and auto-healing. System readiness must be deterministic for orchestrators to function correctly.

Diagnostics and Verification

Use systemd-analyze

systemd-analyze blame
systemd-analyze critical-chain

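systemd-analyze blame lists units by how long they took to initialize, and critical-chain shows the time-critical chain of units leading to the default target. Together they show where boot time went. They do not report failures; systemctl --failed lists those.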

Check Unit File Consistency

systemctl cat my-service.service
systemd-analyze verify /etc/systemd/system/my-service.service

Verify that every unit named in Requires= and After= actually exists on the system.

Monitor Logs Early in Boot

journalctl -b -0 -u my-service.service

This reveals early boot issues that do not appear once the service is started manually.
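
The log check above can be combined with a look at the unit's effective dependencies, which include anything merged in from drop-ins. The following is a minimal sketch rather than a standard tool: boot_report is a hypothetical helper name, and my-service.service is the placeholder unit used throughout this article. It relies only on systemctl show and journalctl.

```shell
#!/bin/sh
# Sketch: print a unit's effective dependencies, then its log for the current boot.
# $1 = unit name, e.g. my-service.service
boot_report() {
    unit=${1:?usage: boot_report UNIT}
    # Effective values after unit files and drop-ins are merged.
    systemctl show -p After -p Requires -p Wants "$unit"
    # Messages from the current boot only, printed without the pager.
    journalctl -b -u "$unit" --no-pager
}
```

Running boot_report my-service.service right after a failed boot shows whether the expected targets appear in After= at all.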

Step-by-Step Fix

1. Harden Service Dependencies

Replace generic network.target with network-online.target if the service needs full connectivity.

[Unit]
After=network-online.target
Wants=network-online.target
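
For a packaged or third-party unit, the same two directives can go into a drop-in instead of an edited copy of the unit file. The sketch below assumes the usual drop-in layout of /etc/systemd/system/UNIT.d/*.conf; the function name and the file name 10-network-online.conf are illustrative, and the optional second argument exists only so the function can be tried against a scratch directory.

```shell
#!/bin/sh
# Sketch: write a drop-in that orders a unit after network-online.target.
# $1 = unit name, $2 = systemd config root (defaults to /etc/systemd/system)
write_network_online_dropin() {
    unit=${1:?usage: write_network_online_dropin UNIT [ROOT]}
    root=${2:-/etc/systemd/system}
    dir="$root/$unit.d"
    mkdir -p "$dir" || return 1
    # The quoted heredoc delimiter keeps the content literal.
    cat > "$dir/10-network-online.conf" <<'EOF'
[Unit]
After=network-online.target
Wants=network-online.target
EOF
}
```

After writing the drop-in, systemctl daemon-reload is needed before the change takes effect.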

2. Add Explicit Dependencies

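List the services a unit depends on in Requires= (hard dependency) or Wants= (soft dependency), and pair each with After= so that the dependency is also started first. For file systems, RequiresMountsFor= adds both the requirement and the ordering for the mount that holds a given path.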
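
For a mount dependency, one option is RequiresMountsFor=, which adds both a requirement and an ordering on the mount units needed to reach a path. The sketch below writes that directive into a drop-in. The function name, the file name 20-mounts.conf, and the example path /srv/data are assumptions for illustration; the optional third argument allows a trial run against a scratch directory.

```shell
#!/bin/sh
# Sketch: write a drop-in that makes a unit wait for the mount holding a path.
# $1 = unit name, $2 = absolute path the service needs, $3 = config root (optional)
write_mount_dropin() {
    unit=${1:?usage: write_mount_dropin UNIT PATH [ROOT]}
    mount_path=${2:?usage: write_mount_dropin UNIT PATH [ROOT]}
    root=${3:-/etc/systemd/system}
    dir="$root/$unit.d"
    mkdir -p "$dir" || return 1
    printf '[Unit]\nRequiresMountsFor=%s\n' "$mount_path" > "$dir/20-mounts.conf"
}
```

A call such as write_mount_dropin my-service.service /srv/data would be followed by systemctl daemon-reload.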

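3. Use Conditions and Pre-Start Checks

ConditionPathExists= does not delay a service: it is evaluated once, when the unit is about to start, and the unit is skipped if the path is missing. To wait for a resource, use ExecStartPre= with a script that polls for readiness, and reserve conditions for cases where the service should not run at all.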

4. Extend Timeouts if Necessary

[Service]
TimeoutStartSec=90
ExecStartPre=/usr/local/bin/check-network.sh

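The pre-start script checks network reachability or other required system state before the service launches. ExecStartPre= commands are also subject to TimeoutStartSec=, so the script should give up before that limit is reached.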
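
The article does not show the contents of check-network.sh, so the following is only one way such a script could be built: a generic polling loop plus a probe. Both function names are hypothetical, and treating "a default IPv4 route exists" as the readiness signal is an assumption; a real service may need a different probe, such as a DNS lookup or a TCP connection to its backend.

```shell
#!/bin/sh
# Sketch of a readiness check for use with ExecStartPre=.

# Retry a probe once per second until it succeeds or the limit is reached.
# $1 = maximum number of attempts, remaining arguments = probe command
wait_for() {
    limit=$1
    shift
    attempts=0
    until "$@"; do
        attempts=$((attempts + 1))
        [ "$attempts" -ge "$limit" ] && return 1
        sleep 1
    done
    return 0
}

# Example probe (an assumption): succeed once a default IPv4 route is present.
has_default_route() {
    ip -4 route show default 2>/dev/null | grep -q .
}
```

A complete script would end with a line such as wait_for 60 has_default_route || exit 1, with the attempt limit kept below TimeoutStartSec= so the script fails on its own terms instead of being killed by systemd.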

5. Reload Unit Files and Enable the Service

systemctl daemon-reload
systemctl enable my-service.service

Best Practices

  • Use network-online.target instead of network.target for connectivity-based services
  • Always verify service unit dependencies using systemd-analyze verify
  • Use ExecStartPre to script critical prechecks
  • Do not rely on login sessions or user timers for service readiness
  • In cloud environments, ensure proper cloud-init finalization before dependent services start

Conclusion

Systemd is a powerful but strict initialization system. Misconfigured unit files or ambiguous dependencies can silently delay or prevent service startup at boot, leading to erratic behavior in production. By adopting deterministic boot sequencing and verifying system dependencies with appropriate tooling, Ubuntu administrators can eliminate this class of error and build more resilient operating system configurations.

FAQs

1. Why does my service fail only at boot but run when started manually?

At boot, required dependencies like network interfaces or mounts may not be ready. Manual starts occur after these resources are available, hiding timing issues.

2. How do I delay a systemd service until networking is ready?

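Use After=network-online.target and Wants=network-online.target in your unit file. Also make sure the wait-online service for your network manager is enabled: systemd-networkd-wait-online.service for systemd-networkd, or NetworkManager-wait-online.service for NetworkManager.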

3. Can I visualize service startup timing?

Yes. systemd-analyze blame and systemd-analyze critical-chain show boot-time delays and the chain of units behind them, and systemd-analyze plot > boot.svg renders the whole boot as an SVG timeline.

4. What is the role of systemd-analyze verify?

It loads the unit file and reports problems such as syntax errors, unknown directives, and references to units that do not exist. It does not prove that the ordering is correct, so a unit can pass verification and still start too early.

5. Is network.target sufficient for networking services?

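Not always. network.target only indicates that the network management stack has started. network-online.target is reached once the active wait-online service reports the network as up, which typically means an address has been assigned; exactly what counts as "online" depends on that service's configuration.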