Understanding Equinix Metal Provisioning Flow

How Provisioning Works

When a server is provisioned via API or the web portal:

  • The device boots via iPXE and fetches the provisioning OS
  • User data and metadata are fetched from Metal’s metadata service
  • Network is configured based on assigned IP blocks
  • Custom automation (e.g., cloud-init user data, Terraform provisioners) runs post-boot

Key Dependencies

  • Metadata service: Critical for IP config and user scripts
  • DHCP/iPXE boot service: Determines initial network path
  • BMC/Out-of-band interfaces: For hard reboots and remote access

Common Provisioning Failures

1. PXE/iPXE Boot Timeout

The server fails to boot into the installer OS, and the console log shows:

PXE-E55: Proxy DHCP Service did not reply
PXE-M0F: Exiting Intel Boot Agent

This can result from conflicting VLAN tags, missing DHCP leases, or misconfigured port bindings.

2. Metadata Fetch Failure

The device boots but fails to apply user data, and the metadata request reports:

curl: (7) Failed to connect to 169.254.169.254 port 80: Connection timed out

This is usually caused by a missing route to the metadata IP, typically because a custom IP configuration has overridden the default route.

3. Terraform Provisioning Hangs Indefinitely

terraform apply stalls in the provisioning phase because the callback from the provisioning OS is delayed. It may eventually fail with:

Error: Timeout waiting for device to become active
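
When triaging many devices, it can help to sort captured logs into the three failure classes above automatically. The following is a minimal Python sketch, not an official tool: the signatures are copied from the log excerpts in this guide, and the category names and the classify_failure function are hypothetical labels chosen for illustration.

```python
import re

# Signatures taken from the log excerpts in this guide. The category names
# are hypothetical labels for this sketch, not Equinix Metal terminology.
FAILURE_SIGNATURES = [
    ("pxe_boot_timeout", re.compile(r"PXE-E55|PXE-M0F")),
    ("metadata_fetch_failure",
     re.compile(r"Failed to connect to 169\.254\.169\.254")),
    ("terraform_timeout",
     re.compile(r"Timeout waiting for device to become active")),
]


def classify_failure(log_text: str) -> list[str]:
    """Return every failure category whose signature appears in log_text."""
    return [name for name, pattern in FAILURE_SIGNATURES
            if pattern.search(log_text)]
```

A log that matches none of the signatures yields an empty list, which is a signal to inspect it by hand rather than proof that provisioning succeeded.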

Root Cause Analysis

1. Incorrect VLAN or Bonding Settings

Equinix Metal supports both Layer 2 and Layer 3 networking. Mismatches between the project’s network mode and provisioned device type can result in failed NIC bring-up. Check NIC logs:

dmesg | grep -iE 'eth|bond'

2. IP Conflict with Custom Subnets

Custom IPs assigned during provisioning (e.g., via cloud-init) may clash with Equinix Metal’s reserved IP ranges or break metadata routing.

Validate IPs with:

ip addr
ip route
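
The same check can be done before provisioning by testing planned subnets for overlap with ranges that must stay untouched. The sketch below uses Python's standard ipaddress module. As an assumption, the protected list contains only the link-local block that holds the metadata address cited in this guide; the ranges actually reserved for a given project are not listed here and would need to be added.

```python
import ipaddress

# Assumption: only the link-local block containing 169.254.169.254 is listed.
# Extend this with the ranges reserved for your own project.
PROTECTED_NETWORKS = [ipaddress.ip_network("169.254.0.0/16")]


def find_conflicts(custom_cidrs, protected=PROTECTED_NETWORKS):
    """Return (custom, protected) pairs of networks that overlap."""
    conflicts = []
    for cidr in custom_cidrs:
        # strict=False accepts interface-style values such as 10.0.0.5/24.
        network = ipaddress.ip_network(cidr, strict=False)
        for reserved in protected:
            same_family = network.version == reserved.version
            if same_family and network.overlaps(reserved):
                conflicts.append((str(network), str(reserved)))
    return conflicts
```

An empty result means none of the planned subnets touches a protected range in the list; it says nothing about ranges that were never added to it.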

3. Metadata Service Not Reachable

The default route must go through the management interface so that the device can reach 169.254.169.254. Misconfigured routes prevent cloud-init from retrieving user data.

Resolution and Workarounds

1. Use Safe Boot Options

When creating a device, specify OS images known to boot reliably with automation (e.g., Ubuntu 22.04, Flatcar). Avoid deprecated OSes or custom images unless tested.

2. Validate Metadata Reachability Post-Boot

curl http://169.254.169.254/metadata
ip route | grep default

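If the metadata service is unreachable, check whether a custom netplan or cloud-init script has overridden the default route. Equinix Metal's documentation gives https://metadata.platformequinix.com/metadata as the metadata endpoint, so test that URL as well.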
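
The two commands above can be wrapped into a post-boot health check. This is a rough Python sketch under stated assumptions: the default URL is the one used in this guide, and the function names default_route_devices and metadata_reachable are hypothetical. The route parser works on text captured from ip route, so it can be exercised without touching the network.

```python
import urllib.request


def default_route_devices(ip_route_output: str) -> list[str]:
    """Extract the device name of each default route from `ip route` output."""
    devices = []
    for line in ip_route_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "default" and "dev" in fields:
            idx = fields.index("dev")
            if idx + 1 < len(fields):
                devices.append(fields[idx + 1])
    return devices


def metadata_reachable(url: str = "http://169.254.169.254/metadata",
                       timeout: float = 5.0) -> bool:
    """Return True if the URL answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError, and socket timeouts all derive from OSError.
        return False
```

An empty list from default_route_devices, or a device other than the expected management interface, points to an overridden route before any request is even sent.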

3. Use Metadata-Only IP Assignment

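Instead of assigning IPs statically, let the provisioned image apply the IP configuration that Equinix Metal supplies through metadata, and stop cloud-init from writing its own network configuration over it. The setting below disables cloud-init's network handling. Note that cloud-init reads this key from system configuration (for example, a file under /etc/cloud/cloud.cfg.d/), not from user data: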

#cloud-config
network: {config: disabled}

4. Retry with Clean Device State

Devices stuck in failed provisioning may be unrecoverable without reprovisioning. Delete the stuck device with:

metal device delete --id <device_id>

Then reapply from scratch with validated configuration.
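
When reprovisioning, a bounded wait makes a stuck device visible instead of letting automation hang. The sketch below is one way to do that in Python, not the Terraform provider's implementation. The get_state callable is a hypothetical hook that you would back with an API or CLI lookup of the device state, and the timeout and interval defaults are arbitrary assumptions.

```python
import time


def wait_for_active(get_state, timeout_s=1800.0, interval_s=15.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll get_state() until it returns "active" or the deadline passes.

    Returns the distinct consecutive states observed, in order.
    Raises TimeoutError if the device never becomes active.
    """
    deadline = clock() + timeout_s
    seen = []
    while True:
        state = get_state()
        # Record a state only when it differs from the previous observation.
        if not seen or seen[-1] != state:
            seen.append(state)
        if state == "active":
            return seen
        if clock() >= deadline:
            raise TimeoutError(
                f"device still {state!r} after {timeout_s}s; seen: {seen}")
        sleep(interval_s)
```

The sleep and clock parameters exist so the loop can be tested with fakes. The list of observed states in the error message shows where the device stalled, which helps decide whether to delete and retry.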

Best Practices

1. Prefer Layer 3 Networking for Automation

Layer 3 simplifies automation and avoids port/VLAN binding errors. Use Layer 2 only for specialized workloads (e.g., appliances, private interconnect).

2. Enable Metadata Validation in CI/CD

Before applying cloud-init or Terraform changes, validate the scripts that consume metadata against a mock metadata server, run either locally or inside a test harness.
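
A mock metadata server for CI can be built from the Python standard library alone. The following is a minimal sketch: the /metadata path mirrors the curl command used earlier in this guide, while the FAKE_METADATA fields are placeholder assumptions and do not reproduce the real metadata schema.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical fixture: placeholder fields, not the real metadata schema.
FAKE_METADATA = {"hostname": "ci-test-node", "tags": ["ci"]}


class MockMetadataHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/metadata":
            body = json.dumps(FAKE_METADATA).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep CI output quiet


def start_mock_server():
    """Start the mock on a free localhost port; return (server, base_url)."""
    server = HTTPServer(("127.0.0.1", 0), MockMetadataHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, f"http://127.0.0.1:{server.server_port}"


def fetch_metadata(base_url, timeout=5.0):
    """Fetch and decode the JSON document served at <base_url>/metadata."""
    # An empty ProxyHandler bypasses any HTTP proxy set in the environment,
    # since the mock server is local.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(f"{base_url}/metadata", timeout=timeout) as resp:
        return json.loads(resp.read())
```

In a test, call start_mock_server(), point the script under test at the returned base URL, and call server.shutdown() when finished. Binding to port 0 lets the operating system pick a free port, so parallel CI jobs do not collide.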

3. Use Equinix Metal Events API for Observability

Subscribe to webhook notifications for state changes (e.g., provisioning complete, PXE boot failure) to reduce manual triage.

4. Isolate Custom Networking Scripts

If you use your own DHCP or netplan templates, apply them post-boot, not during the iPXE phase, so that they do not block the routes the provisioning network depends on.

Conclusion

Equinix Metal offers automation-friendly bare metal infrastructure, but provisioning issues often stem from low-level networking assumptions or misconfigured IP behavior. To maintain high provisioning success rates, enforce metadata-first networking, monitor device state transitions, and avoid overwriting system routes too early. These patterns lead to reproducible and scalable infrastructure across metros and regions.

FAQs

1. Why does my server fail to PXE boot on Equinix Metal?

Common causes include missing DHCP responses due to VLAN misconfiguration or port bonding mismatches. Validate VLAN tagging per project/network type.

2. Can I use static IPs during provisioning?

It’s possible but risky. Static IPs can override metadata routes and block the instance from fetching cloud-init scripts. Use metadata-driven IP assignment when possible.

3. What causes Terraform to hang at 'provisioning'?

Terraform waits for the Equinix Metal API to report the device as active, which requires a successful boot and metadata retrieval. Failures in networking or disk mounting can stall this handshake until the timeout is reached.

4. How do I ensure metadata is applied reliably?

Validate that the default route remains intact and that no early-stage netplan or cloud-init script overwrites the route to the metadata service.

5. What OSes are best supported on Equinix Metal?

Ubuntu LTS, Flatcar, and CentOS Stream are best supported with official automation hooks. Custom OSes require validated provisioning scripts.