Understanding Triton Architecture

SmartOS Zones and Triton CNAPI

Triton runs every instance as a SmartOS zone: native and LX-branded containers run directly in a zone, and hardware virtual machines (KVM or bhyve) run inside a zone wrapper. A central Compute Node API (CNAPI) service tracks and manages the compute nodes. Each zone has its own ZFS dataset, and container zones share the kernel of the global zone. Misconfigured zones or stale state in CNAPI can prevent provisioning or updates.

Networking and Metadata Service

Triton uses NIC tags, VLANs, and its own overlay fabric to assign network interfaces to instances. The metadata service delivers instance-specific configuration, such as user scripts and user data, and is essential for bootstrapping.

Common Symptoms

  • Zones fail to start or hang at boot
  • Docker containers fail with “unknown runtime” errors
  • Cloud-init fails due to missing metadata endpoint
  • Provisioned instances do not appear in sdc-listmachines
  • Inconsistent internal/external IP assignments

Root Causes

1. Zone Image Mismatch or Corruption

Outdated or corrupted SmartOS images can cause provisioning to fail or zones to enter a reboot loop. This commonly occurs after partial dataset replication or disk exhaustion.

2. CNAPI Stale Cache or Communication Errors

If CNAPI loses sync with agents on compute nodes (CNs), zones may appear orphaned or not register correctly. Node flaps or expired credentials can also desynchronize CNAPI.

3. Metadata Agent Failure

A missing or unreachable metadata service prevents cloud-init and startup scripts from running. The usual cause is a stopped or broken metadata agent in the global zone of the compute node; zone-specific restrictions that block the guest's metadata client are a less common cause.

4. Docker API Compatibility or Triton-Docker Misconfig

Triton supports the Docker Remote API, but only when the client is correctly configured. A Triton profile without Docker credentials, or a Docker client that requests a newer API version than Triton supports, can result in container provisioning errors.

5. Network Fabric Inconsistencies

Misconfigured VLANs or duplicate MAC addresses on internal networks can prevent traffic routing or zone reachability, especially for NATed external endpoints.

Diagnostics and Monitoring

1. Check Zone and Global Logs

vmadm list -p
vmadm get <UUID>
tail -n 100 /zones/<UUID>/root/var/log/messages

Examine boot logs and provisioning output for stuck zones or startup failures.
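
As a starting point for spotting stuck zones, the sketch below filters parsable vmadm output for anything that is not running. The function name non_running_zones is hypothetical, and the field order assumes the list was produced with vmadm list -p -o uuid,state,alias.

```shell
# Print "uuid alias state" for every zone that is not in the "running" state.
# Reads colon-delimited lines (uuid:state:alias) on stdin, the shape assumed
# to come from: vmadm list -p -o uuid,state,alias
non_running_zones() {
    awk -F: '$2 != "running" { print $1, $3, $2 }'
}
```

On a compute node this might be used as vmadm list -p -o uuid,state,alias | non_running_zones, and each UUID it prints is a candidate for vmadm get and a look at its logs.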

2. Test CNAPI and Metadata Health

sdc-cnapi /ping
mdata-get sdc:uuid

Run the first command from the headnode global zone to confirm that CNAPI responds. Run the second from inside an instance: it should print the instance's UUID, which confirms that the metadata agent is reachable from non-global zones.

3. Validate Triton Docker Profile

triton profile list
triton env --docker
docker info

Confirms whether Docker CLI is using the correct remote endpoint and authentication keys for Triton.
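
A quick way to see whether the Docker client environment is complete is to check the variables that triton env --docker normally exports. The sketch below assumes those are DOCKER_HOST, DOCKER_CERT_PATH, and DOCKER_TLS_VERIFY; the function name check_docker_env is hypothetical.

```shell
# Report which Docker client variables are missing from the environment.
# Assumption: a working Triton Docker setup exports all three names below.
# Returns 0 when every variable is set and non-empty, 1 otherwise.
check_docker_env() {
    missing=0
    for name in DOCKER_HOST DOCKER_CERT_PATH DOCKER_TLS_VERIFY; do
        # Indirect lookup of the variable named by $name (POSIX sh compatible).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name"
            missing=1
        fi
    done
    return "$missing"
}
```

If the function reports missing variables, loading the profile with eval "$(triton env --docker)" and running it again is a reasonable next step.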

4. Inspect VLAN and NIC Assignments

dladm show-vnic
vmadm get <UUID> | json nics

Identifies missing or duplicate NICs, incorrect VLAN tags, or invalid MAC addresses.
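
Duplicate MAC addresses are easier to spot mechanically than by eye. The sketch below reads "uuid mac" pairs and prints each MAC that appears more than once, together with the zones that carry it. The function name duplicate_macs is hypothetical, and how the pairs are collected is an assumption: something like vmadm list -H -o uuid,nics.0.mac may work, but it would only cover the first NIC of each zone.

```shell
# Read whitespace-separated "uuid mac" pairs on stdin and print every MAC
# address assigned to more than one NIC, followed by the owning zone UUIDs.
# MACs are compared case-insensitively; lines without two fields are skipped.
duplicate_macs() {
    awk '
        NF >= 2 {
            mac = tolower($2)
            count[mac]++
            owners[mac] = owners[mac] " " $1
        }
        END {
            for (mac in count)
                if (count[mac] > 1)
                    print mac owners[mac]
        }
    ' | sort
}
```

No output means no collisions were found in the input that was supplied; it says nothing about NICs that were not included in the list.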

5. Review CN Heartbeats

On the headnode, use sdc-server list or sdc-cnapi /servers to check compute node connectivity and status. For heartbeat errors, read the CNAPI service log inside the cnapi zone; svcs -L cnapi prints its path.

Step-by-Step Fix Strategy

1. Reinstall or Refresh Broken Zone Image

Use sdc-imgapi on the headnode to verify the image, and imgadm on the compute node to delete and re-import a corrupted local copy. Remove any partially imported datasets under /zones/ before reprovisioning.

2. Restart CNAPI and Agent Services

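Restart the cnapi, vmapi, and napi services in their respective core zones, then restart the agents (such as cn-agent, vm-agent, and net-agent) in the global zone of each affected compute node. Verify configuration in /opt/smartdc/ and re-authenticate CNs if disconnected.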

3. Restore Metadata Agent Functionality

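Restart the metadata agent in the compute node's global zone (svcadm restart svc:/smartdc/metadata:default), then confirm from inside the affected zone that mdata-get sdc:uuid returns the zone's UUID. Triton guests read metadata through the mdata client tools over a channel provided by the global zone rather than over an HTTP endpoint, so firewall rules are rarely the cause.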

4. Reconfigure Triton Docker CLI

Ensure the Triton profile points to the correct account and datacenter. Run triton profile docker-setup to regenerate Docker credentials, load them with eval "$(triton env --docker)", and test with docker ps.

5. Rebuild Broken Network Configurations

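Remove and recreate the affected NICs with vmadm update, using remove_nics and add_nics payloads. Validate NIC tags and VLAN IDs against the fabric configuration, and make sure no two NICs share a MAC address.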
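
The payloads for such an update can be generated rather than hand-written. The sketch below prints one JSON payload that removes a NIC by MAC address and one that adds a replacement. The function names are hypothetical, all addresses in the usage note are placeholders, and the NIC property names (interface, nic_tag, ip, netmask, vlan_id) are assumptions that should be checked against the vmadm man page for your platform image.

```shell
# Print a vmadm update payload that removes the NIC with the given MAC.
remove_nic_payload() {
    printf '{"remove_nics": ["%s"]}\n' "$1"
}

# Print a vmadm update payload that adds one NIC.
# Arguments: interface name, NIC tag, IP address, netmask, VLAN ID.
# Assumption: these property names match what vmadm expects for the zone brand.
add_nic_payload() {
    printf '{"add_nics": [{"interface": "%s", "nic_tag": "%s", "ip": "%s", "netmask": "%s", "vlan_id": %s}]}\n' \
        "$1" "$2" "$3" "$4" "$5"
}
```

One possible use is to write each payload to a file and apply them in order, for example remove_nic_payload 90:b8:d0:00:00:01 > /tmp/remove.json followed by vmadm update <UUID> -f /tmp/remove.json, then the same with add_nic_payload net0 external 192.0.2.10 255.255.255.0 0. Keeping removal and addition as separate updates makes it easier to see which step failed.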

Best Practices

  • Use version-pinned SmartOS images in production
  • Monitor CNAPI and metadata agents using external health probes
  • Automate zone health checks post-provisioning
  • Segregate Docker and non-Docker workloads using project tags
  • Rotate API keys and Triton profiles periodically
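
For the health-probe and post-provisioning checks listed above, a small retry wrapper avoids false alarms from a single slow response. This is a minimal sketch: the function name probe_with_retry is hypothetical, and it assumes the probe command exits non-zero on failure.

```shell
# Run a probe command up to $1 times, sleeping $2 seconds between attempts.
# Returns 0 as soon as one attempt succeeds, 1 if every attempt fails.
# Remaining arguments are the probe command and its arguments.
probe_with_retry() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        # Only sleep when another attempt is still to come.
        if [ "$i" -le "$attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}
```

For example, probe_with_retry 5 10 curl -fsS <health URL> would try an HTTP probe five times at ten-second intervals; the -f flag makes curl exit non-zero on HTTP error responses, which is what the wrapper relies on.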

Conclusion

Joyent Triton provides powerful hybrid cloud capabilities for bare-metal and container workloads. However, its SmartOS-based design and service-oriented architecture demand tight coordination across zones, CNAPI, metadata services, and Docker APIs. With structured diagnostics, node monitoring, and proactive configuration management, teams can achieve consistent performance and scalable provisioning across Triton-powered infrastructure.

FAQs

1. Why is my zone stuck in "provisioning" state?

Likely due to broken image installation, missing network interface, or CNAPI sync loss. Re-check logs and refresh the image or NIC config.

2. How do I fix Docker errors when using Triton?

Ensure Docker CLI is configured with the correct Triton profile, and verify that Docker support is enabled in your Triton account and project.

3. Why are cloud-init scripts failing in my container?

The metadata service may be unreachable. Run mdata-get sdc:uuid inside the instance; if it hangs or returns an error, restart the metadata agent in the global zone of the compute node.

4. Can I use Triton with Terraform or Kubernetes?

Yes, but it requires external drivers or Triton-specific plugins. Validate API credentials and use the Joyent-supported Terraform provider.

5. What causes CNs to disappear from the portal?

Expired API credentials or CNAPI service crash. Re-authenticate node keys and verify service status with svcs.