Understanding Triton Architecture
SmartOS Zones and Triton CNAPI
Triton runs workloads in lightweight SmartOS zones (native and LX-branded containers, plus hardware virtual machines that are themselves hosted inside zones) and uses a centralized CNAPI (Compute Node API) service to manage compute nodes. Each zone has its own ZFS datasets, and container zones share the global zone's kernel. Misconfigured zones or stale state in CNAPI can prevent provisioning or updates.
Networking and Metadata Service
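Triton assigns network interfaces through NAPI (the Networking API), using NIC tags, VLANs, and its own overlay fabric networks. The metadata service, served by the metadata agent in each compute node's global zone, delivers instance-specific configuration such as SSH keys and the user-script, and is essential for bootstrapping.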
Common Symptoms
- Zones fail to start or hang at boot
- Docker containers fail with “unknown runtime” errors
- Cloud-init fails due to missing metadata endpoint
- Provisioned instances do not appear in sdc-listmachines
- Inconsistent internal/external IP assignments
Root Causes
1. Zone Image Mismatch or Corruption
Outdated or corrupted SmartOS images cause provisioning failures or send zones into a reboot loop. This commonly happens after a partial dataset replication or when the disk fills up.
2. CNAPI Stale Cache or Communication Errors
If CNAPI loses sync with the agents on compute nodes (CNs), zones may appear orphaned or fail to register. Flapping nodes (ones that repeatedly drop off and rejoin) or expired credentials can also desynchronize CNAPI.
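3. Metadata Agent Failure
A missing or unreachable metadata service prevents cloud-init and startup scripts from running. Causes include a stopped or faulted metadata agent in the global zone, or a broken channel between the agent and the instance (a zone socket for native and LX zones, a virtual serial port for hardware VMs).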
4. Docker API Compatibility or Triton Docker Misconfiguration
Triton implements the Docker Remote API through its sdc-docker service, but only for clients configured to use it. A missing Docker profile setup, or a Docker client whose API version does not match what Triton supports, can cause container provisioning errors.
5. Network Fabric Inconsistencies
Misconfigured VLANs or duplicate MAC addresses on internal networks can break routing or leave zones unreachable, especially for external endpoints reached through NAT.
Diagnostics and Monitoring
1. Check Zone and Global Logs
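vmadm list -o uuid,type,state,alias
vmadm get <UUID> | json state
tail -n 100 /zones/<UUID>/root/var/log/messages
Examine boot logs and provisioning output for stuck zones or startup failures. Native SmartOS zones write their system log to /var/adm/messages rather than /var/log/messages.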
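When a compute node hosts many zones, it helps to filter the list down to the ones that are not running. The following is a minimal Python sketch, assuming vmadm's parseable output (-p) is colon-separated with the fields requested by -o in the given order. The script name zone_states.py and the choice of "running" as the only healthy state are illustrative assumptions.

```python
import sys

# Assumption: only "running" counts as healthy; states such as "provisioning",
# "failed", or "stopped" are reported so they can be investigated.
HEALTHY_STATES = {"running"}


def find_unhealthy_zones(parseable_output):
    """Parse `vmadm list -p -o uuid,type,state,alias` output (colon-separated,
    no header) and return the zones whose state is not healthy."""
    problems = []
    for line in parseable_output.splitlines():
        if not line.strip():
            continue
        # maxsplit=3 keeps any colons inside the alias (the last field) intact.
        uuid, vm_type, state, alias = line.split(":", 3)
        if state not in HEALTHY_STATES:
            problems.append({"uuid": uuid, "type": vm_type, "state": state, "alias": alias})
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 zone_states.py zones.txt
    with open(sys.argv[1]) as fh:
        for zone in find_unhealthy_zones(fh.read()):
            print(f"{zone['uuid']} [{zone['type']}] {zone['alias'] or '-'}: {zone['state']}")
```

Capture the input on the compute node with vmadm list -p -o uuid,type,state,alias > zones.txt, then run the script wherever Python 3 is available.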
2. Test CNAPI and MAPI Health
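sdc-cnapi /ping
svcs -a | grep -i metadata
mdata-get sdc:uuid
The first command runs on the headnode and the second in a compute node's global zone; mdata-get runs inside the instance. Together they confirm that CNAPI is operational and that the metadata service is reachable from non-global zones.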
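On SmartOS-based instances, metadata is read from inside the instance with mdata-get, so a scripted probe can run it and check the answer. This Python sketch assumes mdata-get is on the PATH and that a healthy service returns the instance's UUID for the sdc:uuid key. The probe command is a parameter (a hypothetical design choice) so the same check can be pointed at a different command.

```python
import subprocess
import uuid

# Assumption: run inside a SmartOS-based instance where mdata-get is installed.
DEFAULT_PROBE = ("mdata-get", "sdc:uuid")


def check_metadata(cmd=DEFAULT_PROBE, timeout=10.0):
    """Run a metadata probe and return (ok, detail).
    ok is True only if the command exits 0 and prints a well-formed UUID."""
    try:
        result = subprocess.run(list(cmd), capture_output=True, text=True, timeout=timeout)
    except FileNotFoundError:
        return False, f"{cmd[0]} not found; is this a Triton instance?"
    except subprocess.TimeoutExpired:
        return False, f"no answer within {timeout}s; is the metadata agent running?"
    if result.returncode != 0:
        return False, f"exit status {result.returncode}: {result.stderr.strip()}"
    value = result.stdout.strip()
    try:
        uuid.UUID(value)
    except ValueError:
        return False, f"unexpected answer: {value!r}"
    return True, value
```

Called from a post-provisioning health check, a timeout points at the metadata agent or its channel, while "not found" suggests the guest tools are missing from the image.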
3. Validate Triton Docker Profile
triton profile list
triton profile get
echo "$DOCKER_HOST"
docker info
Confirms whether the Docker CLI is using the correct remote endpoint and authentication keys for Triton.
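The variables that triton env --docker exports can also be checked programmatically before running docker commands. This Python sketch assumes the standard Docker client variables (DOCKER_HOST, DOCKER_TLS_VERIFY, DOCKER_CERT_PATH) and the usual TLS file names (ca.pem, cert.pem, key.pem) in the certificate directory; it does not contact the Triton endpoint itself.

```python
import os
from pathlib import Path

# Standard Docker TLS file names expected in DOCKER_CERT_PATH.
REQUIRED_CERT_FILES = ("ca.pem", "cert.pem", "key.pem")


def check_docker_env(env=None):
    """Return a list of problems with the Docker client environment;
    an empty list means it looks configured for a remote TLS endpoint."""
    env = os.environ if env is None else env
    problems = []
    host = env.get("DOCKER_HOST", "")
    if not host:
        problems.append("DOCKER_HOST is unset, so docker talks to a local daemon")
    elif not host.startswith("tcp://"):
        problems.append(f"DOCKER_HOST={host!r} is not a tcp:// endpoint")
    # Docker enables TLS verification when this variable is non-empty.
    if not env.get("DOCKER_TLS_VERIFY"):
        problems.append("DOCKER_TLS_VERIFY is unset, so the server is not verified")
    cert_path = env.get("DOCKER_CERT_PATH")
    if not cert_path:
        problems.append("DOCKER_CERT_PATH is unset")
    else:
        missing = [name for name in REQUIRED_CERT_FILES if not (Path(cert_path) / name).is_file()]
        if missing:
            problems.append(f"missing from {cert_path}: {', '.join(missing)}")
    return problems


if __name__ == "__main__":
    for problem in check_docker_env():
        print(problem)
```

After eval "$(triton env --docker)", an empty result means the client is at least pointed at a TLS-protected remote endpoint; it cannot tell whether that endpoint is the intended datacenter.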
4. Inspect VLAN and NIC Assignments
nictagadm list
dladm show-vnic
vmadm get <UUID> | json nics
Identifies missing or duplicate NICs, incorrect VLAN tags, or invalid MAC addresses.
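Spotting duplicate MACs or IPs by eye gets tedious once a node carries many instances. This Python sketch scans NIC objects (with the mac, ip, nic_tag, vlan_id, and interface fields that vmadm reports) and lists values claimed by more than one interface. It assumes vmadm lookup -j prints a JSON array of full VM objects; it only sees one compute node, whereas NAPI holds the datacenter-wide view.

```python
import json
import sys
from collections import defaultdict


def find_nic_conflicts(vm_nics):
    """vm_nics maps a VM UUID to its list of NIC objects (as printed by
    `vmadm get <UUID> | json nics`). Returns duplicated MACs and IPs."""
    seen = {"mac": defaultdict(list), "ip": defaultdict(list)}
    for vm_uuid, nics in vm_nics.items():
        for nic in nics:
            owner = f"{vm_uuid}/{nic.get('interface', '?')}"
            mac = nic.get("mac", "").lower()
            if mac:
                seen["mac"][mac].append(owner)
            ip = nic.get("ip")
            # "dhcp" is an addressing mode, not an address, so it cannot collide.
            if ip and ip != "dhcp":
                # The same IP on a different NIC tag or VLAN is not treated as a clash.
                key = (nic.get("nic_tag"), nic.get("vlan_id", 0), ip)
                seen["ip"][key].append(owner)
    # Keep only values claimed by more than one interface.
    return {
        kind: {value: owners for value, owners in table.items() if len(owners) > 1}
        for kind, table in seen.items()
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: vmadm lookup -j > vms.json; python3 nic_conflicts.py vms.json
    with open(sys.argv[1]) as fh:
        vms = json.load(fh)
    conflicts = find_nic_conflicts({vm["uuid"]: vm.get("nics", []) for vm in vms})
    for kind, table in conflicts.items():
        for value, owners in table.items():
            print(f"duplicate {kind} {value}: {', '.join(owners)}")
```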
5. Review CN Heartbeats
Use sdc-server list, or sdc-cnapi /servers | json -H on the headnode, to check compute node connectivity, last heartbeat, and node health status.
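CNAPI's server records can be post-processed to flag compute nodes whose heartbeat has gone quiet. This Python sketch assumes each server object exposes uuid, hostname, status, and an ISO 8601 last_heartbeat timestamp with a timezone suffix; check those field names against your own CNAPI output before relying on it. The five-minute threshold is an arbitrary illustration.

```python
import json
import sys
from datetime import datetime, timedelta, timezone

# Assumption: a heartbeat older than this is treated as stale.
MAX_HEARTBEAT_AGE = timedelta(minutes=5)


def parse_iso(ts):
    """Parse an ISO 8601 timestamp; fromisoformat() before Python 3.11 rejects 'Z'."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def stale_servers(servers, now=None, max_age=MAX_HEARTBEAT_AGE):
    """Return (hostname, uuid, status, age) for servers that are not 'running'
    or whose last_heartbeat is missing or older than max_age."""
    now = now or datetime.now(timezone.utc)
    report = []
    for server in servers:
        heartbeat = server.get("last_heartbeat")
        age = now - parse_iso(heartbeat) if heartbeat else None
        if server.get("status") != "running" or age is None or age > max_age:
            report.append((server.get("hostname", "?"), server.get("uuid", "?"),
                           server.get("status"), age))
    return report


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: sdc-cnapi /servers | json -H > servers.json
    #                     python3 stale_cns.py servers.json
    with open(sys.argv[1]) as fh:
        for hostname, uuid, status, age in stale_servers(json.load(fh)):
            print(f"{hostname} ({uuid}): status={status} heartbeat_age={age}")
```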
Step-by-Step Fix Strategy
1. Reinstall or Refresh Broken Zone Image
Use sdc-imgapi on the headnode (or imgadm on the compute node) to verify and re-download corrupted images. Remove partial datasets under /zones/ before reprovisioning.
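After re-downloading an image, its integrity can be checked against the manifest. IMGAPI manifests list each image file with a sha1 and a size; this Python sketch assumes you have the manifest saved as JSON (for example from IMGAPI's GET /images/<UUID>) plus a local copy of the image file, and it inspects only the first entry in files.

```python
import hashlib
import json
import os
import sys


def verify_image_file(path, manifest):
    """Compare a local image file with the first entry of manifest["files"]
    (size and sha1). Returns a list of mismatches; empty means it matches."""
    expected = manifest["files"][0]
    problems = []
    size = os.path.getsize(path)
    if size != expected["size"]:
        problems.append(f"size {size} != expected {expected['size']}")
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        # Hash in 1 MiB chunks so multi-gigabyte images are not loaded into memory.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    if digest.hexdigest() != expected["sha1"]:
        problems.append(f"sha1 {digest.hexdigest()} != expected {expected['sha1']}")
    return problems


if __name__ == "__main__" and len(sys.argv) > 2:
    # Hypothetical usage: python3 verify_image.py manifest.json image-file
    with open(sys.argv[1]) as fh:
        data = json.load(fh)
    # Assumption: unwrap a top-level "manifest" key if the JSON came wrapped.
    manifest = data.get("manifest", data)
    problems = verify_image_file(sys.argv[2], manifest)
    print("\n".join(problems) if problems else "image file matches manifest")
    sys.exit(1 if problems else 0)
```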
2. Restart CNAPI and Agent Services
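Restart the cnapi, vmapi, and napi services (each runs in its own core zone on the headnode), along with the cn-agent service on affected compute nodes. Verify each service's configuration under /opt/smartdc/ and re-authenticate CNs if they remain disconnected.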
3. Restore Metadata Agent Functionality
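In native SmartOS zones, restart the mdata:fetch and mdata:execute services (svcadm restart mdata:fetch mdata:execute) so the user-script is fetched and run again. In the global zone, confirm with svcs that the metadata agent is online, and use svcadm clear if it is in maintenance.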
4. Reconfigure Triton Docker CLI
Ensure the Triton profile points to the correct account and datacenter. Run triton profile docker-setup to regenerate the Docker certificates, load the environment with eval "$(triton env --docker)", and test with docker ps.
5. Rebuild Broken Network Configurations
Remove and recreate NICs with vmadm update, using its remove_nics and add_nics properties. Validate fabric tags and avoid MAC address collisions.
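Preparing the two vmadm update payloads by hand is error-prone, so a small generator can validate them first. This Python sketch builds a remove_nics payload (a list of MAC addresses) and an add_nics payload (a list of NIC objects with nic_tag, ip, netmask, and an optional vlan_id). The validation rules and the file names remove.json and add.json are illustrative assumptions.

```python
import json
import re
import sys

MAC_RE = re.compile(r"^[0-9a-f]{2}(:[0-9a-f]{2}){5}$")


def nic_replacement_payloads(old_mac, new_nic):
    """Build two vmadm update payloads: one removing the NIC whose MAC is
    old_mac, one adding new_nic. They are meant to be applied one after the other."""
    mac = old_mac.lower()
    if not MAC_RE.match(mac):
        raise ValueError(f"not a MAC address: {old_mac!r}")
    missing = {"nic_tag", "ip"} - set(new_nic)
    if missing:
        raise ValueError(f"new NIC is missing {sorted(missing)}")
    if new_nic["ip"] != "dhcp" and "netmask" not in new_nic:
        raise ValueError("a static IP also needs a netmask")
    # 0 means untagged; 802.1Q VLAN IDs go up to 4094.
    vlan = new_nic.get("vlan_id", 0)
    if not isinstance(vlan, int) or not 0 <= vlan <= 4094:
        raise ValueError(f"vlan_id out of range: {vlan!r}")
    return {"remove_nics": [mac]}, {"add_nics": [dict(new_nic)]}


if __name__ == "__main__" and len(sys.argv) > 2:
    # Hypothetical usage: python3 nic_payloads.py <old-mac> new_nic.json
    with open(sys.argv[2]) as fh:
        remove, add = nic_replacement_payloads(sys.argv[1], json.load(fh))
    with open("remove.json", "w") as fh:
        json.dump(remove, fh, indent=2)
    with open("add.json", "w") as fh:
        json.dump(add, fh, indent=2)
```

Apply the files in order with vmadm update <UUID> -f remove.json and then vmadm update <UUID> -f add.json. On a Triton-managed compute node, consider making NIC changes through VMAPI or CloudAPI instead (for example with triton instance nic), since editing with vmadm directly may leave NAPI's records out of date.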
Best Practices
- Use version-pinned SmartOS images in production
- Monitor CNAPI and metadata agents using external health probes
- Automate zone health checks post-provisioning
- Segregate Docker and non-Docker workloads using instance tags
- Rotate API keys and Triton profiles periodically
Conclusion
Joyent Triton provides powerful hybrid cloud capabilities for bare-metal and container workloads. However, its SmartOS-based design and distributed architecture demand tight coordination across zones, CNAPI, metadata services, and Docker APIs. With structured diagnostics, node monitoring, and proactive configuration management, teams can achieve consistent performance and scalable provisioning across Triton-powered infrastructure.
FAQs
1. Why is my zone stuck in "provisioning" state?
Usually a broken image installation, a missing network interface, or lost CNAPI sync. Re-check the logs, then refresh the image or NIC configuration.
2. How do I fix Docker errors when using Triton?
Ensure the Docker CLI is configured with the correct Triton profile, and verify that Docker support is enabled for your Triton account.
3. Why are cloud-init scripts failing in my container?
The metadata service may be unreachable. Run mdata-get sdc:uuid inside the instance, and check that the metadata agent is online in the compute node's global zone.
4. Can I use Triton with Terraform or Kubernetes?
Yes, though it relies on external drivers or Triton-specific plugins. Validate your API credentials, and use the Joyent-supported Terraform provider.
5. What causes CNs to disappear from the portal?
Usually expired API credentials or a crashed CNAPI service. Re-authenticate node keys and verify service status with svcs.