Understanding Rundeck Architecture
Job Engine, Node Executor, and Plugins
Rundeck executes jobs composed of steps across target nodes. Commands reach each node through a node executor (SSH, WinRM, etc.) selected in the project configuration or by node attributes, while resource model sources supply the node definitions. Failures often stem from missing credentials, incorrect executor settings, or broken resource definitions.
Access Control and Projects
Rundeck governs permissions per user or role through ACL policy files written in YAML (.aclpolicy). Invalid policy syntax or a misplaced file can block job access or expose critical operations.
Common Rundeck Issues in Production
1. Job Execution Fails or Times Out
Issues with SSH keys, unreachable nodes, or step plugin exceptions can cause job failure.
Execution failed: 255: SSH connection refused or timed out
- Check node connectivity via CLI before job execution.
- Verify SSH key access and sudo rights for the rundeck user.
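The connectivity check in the list above can be scripted before a job runs. The following is a minimal sketch that only tests whether the SSH port accepts a TCP connection; it does not verify keys or sudo rights. The function name and the default port 22 are assumptions for illustration, not part of Rundeck.

```python
import socket


def ssh_port_reachable(host: str, port: int = 22, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # OSError covers refused connections, timeouts, and DNS failures.
        return False
```

Calling ssh_port_reachable("node1.example.com") (a placeholder hostname) returns False for the "connection refused or timed out" case, which points to the network or sshd rather than to the job definition.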
2. Node Discovery Not Working
Dynamic resource model providers (e.g., AWS EC2, Ansible) may fail due to invalid credentials, outdated plugins, or misconfigured filters.
3. ACL Denied Access to Job or Node
Users receive permission denied errors when ACL policies are improperly scoped or syntactically invalid.
4. Plugin Failures or Incompatibility
Custom or third-party plugins may fail to load or execute after Rundeck upgrades or Java changes.
5. Integration Failures with LDAP or Key Storage
LDAP misbinds, expired certs, or missing key paths in the key storage system can block authentication or job credential injection.
Diagnostics and Debugging Techniques
Enable Debug Logs in rundeck-config.properties
Set loglevel.default = DEBUG and inspect service.log and rundeck.log under /var/log/rundeck/ for root causes.
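Debug logs get large quickly, so it can help to summarize them before reading line by line. This sketch counts lines per log level and extracts the ERROR lines. It assumes each log entry carries its level as an upper-case word (ERROR, WARN, INFO, DEBUG); the actual layout depends on the logging configuration, so treat the pattern as a starting point.

```python
import re
from collections import Counter
from typing import Iterable, List

# Assumption: the level appears as a standalone upper-case word on each entry.
LEVEL_RE = re.compile(r"\b(ERROR|WARN|INFO|DEBUG)\b")


def summarize_levels(lines: Iterable[str]) -> Counter:
    """Count log lines by the first level keyword found on each line."""
    counts: Counter = Counter()
    for line in lines:
        match = LEVEL_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def error_lines(lines: Iterable[str]) -> List[str]:
    """Return the lines whose first level keyword is ERROR."""
    result = []
    for line in lines:
        match = LEVEL_RE.search(line)
        if match and match.group(1) == "ERROR":
            result.append(line.rstrip("\n"))
    return result
```

To apply it to a real file, pass an open file handle, for example summarize_levels(open("/var/log/rundeck/service.log", encoding="utf-8", errors="replace")).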
Test Nodes with Ad-hoc Commands
Use “Run command” to validate node communication independently of job definitions. This helps separate SSH problems from errors in the job definition itself.
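The same ad-hoc check can be driven from a script through the Rundeck HTTP API. The sketch below only builds the request and does not send it. It assumes the run/command endpoint with a JSON body and token authentication via the X-Rundeck-Auth-Token header; the API version 14, the project name, and the node filter shown in the usage note are placeholders to adjust for your server.

```python
import json
import urllib.parse
import urllib.request


def build_adhoc_request(base_url, project, command, node_filter, token,
                        api_version=14):
    """Build (but do not send) a POST request for an ad-hoc command."""
    # api_version is an assumption; use a version your server supports.
    url = "{}/api/{}/project/{}/run/command".format(
        base_url.rstrip("/"), api_version, urllib.parse.quote(project, safe="")
    )
    body = json.dumps({"exec": command, "filter": node_filter}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "X-Rundeck-Auth-Token": token,
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
    )
```

For example, build_adhoc_request("http://rundeck.example.com:4440", "demo", "uptime", "tags: staging", token) produces a request that urllib.request.urlopen can send. Running a harmless command such as uptime against a narrow filter keeps the test low-risk.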
Validate ACLs with the Access Validator
Navigate to Admin → Security → Access Validator to simulate access for a user/role and identify missing permissions.
Manually Test Plugin Classpath
Check that each plugin is compatible with the running Rundeck version. Ensure JARs are deployed under libext/ and that their dependencies are shaded properly.
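A quick way to review what is deployed in libext/ is to read each JAR's manifest. This sketch assumes Java plugins declare themselves with the Rundeck-Plugin-Archive, Rundeck-Plugin-Version, and Rundeck-Plugin-Classnames manifest entries. It reports what it finds and does not judge compatibility, which still has to be checked against the plugin's documentation.

```python
import zipfile
from pathlib import Path


def read_manifest(jar_path):
    """Parse the main section of META-INF/MANIFEST.MF into a dict."""
    with zipfile.ZipFile(jar_path) as jar:
        try:
            raw = jar.read("META-INF/MANIFEST.MF").decode("utf-8", errors="replace")
        except KeyError:
            return {}  # JAR has no manifest
    headers = {}
    last_key = None
    for line in raw.splitlines():
        if not line.strip():
            if headers:
                break  # a blank line ends the main section
            continue
        if line.startswith(" ") and last_key:
            # Manifest continuation line: append without the leading space.
            headers[last_key] += line[1:]
        elif ":" in line:
            key, _, value = line.partition(":")
            last_key = key.strip()
            headers[last_key] = value.strip()
    return headers


def scan_libext(libext_dir):
    """Summarize plugin-related manifest entries for every JAR in a directory."""
    report = {}
    for jar in sorted(Path(libext_dir).glob("*.jar")):
        manifest = read_manifest(jar)
        report[jar.name] = {
            "is_plugin": manifest.get("Rundeck-Plugin-Archive", "").lower() == "true",
            "plugin_version": manifest.get("Rundeck-Plugin-Version"),
            "classnames": manifest.get("Rundeck-Plugin-Classnames"),
        }
    return report
```

A JAR in libext/ that reports is_plugin as False is worth a second look: it may be a stray dependency that should have been shaded into the plugin instead.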
Step-by-Step Resolution Guide
1. Fix SSH Job Execution Failures
Ensure SSH private keys are stored in key storage and that the project configuration references the correct storage path. Confirm that a plain SSH login with the same user and key works outside Rundeck.
2. Repair Node Discovery
Review resources.xml or the dynamic source plugin config. Re-authenticate API credentials or fix region/filters for cloud discovery.
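For a static resources.xml, a small structural check catches the most common breakage. This sketch assumes the usual layout of a project root element containing node elements, and treats name and hostname as the required attributes; that attribute list is an assumption to adjust to your own conventions.

```python
import xml.etree.ElementTree as ET

# Assumption: these attributes must be present and non-empty on every node.
REQUIRED_ATTRS = ("name", "hostname")


def check_resources_xml(xml_text):
    """Return a list of human-readable problems found in a resources.xml string."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return ["not well-formed XML: {}".format(exc)]

    problems = []
    if root.tag != "project":
        problems.append("root element is <{}>, expected <project>".format(root.tag))

    seen = set()
    for index, node in enumerate(root.iter("node"), start=1):
        name = node.get("name")
        for attr in REQUIRED_ATTRS:
            if not node.get(attr):
                problems.append(
                    "node #{} ({}) is missing '{}'".format(index, name or "unnamed", attr)
                )
        if name:
            if name in seen:
                problems.append("duplicate node name '{}'".format(name))
            seen.add(name)
    return problems
```

An empty result means the file passed these checks only; it does not prove that the hosts exist or are reachable.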
3. Correct ACL File Errors
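Lint ACL files for syntax issues. Place policies as .aclpolicy files in Rundeck's configuration directory (/etc/rundeck/ on RPM and DEB installs). Rundeck normally picks up changed policy files without a restart; restart the service only if a change does not take effect.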
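As a first-pass lint, the sketch below checks that every policy document in a file has the top-level keys description, context, for, and by, and that no line is indented with tabs. It is deliberately line-based so it needs no YAML library, which also means it is not a full YAML or ACL validator. The function name is hypothetical.

```python
import re

REQUIRED_KEYS = {"description", "context", "for", "by"}
TOP_LEVEL_KEY = re.compile(r"^([A-Za-z_][\w-]*):")
DOC_SEPARATOR = re.compile(r"(?m)^---[ \t]*$")


def lint_aclpolicy(text):
    """Return a list of problems found in the text of an .aclpolicy file."""
    problems = []
    number = 0
    for doc in DOC_SEPARATOR.split(text):
        if not doc.strip():
            continue  # skip empty chunks around '---' separators
        number += 1
        keys = set()
        for line in doc.splitlines():
            indent = line[: len(line) - len(line.lstrip())]
            if "\t" in indent:
                problems.append(
                    "policy #{}: tab used for indentation: {!r}".format(number, line)
                )
            match = TOP_LEVEL_KEY.match(line)
            if match:
                keys.add(match.group(1))
        for missing in sorted(REQUIRED_KEYS - keys):
            problems.append(
                "policy #{}: missing top-level key '{}'".format(number, missing)
            )
    return problems
```

A policy that passes this check can still be wrong in scope, so it complements simulating access for a real user rather than replacing it.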
4. Resolve Plugin Errors
Reinstall plugins after upgrades. Review plugin logs under logs/rundeck.plugin*. Consider using Plugin Annotations for compatibility with the latest Rundeck SDK.
5. Fix LDAP and Key Storage Integration
Enable verbose LDAP logging. Validate jaas-ldap.conf entries and DN mappings. Check that storage paths exist and have the correct permissions.
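Since expired certificates are one of the ways LDAP integration breaks, a certificate expiry check can run alongside the configuration review. This is a minimal sketch using Python's standard ssl module. It assumes the directory is served over LDAPS on port 636 and that its certificate chains to a CA trusted by the machine running the script. Both function names are hypothetical.

```python
import socket
import ssl
import time


def days_until_expiry(not_after, now=None):
    """Days between now and a certificate 'notAfter' string (negative if past)."""
    expires = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires - current) / 86400.0


def fetch_not_after(host, port=636, timeout=5.0):
    """Return the 'notAfter' field of the certificate presented by host:port.

    An already-expired or untrusted certificate makes the handshake raise
    ssl.SSLCertVerificationError, which is itself a useful diagnostic.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

For example, days_until_expiry(fetch_not_after("ldap.example.com")) (a placeholder hostname) gives the remaining lifetime in days, so a scheduled check can warn well before logins start failing.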
Best Practices for Stable Rundeck Operations
- Use node filters to scope jobs to only eligible targets.
- Isolate job definitions in version control using Rundeck’s SCM plugin.
- Tag nodes for environment (e.g., prod, staging) to prevent accidental job runs.
- Regularly test ACL policy enforcement using non-admin accounts.
- Automate plugin validation on Rundeck upgrades via CI/CD pipelines.
Conclusion
Rundeck enhances operational efficiency through automated jobs and secure orchestration. However, stability depends on accurate node discovery, plugin health, access control, and integration management. By enabling detailed logs, validating ACLs, and structuring workflows modularly, teams can minimize downtime and streamline DevOps automation across environments.
FAQs
1. Why do my jobs fail with SSH errors?
SSH keys may be missing, incorrect, or improperly scoped in the key storage path. Also verify target node connectivity and permissions.
2. How can I verify if a user can run a job?
Use the Access Validator in the UI to simulate permissions. Check associated ACL policy files and validate YAML syntax.
3. What causes node discovery to stop working?
Dynamic providers may fail due to expired credentials, plugin updates, or cloud API region mismatches. Check logs and refresh tokens.
4. Are plugins affected by Rundeck upgrades?
Yes. Plugins may require rebuilds or updates to be compatible with new APIs. Always test plugins after version changes.
5. How do I store and inject secrets securely in jobs?
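Use the key storage system and reference each secret by its storage path (e.g., keys/project/<project>/ssh.key), either through a secure job option backed by key storage or through a node attribute such as ssh-key-storage-path. Ensure read permissions via ACL policies.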