Understanding Rundeck Architecture

Job Engine, Node Executor, and Plugins

Rundeck executes jobs composed of steps across target nodes. Each node is reached through a node executor (SSH, WinRM, etc.), while the node definitions and their attributes are supplied by resource model sources. Failures often stem from missing credentials, incorrect executor settings, or broken resource definitions.

Access Control and Projects

Rundeck uses YAML-formatted ACL policy files (.aclpolicy) to govern permissions per user or role. Invalid ACL syntax or misplaced files can block job access or expose critical operations.
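As an illustration of what such a policy looks like, the sketch below renders a minimal project-scoped policy as YAML text. The helper name render_project_acl, the project "demo", and the group "ops" are hypothetical, and the rule set (read and run on jobs and nodes) is an assumption chosen for brevity, not a recommended policy.

```python
def render_project_acl(project: str, group: str) -> str:
    """Render a minimal project-context aclpolicy document as YAML text.

    Assumption: the group only needs to read and run jobs and nodes.
    """
    lines = [
        f"description: Allow group {group} to run jobs in project {project}",
        "context:",
        f"  project: '{project}'",
        "for:",
        "  job:",
        "    - allow: [read, run]",
        "  node:",
        "    - allow: [read, run]",
        "by:",
        f"  group: {group}",
    ]
    return "\n".join(lines) + "\n"
```

Calling render_project_acl("demo", "ops") returns text that could be saved as an .aclpolicy file. In practice a matching application-context policy is usually also needed so that the group can see the project at all.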

Common Rundeck Issues in Production

1. Job Execution Fails or Times Out

Issues with SSH keys, unreachable nodes, or step plugin exceptions can cause job failure.

Execution failed: 255: SSH connection refused or timed out
  • Check node connectivity via CLI before job execution.
  • Verify SSH key access and sudo rights for the rundeck user.
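The connectivity check can be scripted. The sketch below only tests whether a TCP connection to the SSH port can be opened; it does not authenticate, so key and sudo problems still need a real SSH login to confirm. The function name can_connect and the default port and timeout are assumptions.

```python
import socket


def can_connect(host: str, port: int = 22, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Run it from the Rundeck server itself (for example can_connect("web1.example.com"), a hypothetical host) so that the result reflects the same network path the job uses.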

2. Node Discovery Not Working

Dynamic resource model providers (e.g., AWS EC2, Ansible) may fail due to invalid credentials, outdated plugins, or misconfigured filters.

3. ACL Denied Access to Job or Node

Users receive permission denied errors when ACL policies are improperly scoped or syntactically invalid.

4. Plugin Failures or Incompatibility

Custom or third-party plugins may fail to load or execute after Rundeck upgrades or Java changes.

5. Integration Failures with LDAP or Key Storage

LDAP misbinds, expired certs, or missing key paths in the key storage system can block authentication or job credential injection.

Diagnostics and Debugging Techniques

Enable Debug Logs in rundeck-config.properties

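Set loglevel.default=DEBUG to raise the default log level for job executions, then inspect service.log and rundeck.log under /var/log/rundeck/ for root causes. The verbosity of the server logs themselves is controlled by the log4j configuration (log4j2.properties on recent versions) rather than by this property.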
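Once debug output is enabled the logs grow quickly, so a small filter helps. The sketch below yields only the lines that mention a given level as a whole word. It makes no assumption about the rest of the line layout; the function name find_problem_lines and the default levels are illustrative.

```python
import re
from typing import Iterable, Iterator, Tuple


def find_problem_lines(
    lines: Iterable[str], levels: Tuple[str, ...] = ("ERROR", "WARN")
) -> Iterator[Tuple[str, str]]:
    """Yield (level, line) for each line containing one of the levels as a whole word."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, levels)) + r")\b")
    for line in lines:
        match = pattern.search(line)
        if match:
            yield match.group(1), line.rstrip("\n")
```

A file object can be passed directly, for example by opening /var/log/rundeck/service.log and iterating over find_problem_lines(handle).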

Test Nodes with Ad-hoc Commands

Use “Run command” to validate node communication independently of job definitions. This helps separate SSH problems from errors in the job definition itself.
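The same ad-hoc check can be triggered through the Rundeck HTTP API. The sketch below only builds the request and does not send it. The base URL, project name, node filter, and token are placeholders, and the API version default of 14 is an assumption that should be adjusted to what the server supports.

```python
import json
import urllib.parse
import urllib.request


def build_adhoc_request(base_url, project, command, node_filter, token, api_version=14):
    """Build a POST request for the project's run/command endpoint (not sent here)."""
    url = "{}/api/{}/project/{}/run/command".format(
        base_url.rstrip("/"), api_version, urllib.parse.quote(project, safe="")
    )
    body = json.dumps({"exec": command, "filter": node_filter}).encode("utf-8")
    request = urllib.request.Request(url, data=body, method="POST")
    request.add_header("X-Rundeck-Auth-Token", token)
    request.add_header("Content-Type", "application/json")
    request.add_header("Accept", "application/json")
    return request
```

Passing the returned object to urllib.request.urlopen submits the command; keeping construction separate from sending makes the URL and payload easy to inspect first.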

Validate ACLs with the Access Validator

Navigate to Admin → Security → Access Validator to simulate access for a user/role and identify missing permissions.

Manually Test Plugin Classpath

Check plugin compatibility with Rundeck version. Ensure JARs are deployed under libext/ and dependencies are shaded properly.
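For Java plugins, one quick check is whether the JAR manifest declares the Rundeck plugin attributes. The sketch below reads the main section of META-INF/MANIFEST.MF and reports which of the expected keys are absent. The list of required keys reflects the documented Java plugin manifest attributes as understood here and should be treated as an assumption; script-based plugins are packaged differently and are not covered.

```python
import zipfile

# Assumption: manifest attributes expected in a Rundeck Java plugin JAR.
REQUIRED_KEYS = (
    "Rundeck-Plugin-Version",
    "Rundeck-Plugin-Archive",
    "Rundeck-Plugin-Classnames",
)


def read_manifest(jar) -> dict:
    """Parse the main section of META-INF/MANIFEST.MF into a dict."""
    with zipfile.ZipFile(jar) as archive:
        text = archive.read("META-INF/MANIFEST.MF").decode("utf-8")
    attrs = {}
    key = None
    for line in text.splitlines():
        if not line.strip():
            break  # a blank line ends the main section
        if line.startswith(" ") and key is not None:
            attrs[key] += line[1:]  # continuation of the previous value
        elif ":" in line:
            key, _, value = line.partition(":")
            attrs[key] = value.strip()
    return attrs


def missing_plugin_keys(jar) -> list:
    """Return the required manifest keys that the JAR does not declare."""
    manifest = read_manifest(jar)
    return [k for k in REQUIRED_KEYS if k not in manifest]
```

The jar argument can be a path under libext/ or an open binary file object. An empty result only shows that the metadata is present, not that the plugin is compatible with the running Rundeck version.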

Step-by-Step Resolution Guide

1. Fix SSH Job Execution Failures

Ensure SSH private keys are stored in key storage and that the project or node configuration references the correct storage path. Confirm that SSH access with the same key and user works outside Rundeck.

2. Repair Node Discovery

Review resources.xml or dynamic source plugin config. Re-authenticate API credentials or fix region/filters for cloud discovery.
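A static resources.xml file can be checked mechanically before Rundeck loads it. The sketch below parses the XML and lists nodes that lack basic attributes. Treating name and hostname as the required set is an assumption; extend REQUIRED_ATTRS (for example with username) to match the executor in use.

```python
import xml.etree.ElementTree as ET

# Assumption: attributes every node entry should carry.
REQUIRED_ATTRS = ("name", "hostname")


def find_incomplete_nodes(xml_text: str) -> list:
    """Return (node label, missing attributes) for each incomplete <node> element."""
    root = ET.fromstring(xml_text)  # raises ParseError on malformed XML
    problems = []
    for index, node in enumerate(root.iter("node")):
        missing = [attr for attr in REQUIRED_ATTRS if not node.get(attr)]
        if missing:
            label = node.get("name") or "<node #{}>".format(index)
            problems.append((label, missing))
    return problems
```

A ParseError from this function points to broken XML, while a non-empty result points to node entries that would be unusable or unreachable.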

3. Correct ACL File Errors

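Lint ACL files for syntax issues, for example with the rd-acl tool that ships with Rundeck. On package installs, place policies as .aclpolicy files in /etc/rundeck/; Rundeck normally picks up changes to these files without a restart.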
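A rough pre-check can catch the most common structural mistake, a policy document that lacks one of its top-level sections. The sketch below is not a YAML parser and does not replace Rundeck's own validation; it only looks for the keys description, context, for, and by at column zero in each document of a file. The function name missing_acl_sections is hypothetical.

```python
import re

REQUIRED_TOP_LEVEL = ("description", "context", "for", "by")


def missing_acl_sections(policy_text: str) -> list:
    """Return, for each YAML document in the text, the top-level keys not found."""
    results = []
    # Documents in one file are separated by a line containing only '---'.
    for document in re.split(r"(?m)^---[ \t]*$", policy_text):
        if not document.strip():
            continue
        found = set(re.findall(r"(?m)^([A-Za-z_]+):", document))
        results.append([key for key in REQUIRED_TOP_LEVEL if key not in found])
    return results
```

Each inner list corresponds to one policy document in order; an empty inner list means all four sections were found, not that the rules inside them are valid.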

4. Resolve Plugin Errors

Reinstall plugins after upgrades. Review plugin logs under logs/rundeck.plugin*. Consider using Plugin Annotations for compatibility with latest Rundeck SDK.

5. Fix LDAP and Key Storage Integration

Enable verbose LDAP logging. Validate jaas-ldap.conf entries and DN mappings. Check if storage paths exist and are properly permissioned.
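Whether a storage path exists can be checked through the key storage API, which returns metadata for a stored key. The sketch below builds such a request without sending it. The base URL, key path, and token are placeholders, and the API version default of 14 is an assumption to adjust for the server in use.

```python
import urllib.parse
import urllib.request


def build_key_metadata_request(base_url, key_path, token, api_version=14):
    """Build a GET request for the metadata of a key storage entry (not sent here)."""
    if not key_path.startswith("keys/"):
        raise ValueError("key storage paths are expected to start with 'keys/'")
    url = "{}/api/{}/storage/{}".format(
        base_url.rstrip("/"), api_version, urllib.parse.quote(key_path)
    )
    request = urllib.request.Request(url, method="GET")
    request.add_header("X-Rundeck-Auth-Token", token)
    request.add_header("Accept", "application/json")
    return request
```

When the request is sent with urllib.request.urlopen, an HTTP error response generally indicates that the path is missing or that the token is not permitted to read it, which separates storage problems from ACL problems.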

Best Practices for Stable Rundeck Operations

  • Use node filters to scope jobs to only eligible targets.
  • Isolate job definitions in version control using Rundeck’s SCM plugin.
  • Tag nodes for environment (e.g., prod, staging) to prevent accidental job runs.
  • Regularly test ACL policy enforcement using non-admin accounts.
  • Automate plugin validation on Rundeck upgrades via CI/CD pipelines.

Conclusion

Rundeck enhances operational efficiency through automated jobs and secure orchestration. However, stability depends on accurate node discovery, plugin health, access control, and integration management. By enabling detailed logs, validating ACLs, and structuring workflows modularly, teams can minimize downtime and streamline DevOps automation across environments.

FAQs

1. Why do my jobs fail with SSH errors?

SSH keys may be missing, incorrect, or improperly scoped in the key storage path. Also verify target node connectivity and permissions.

2. How can I verify if a user can run a job?

Use the Access Validator in the UI to simulate permissions. Check associated ACL policy files and validate YAML syntax.

3. What causes node discovery to stop working?

Dynamic providers may fail due to expired credentials, plugin updates, or cloud API region mismatches. Check logs and refresh tokens.

4. Are plugins affected by Rundeck upgrades?

Yes. Plugins may require rebuilds or updates to be compatible with new APIs. Always test plugins after version changes.

5. How do I store and inject secrets securely in jobs?

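Use the key storage system and reference secrets by their storage path (e.g., a path under keys/ such as keys/project/<project>/ssh.key), for instance through the ssh-key-storage-path node attribute or a secure job option backed by key storage. Ensure read permissions via ACL policies.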