If agent runs on relays take a very long time after promises were regenerated on the root server, a single run can last longer than the interval between runs. Agents may then pile up and eventually get killed by the health check script. In turn, this may slow down policy synchronization from the root server to the nodes.
Here are some figures to help you estimate the expected duration from the number of nodes and files, so you can see whether there is room for improvement or whether you are hitting the protocol's limits.
The following formula should give you an idea of expected duration:
Expected duration (when a lot of files were regenerated) = ((Number of shared-files) + (Number of files per node) x (Number of nodes behind relay)) x 3 x (RTT between relay and server)
- Number of shared-files can be computed by counting the files in /var/rudder/configuration-repository/shared-files on the server.
- Number of files per node can be computed by counting the files in /var/rudder/cfengine-community/inputs on the node.
- Number of nodes behind relay can be computed by counting the directories in /var/rudder/share on the relay.
- RTT (Round Trip Time) between relay and server, in seconds. It can be read from the output of the ping command (ping usually reports it in milliseconds, so divide by 1000).
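As an illustration, the formula above can be evaluated with a few lines of Python. This is only a sketch: the function name, parameter names, and the sample figures (50 shared files, 200 files per node, 100 nodes, 20 ms RTT) are made up for the example and do not come from a real installation.

```python
def expected_duration_seconds(shared_files, files_per_node, nodes_behind_relay, rtt_seconds):
    """Estimate the relay agent run duration, in seconds, using the formula above.

    All names here are hypothetical; only the arithmetic comes from the article.
    """
    # Total number of files the relay may have to fetch from the server.
    total_files = shared_files + files_per_node * nodes_behind_relay
    # The formula counts three round trips per file.
    return total_files * 3 * rtt_seconds


# Hypothetical figures: 50 shared files, 200 files per node,
# 100 nodes behind the relay, 20 ms RTT (0.020 s).
duration = expected_duration_seconds(50, 200, 100, 0.020)
print(f"Expected duration: about {duration:.0f} s ({duration / 60:.0f} min)")
```

With these sample numbers the estimate is about 1203 seconds, or roughly 20 minutes. If that is longer than the interval between agent runs, the pile-up described above is likely.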