Summary
When chef-client is run manually or independently of the system that normally manages it (a service, systemd timer, cron job, etc.), the orphaned or independent run can block a later manual attempt to override it.
You will see the following message when attempting a chef-client run:
"chef client 1000 is running, will wait for it to finish and then run"
Distribution
Product | Version | Topology |
Chef Infra Client | 11.x+ | N/A |
Process
Plan
Preparation: N/A
Design: N/A
Configure
Evaluation: N/A
Application: N/A
Troubleshoot
Analysis:
There are numerous scenarios where you may experience a chef-client which cannot be stopped:
- where interval and splay are set so low that chef-client runs back up to the point that the client is running with minimal breaks
- where run lists have expanded to the point of consuming the entire interval set between runs
- where chef_client_updater (supermarket.chef.io/cookbooks/chef_client_updater) is used to transform the existing service to a new service type and either fails because of misconfiguration or is interrupted, and subsequently does not remove the previous configuration (we have also observed differences in behaviour between major releases which may introduce this)
- where you use the effortless pattern (https://www.chef.io/products/effortless-infrastructure/) to run a habitised chef-client in parallel with a native chef-client (chef_client_updater, the chef-client cookbook and habitat have no knowledge of each other's configuration or scheduling)
- where Chef Infra Client is installed and running on a Chef Server / Chef Automate product which is itself managed through chef-client (see https://supermarket.chef.io/cookbooks/managed_automate / https://supermarket.chef.io/cookbooks/managed_chef_server)
- any of the above in combination
You can gather further information on the nature of the currently configured chef-client with the below commands:
ohai init_package
ps aux | grep chef
crontab -l | grep chef
systemctl status chef-client
journalctl -u chef-client
The output should help you infer what is managing the client and whether this is expected.
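The commands above can be collected into one pass that tolerates missing tools, which is convenient when you do not yet know which init system a node uses. This is a minimal sketch: the function names `run_if_available` and `collect_chef_client_info` are hypothetical, and the `--no-pager` and `-n 50` options are only added so that the output can be redirected to a file.

```shell
# Run a diagnostic command only if its binary exists on this node,
# so the same script works on systemd, SysV and cron-managed systems.
run_if_available() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "== $* =="
        "$@" 2>&1 || true        # a failing diagnostic should not abort the rest
    else
        echo "== $1 not found, skipped =="
    fi
}

# Gather the same information as the individual commands listed in this article.
collect_chef_client_info() {
    run_if_available ohai init_package
    run_if_available systemctl status chef-client --no-pager
    run_if_available journalctl -u chef-client --no-pager -n 50
    echo "== crontab entries mentioning chef =="
    crontab -l 2>/dev/null | grep chef || true
    echo "== processes mentioning chef =="
    ps aux | grep '[c]hef' || true   # [c]hef stops grep from matching itself
}
```

For example, `collect_chef_client_info > /tmp/chef-client-diagnostics.txt` (the output path is only an illustration) writes everything to a single file that can be attached to a support ticket.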
Remediation:
It is likely that manual intervention is required for all but the effortless example above. It is out of scope for this document to cover the correct configuration and remediation for every eventuality, but if needed you can stop the current client and prevent it from running again until you are sure it is configured as expected.
Stop the service using the init system of your platform, for example:
sudo service chef-client stop
If the issue persists:
- find out whether any chef-client process or service is running:
ps aux | grep chef
- kill the process, ensuring that it was instantiated by whichever service type has become orphaned
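One way to confirm which manager started a process before killing it is to look at its parent PID. The sketch below assumes a procps-style `ps` that accepts `-eo pid,ppid,etime,args`; `filter_chef_processes` is a hypothetical helper name, and it reads from standard input so it can be tried against saved output.

```shell
# Keep the header line plus every line that mentions chef-client.
# The [c] bracket stops the pattern from matching this awk command itself
# when the input comes straight from a live `ps` listing.
filter_chef_processes() {
    awk 'NR == 1 || /[c]hef-client/'
}

# Typical use (commented out so that nothing is killed by accident):
#   ps -eo pid,ppid,etime,args | filter_chef_processes
#   ps -o pid,args -p <PPID>     # shows what the parent is (init, cron, hab, ...)
#   sudo kill <PID>              # only once you are sure it is the orphaned run
```

A parent that is a cron daemon or a Habitat supervisor points to that scheduler, while a parent PID of 1 usually indicates a daemonised or service-managed client. The ELAPSED column helps distinguish a long-stuck run from one that has just started.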
If the issue persists further:
- look up your chef-client settings in the following locations (the init script path is given as an example):
/etc/chef/client.rb
/etc/init.d/chef-client
- then locate the pid_file and lockfile paths and delete the files
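To find the two paths without reading the whole file, the settings can be extracted from client.rb. This is a sketch that assumes each setting is written as a quoted literal on its own line, for example `pid_file "/some/path"`. It will not resolve values built from Ruby expressions or set by an init script, and `chef_lock_paths` is a hypothetical helper name.

```shell
# Print "<setting> <path>" for each pid_file / lockfile line in a client.rb.
# Commented-out lines are ignored because the setting must be the first
# non-blank token on the line.
chef_lock_paths() {
    sed -n -E "s/^[[:space:]]*(pid_file|lockfile)[[:space:]]+[\"']([^\"']+)[\"'].*/\1 \2/p" "$1"
}

# Typical use:
#   chef_lock_paths /etc/chef/client.rb
```

If the function prints nothing, the settings are not overridden in that file and the client is using its built-in defaults.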
If the issue persists, you may be running an older chef-client (or one on a less commonly used platform). In this case you need to look in your chef cache folder and delete:
/var/chef/cache/chef-client-running.pid
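Before deleting a pid or lock file it is worth checking that no live process still owns it. The sketch below assumes the file contains the PID of the run that created it and that the check is run as root. `pid_file_is_stale` is a hypothetical helper name, and PID reuse by an unrelated process can make a stale file look live.

```shell
# Exit status: 0 = stale (safe to delete), 1 = a process with that PID exists,
# 2 = the file does not exist.
pid_file_is_stale() {
    [ -f "$1" ] || return 2
    pid=$(tr -cd '0-9' < "$1")          # keep digits only
    [ -n "$pid" ] || return 0           # empty or non-numeric file: treat as stale
    if kill -0 "$pid" 2>/dev/null || [ -d "/proc/$pid" ]; then
        return 1                        # signal 0 only tests for existence
    fi
    return 0
}

# Typical use:
#   pid_file_is_stale /var/chef/cache/chef-client-running.pid \
#       && sudo rm /var/chef/cache/chef-client-running.pid
```

If the function reports a live process, go back to the process checks earlier in this article instead of removing the file.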