Environment
All versions of Chef Infra Server, including AWS Chef Native and Marketplace instances, as well as frontend nodes in Automate Cluster or Chef Backend HA. All platforms, architectures, and topologies.
Issue
When Chef Infra Server is configured to forward reports to Chef Automate, it uses its internal nginx load balancer to resolve the destination and forward the reports. During normal operations, or during planned maintenance, you may see traffic fail to reach the data-collector endpoint. The log entries below are examples of this kind of failure:
/var/log/opscode/nginx/access.log
10.10.10.2 - - [27/May/2020:13:58:00 +0000] "GET /compliance/organizations/chef-rog/owners/admin/compliance/cis-amazonlinux-2014.09-2015.03-level2/tar HTTP/1.1" 504 "2.005" 189 "-" "Chef Client/14.3.37 (ruby-2.5.1-p57; ohai-14.3.0; x86_64-linux; +https://chef.io)" "10.10.10.20:443, 10.20.20.20:443" "504, 502" "1.000, 1.000" "14.3.37" "algorithm=sha1;version=1.1;" "chef-client-target-node-1" "2020-05-27T13:57:25Z" "2jmj7l5rSw0yVb/vlWAYkK/YBwk=" 1703
/var/log/opscode/nginx/error.log
2020/05/27 13:58:00 [error] 32386#0: *135442422 upstream timed out (110: Connection timed out) while connecting to upstream, client: 10.10.10.10, server: chef-server.chef.io, request: "GET /compliance/organizations/autodesk/owners/admin/compliance/windows-baseline/tar HTTP/1.1", upstream: "https://10.20.20.20:443/compliance/profiles/admin/windows-baseline/tar", host: "chef-server.chef.io"
The data-collector endpoint no longer appears to be reachable, and the Chef Automate dashboard shows no further check-ins. No new entries from Chef Infra Server appear when you follow the Automate logs with:
journalctl -u chef-automate -f
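To gauge how many requests are affected, you can filter the nginx access log for gateway errors. The following is a minimal sketch, not an official Chef tool: the function name `gateway_errors` is hypothetical, and it assumes the access log layout shown in the example above, where the status code is the field immediately after the quoted request line.

```shell
# Print the status and request line of every request that nginx answered
# with 502 or 504, the codes it returns when an upstream is unreachable.
# Assumption: the status code directly follows the quoted request line.
gateway_errors() {
  awk -F'"' '{
    status = $3
    gsub(/ /, "", status)
    if (status == "502" || status == "504") print status, $2
  }' "$1"
}

# Example usage on a Chef Infra Server frontend:
#   gateway_errors /var/log/opscode/nginx/access.log | tail -n 20
```

A steady stream of 502 or 504 entries for data-collector or compliance paths, as opposed to the occasional one, suggests nginx is still sending traffic to an address that is no longer valid.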
Cause
Chef Infra Server uses nginx OSS, which provides only a subset of the features available in nginx Plus. As a consequence, Chef Infra Server does not dynamically refresh the addresses it has cached when DNS changes externally.
In practice, as long as no changes are made to chef-server.rb (or, for example, to any of the other YAML config files used when running chef-server on AWS), Chef Infra Server continues to operate with the addresses it already holds, even if they are out of date.
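One way to confirm this cause is to compare the upstream address nginx is still trying to reach with what DNS returns now. The sketch below is illustrative only: the function name `stale_upstreams` and the hostname `automate.example.com` are hypothetical, and it assumes the error log format shown in the Issue section.

```shell
# List the distinct upstream IP addresses that appear in nginx error entries.
# Assumption: entries contain upstream: "https://<ip>:<port>/..." as in the
# error.log example in this article.
stale_upstreams() {
  grep -oE 'upstream: "https?://[0-9.]+' "$1" | sed 's#.*//##' | sort -u
}

# Example usage:
#   stale_upstreams /var/log/opscode/nginx/error.log
#   getent hosts automate.example.com   # hypothetical Automate FQDN
```

If the address reported by `getent hosts` differs from the addresses printed by `stale_upstreams`, nginx is most likely still using an address it resolved before the DNS change.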
Resolution
Should you make any IP-related changes to Chef Automate, Supermarket, or any other component that is routed through the Chef Infra Server, the nginx process must be reloaded manually, either on the standalone instance or on each Chef Infra Server frontend node in the topology.
Note that a full reconfigure or a restart of all services is not required. The following command should be sufficient:
chef-server-ctl restart nginx
Then tail the logs to verify that reports are now being forwarded to Automate correctly:
chef-server-ctl tail nginx
You can also verify that traffic is now being correctly received by Chef Automate with:
journalctl -u chef-automate -f
To keep this issue from recurring, consider adding a step to your sysadmin playbook that restarts nginx on each frontend in sequence whenever you change the DNS/IP configuration of your Chef environment.
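Such a playbook step could look roughly like the sketch below. It is an assumption-laden example, not a supported procedure: the frontend hostnames, the `FRONTENDS` variable, and the `rolling_restart` function are all hypothetical, and it assumes you have SSH access with sudo rights on each frontend.

```shell
# Hypothetical frontend hostnames; replace with your own.
FRONTENDS="${FRONTENDS:-chef-fe-1.example.com chef-fe-2.example.com}"

# Restart nginx on one frontend at a time, stopping at the first failure so
# that the remaining frontends keep serving traffic.
# The optional first argument is the command used to reach each host
# (defaults to ssh), which makes a dry run with a stub possible.
rolling_restart() {
  runner="${1:-ssh}"
  for host in $FRONTENDS; do
    echo "restarting nginx on $host"
    "$runner" "$host" sudo chef-server-ctl restart nginx || return 1
  done
}

# Example usage:
#   rolling_restart
```

Restarting one node at a time, rather than all at once, keeps at least one frontend available to chef-client runs throughout.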
Specific to Chef AWS Native, you could also use the increase/decrease option to spin up new instances based on your bootstrap node, as per https://github.com/chef-customers/aws_native_chef_server#upgrading-the-chef-server. This would ensure they have the latest config and rule out the possibility that a perceived issue is specific to an existing frontend node.