Issue
You have recently upgraded a Chef Backend HA Chef Server installation and are receiving 503 errors on the frontend servers while migrating to new backend servers. Here, "migration" means replacing the followers, then promoting a new follower to leader while the original leader is replaced.
Environments
Chef-Backend HA
Discussion
NOTE: Chef Backend should never have more than 3 nodes in a cluster at any given time. Remove to-be-retired nodes before adding new nodes. For example:
chef-backend-ctl remove-node NODENAME
After a new backend server is promoted to leader, the frontend servers start receiving 503 errors and stop functioning properly. Check pg_stat_activity.txt in your gather-logs bundle for the errors shown below. The file sits at the top level of the extracted data, at a path like FQDN/TIMESTAMP/pg_stat_activity.txt:
cat FQDN/20191219T121636Z/pg_stat_activity.txt/pg_stat_activity.txt
psql: FATAL: no pg_hba.conf entry for host "x.x.x.x", user "chef_pgsql", database "opscode_chef", SSL on
FATAL: no pg_hba.conf entry for host "x.x.x.x", user "chef_pgsql", database "opscode_chef", SSL off
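If the file contains many such lines, it can help to reduce them to the distinct client addresses that PostgreSQL rejected. The following is a minimal sketch, not part of the Chef tooling: rejected_hosts is a hypothetical helper name, and it assumes the error lines have the exact format shown above.

```shell
# Hypothetical helper: print each distinct client address that PostgreSQL
# rejected with a "no pg_hba.conf entry" error in the given file(s).
rejected_hosts() {
  grep -h 'no pg_hba.conf entry for host' "$@" \
    | sed -E 's/.*for host "([^"]+)".*/\1/' \
    | sort -u
}

# Example (path is a placeholder for your extracted bundle):
#   rejected_hosts path/to/pg_stat_activity.txt
```

Each address printed is a client that the backend's pg_hba.conf does not currently allow.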
Resolution
Ensure that the new server IPs are listed in /etc/chef-backend/chef-backend.rb. This is especially important when the new servers are on different networks. For details on this configuration, see https://docs.chef.io/install_server_ha/#configuring-frontend-and-backend-members-on-different-networks. Then consider a full restart of the frontends. At a minimum, confirm that the opscode-erchef service has restarted, since it is the service most affected.
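The check and restart steps might be sketched as follows. This assumes the relevant setting is postgresql.md5_auth_cidr_addresses, as described in the linked documentation; verify the key name against the docs for your version. show_pg_client_networks is a hypothetical helper name, not a Chef command.

```shell
# Hypothetical helper: show the line(s) in a chef-backend.rb file that set the
# PostgreSQL client networks. Assumes the key is md5_auth_cidr_addresses.
# Exits non-zero if the key is not present in the file.
show_pg_client_networks() {
  local conf="${1:-/etc/chef-backend/chef-backend.rb}"
  grep -n 'md5_auth_cidr_addresses' "$conf"
}

# After editing the file on each backend, apply the change (run manually):
#   chef-backend-ctl reconfigure
# Then, on each frontend, restart everything or at least opscode-erchef:
#   chef-server-ctl restart
#   chef-server-ctl restart opscode-erchef
```

If the helper prints nothing, the file most likely still relies on the default access rules, which may not cover frontends on another network.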
Cause
This issue can be caused by a loss of connectivity between the frontend servers and the backend servers after a network change, for example when PostgreSQL on the backends has no pg_hba.conf entry for the addresses now in use. Ensure that all configuration files are updated. Frontends do not necessarily retry PostgreSQL and Elasticsearch connections once they have failed, which is why the Resolution recommends a restart.
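To rule out a plain network block, a quick reachability check from a frontend can help. The sketch below is illustrative only: port_open and check_backend are hypothetical helper names, and ports 5432 (PostgreSQL) and 9200 (Elasticsearch) are assumed defaults that should be confirmed against your own configuration. It relies on bash's /dev/tcp feature and the coreutils timeout command.

```shell
# Hypothetical helper: succeeds if a TCP connection to host:port opens
# within 3 seconds. Uses bash's built-in /dev/tcp pseudo-device.
port_open() {
  local host="$1" port="$2"
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Hypothetical helper: report reachability of the assumed PostgreSQL (5432)
# and Elasticsearch (9200) ports on one backend host.
check_backend() {
  local host="$1" port
  for port in 5432 9200; do
    if port_open "$host" "$port"; then
      echo "${host}:${port} reachable"
    else
      echo "${host}:${port} NOT reachable"
    fi
  done
}

# Example (run on a frontend; BACKEND_IP is a placeholder):
#   check_backend BACKEND_IP
```

A reachable port together with the pg_hba.conf errors above points to an access-rule problem rather than a firewall or routing problem.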