You have recently upgraded a Chef Backend HA Chef Server installation and are receiving 503 errors on the frontend servers while migrating to new backend servers. By "migration", we mean replacing the followers, then promoting a new follower to leader while the original leader is replaced.
NOTE: Chef Backend should never have more than 3 nodes in a cluster at any given time. Remove to-be-retired nodes before adding new nodes. For example:
chef-backend-ctl remove-node NODENAME
Upon promoting a new backend server to leader, the frontend servers start returning 503 errors and stop functioning properly. To confirm the cause, open pg_stat_activity.txt from your gather-logs bundle. The file appears at the top level of the extracted data, under a path like FQDN/TIMESTAMP/pg_stat_activity.txt, and contains errors such as:
psql: FATAL: no pg_hba.conf entry for host "x.x.x.x", user "chef_pgsql", database "opscode_chef", SSL on
FATAL: no pg_hba.conf entry for host "x.x.x.x", user "chef_pgsql", database "opscode_chef", SSL off
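To see at a glance which client addresses PostgreSQL is rejecting, the error lines can be reduced to a list of unique hosts. The following is a minimal sketch, assuming the errors follow the format shown above; the function name rejected_hosts is hypothetical and not part of any Chef tooling.

```shell
# Print each unique client address that PostgreSQL rejected because it has
# no matching pg_hba.conf entry. Takes one or more pg_stat_activity.txt files.
# Example: rejected_hosts FQDN/TIMESTAMP/pg_stat_activity.txt
rejected_hosts() {
  sed -n 's/.*no pg_hba.conf entry for host "\([^"]*\)".*/\1/p' "$@" | sort -u
}
```

Any address printed here that belongs to a frontend is one the backend configuration does not yet allow.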
Ensure that the IP addresses of the new servers are listed in /etc/chef-backend/chef-backend.rb. This is especially important when the new servers are on different networks. Additional details regarding this configuration can be found at: https://docs.chef.io/install_server_ha/#configuring-frontend-and-backend-members-on-different-networks. Once the configuration is correct, consider a full restart of the frontends. Make certain that the opscode-erchef service has restarted, as it is the service most affected.
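A quick way to review what a backend currently allows is to print the relevant line of its configuration file. This sketch assumes the allow-list is controlled by the postgresql.md5_auth_cidr_addresses setting described in the guide linked above; confirm the setting name against that guide before relying on it. The function name show_pg_cidr_setting is hypothetical.

```shell
# Print, with line numbers, every line of a chef-backend.rb file that mentions
# the PostgreSQL CIDR allow-list. The setting name is an assumption; verify it
# against the linked documentation. Returns non-zero if no such line exists.
# Example: show_pg_cidr_setting            (reads the default path)
show_pg_cidr_setting() {
  grep -n 'md5_auth_cidr_addresses' "${1:-/etc/chef-backend/chef-backend.rb}"
}
```

If the function prints nothing, or the printed list omits the network of the new servers, update the file on every backend member and apply the change (typically with chef-backend-ctl reconfigure) before restarting the frontends.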
This issue can be caused by a loss of network connectivity between the frontend servers and the backend servers following a network change. Ensure that all configuration files are updated. Frontends do not necessarily retry PostgreSQL and Elasticsearch connections once they have failed, which is why the restart above is recommended.
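The restart recommended above can be sketched as a small helper to run on each frontend. It restarts all services and then reports the state of opscode-erchef. The function name restart_frontend and the CHEF_SERVER_CTL override are hypothetical, added only so the helper can be dry-run; by default it calls the real chef-server-ctl.

```shell
# Restart every Chef Server service on this frontend, then show the status of
# opscode-erchef so you can confirm it came back up.
# CHEF_SERVER_CTL is a hypothetical override (e.g. CHEF_SERVER_CTL=echo for a
# dry run); when unset, chef-server-ctl is used.
restart_frontend() {
  ctl="${CHEF_SERVER_CTL:-chef-server-ctl}"
  "$ctl" restart && "$ctl" status opscode-erchef
}
```

Run restart_frontend as root on one frontend at a time, and check that requests succeed on that frontend before moving to the next.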