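When the followers in a chef-backend cluster have been down for too long, the leader may already have removed the WAL segments they need to catch up. In that case you will see errors like the following in each follower's /var/log/chef-backend/postgresql/VERSION/current log file: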
2018-04-25_16:36:29.42242 FATAL: the database system is starting up
2018-04-25_16:36:30.90058 LOG: started streaming WAL from primary at 16F3/2D000000 on timeline 88
2018-04-25_16:36:30.90124 FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 00000058000016F30000002D has already been removed
2018-04-25_16:36:30.90125
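To check a follower for this condition without reading the log by hand, something like the following may help. It is a minimal sketch: the function name has_removed_wal_error is hypothetical, and the pattern assumes the error text matches the log excerpt above.

```shell
#!/bin/sh
# Hypothetical helper (not part of chef-backend): succeeds (exit 0) if the
# given log file contains the "requested WAL segment ... has already been
# removed" error, and fails otherwise.
has_removed_wal_error() {
    grep -Eq 'requested WAL segment [0-9A-F]+ has already been removed' "$1"
}

# Example usage on a follower. VERSION is a placeholder, as in the path above:
#   has_removed_wal_error /var/log/chef-backend/postgresql/VERSION/current \
#       && echo "this follower needs a recovery join"
```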
To recover the followers (1807 and 1806), run the following on the current leader, replacing FOLLOWER_NAME with the name of the follower you want to recover:
chef-backend-ctl remove-node FOLLOWER_NAME
Then, on the follower you just removed, start a recovery, replacing LEADER_IP_ADDRESS with the IP address of the current leader:
chef-backend-ctl join-cluster LEADER_IP_ADDRESS --recovery
Try this with one of the two followers first. That follower should then be able to sync, and the cluster should start working again, because two healthy nodes out of three are enough to start serving requests. If the cluster is still broken after that, submit a ticket with support and they can help take a look.
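Because the two commands run on different hosts, it can help to print the full sequence for one follower before starting. The sketch below only prints the commands and runs nothing. The function name print_recovery_plan is hypothetical, and the final chef-backend-ctl cluster-status check is not part of the procedure above; it is included on the assumption that your chef-backend version provides that command.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not execute) the recovery steps for one
# follower. Arguments: FOLLOWER_NAME LEADER_IP_ADDRESS.
print_recovery_plan() {
    follower_name=${1:-}
    leader_ip=${2:-}
    if [ -z "$follower_name" ] || [ -z "$leader_ip" ]; then
        echo "usage: print_recovery_plan FOLLOWER_NAME LEADER_IP_ADDRESS" >&2
        return 1
    fi
    printf 'On the current leader:\n'
    printf '  chef-backend-ctl remove-node %s\n' "$follower_name"
    printf 'On follower %s:\n' "$follower_name"
    printf '  chef-backend-ctl join-cluster %s --recovery\n' "$leader_ip"
    # Assumption: cluster-status is available to confirm the follower synced.
    printf 'On any node, to check the result:\n'
    printf '  chef-backend-ctl cluster-status\n'
}

# Example usage; 192.0.2.10 is a placeholder address, not a real leader:
#   print_recovery_plan 1807 192.0.2.10
```

Printing the plan rather than executing it keeps the sketch safe to run anywhere, and leaves the decision of which host runs which command to the operator.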