Summary
During normal operation of Chef Infra Server, you may observe Nginx service degradation related to the Reporting add-on. You may have recently upgraded, patched, or hard rebooted the instance and, as a result, see the errors below:
/var/log/opscode/nginx/access.log
10.10.10.10 - - [2020-08-31T16:25:50+00:00] "POST /organizations/chef-test/reports/nodes/chef-node.chef.io/runs HTTP/1.1" 502 "0.010" 179 "-" "Chef Client/14.13.11 ...
/var/log/opscode/opscode-erchef/erchef.log OR /var/log/opscode/opscode-reporting/current
2020-08-31 08:52:17.103 [warning] Could not start the network driver: econnrefused
These errors repeat in a cycle, and the logs show the service restarting continuously.
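To gauge how often the symptom occurs, the access log can be searched for 502 responses on the reporting endpoints. The following is a minimal sketch, assuming the log line format shown in the sample above; the function name count_reporting_502 is hypothetical and not part of any Chef tooling.

```shell
# Hypothetical helper: count 502 responses to reporting endpoints in an Nginx access log.
# Assumes the line format from the sample above: the quoted request, then the status code.
count_reporting_502() {
  log_file="${1:-/var/log/opscode/nginx/access.log}"
  # Match POST requests under /organizations/<org>/reports/ that returned 502.
  grep -c -E '"POST /organizations/[^/]+/reports/[^"]*" 502 ' "$log_file"
}
```

For example, count_reporting_502 /var/log/opscode/nginx/access.log prints the number of matching lines. A count that keeps growing while Chef Client runs are active is consistent with the failure described here.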
Distribution
Product | Version | Topology |
Chef Infra Server | 11.x+ | Standalone |
Process
Plan
Preparation: N/A
Design: N/A
Configure
Evaluation: N/A
Application: N/A
Troubleshoot
Analysis:
When the Reporting add-on is used alongside core Chef Infra Server functionality, the reporting database (which lives in the core server's PostgreSQL instance) can, over time, consume excessive disk space without being noticed, worsen performance degradation, or suffer an unexpected interruption in service. Any of these can cascade into 502 errors on POST requests to the /runs API endpoint, like those shown above.
The Reporting add-on is EOL and has been replaced by equivalent functionality in Chef Automate via Infrastructure reports (https://automate.chef.io/docs/reports/). We recommend removing it immediately, and proactively where possible, even if it is not currently causing problems with Chef Infra Server.
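To see how much disk the reporting database currently occupies, a size query can be piped to psql in the same way as the remediation commands in this article. This is a minimal sketch: pg_database_size and pg_size_pretty are standard PostgreSQL functions, the database name opscode_reporting is the one used in the DROP DATABASE step, and the helper name reporting_db_size_sql is hypothetical.

```shell
# Hypothetical helper: print the SQL that reports the size of the reporting database.
# The database name opscode_reporting is taken from the remediation steps in this article.
reporting_db_size_sql() {
  echo "SELECT pg_size_pretty(pg_database_size('opscode_reporting'));"
}

# On the server that runs Postgres, as root, pipe the query to psql:
#   reporting_db_size_sql | su -l opscode-pgsql -c 'psql'
```

The query fails with an error if the database has already been dropped.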
Remediation:
Opscode-Reporting 1.6.5 (12/20/2016) and earlier
Please use the following instructions when uninstalling opscode-reporting and its Postgres database.
Be sure to use the root user to run all of the following commands.
It is important to first remove the reporting API endpoint so Chef Client runs will no longer use it. If you do not do this before uninstalling reporting, the Chef Client runs will fail. Run the following set of three commands on each frontend. The last command may fail if the database has already been deleted.
rm /var/opt/opscode/nginx/etc/addon.d/*-reporting_*.conf
chef-server-ctl hup nginx
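Before deleting anything, it can help to list the files that the rm glob would match. This is a minimal sketch; the helper name list_reporting_confs is hypothetical, and the default directory is the one used in the rm command above.

```shell
# Hypothetical helper: list the reporting add-on Nginx configs that the rm above would delete.
list_reporting_confs() {
  addon_dir="${1:-/var/opt/opscode/nginx/etc/addon.d}"
  for f in "$addon_dir"/*-reporting_*.conf; do
    # An unmatched glob stays literal, so only print paths that actually exist.
    [ -e "$f" ] && echo "$f"
  done
  return 0
}
```

If list_reporting_confs prints nothing, the endpoint configuration has already been removed from that frontend.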
Stop the opscode-reporting service:
chef-server-ctl stop opscode-reporting
Delete opscode-reporting's Postgres database and Postgres roles, and delete its data files, log files and configuration files:
opscode-reporting-ctl cleanse --with-external
If --with-external isn't available or doesn't delete the database for some reason, run the following on the Chef Server that is running Postgres, after running opscode-reporting-ctl cleanse:
echo "DROP DATABASE opscode_reporting;" | su -l opscode-pgsql -c 'psql'
echo "DROP ROLE opscode_reporting;" | su -l opscode-pgsql -c 'psql'
echo "DROP ROLE opscode_reporting_ro;" | su -l opscode-pgsql -c 'psql'
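To confirm that the database is gone, the list of database names can be checked for opscode_reporting. This is a minimal sketch: the helper name reporting_db_state is hypothetical, it reads database names from standard input, and the psql options -A (unaligned) and -t (tuples only) are standard psql flags.

```shell
# Hypothetical helper: read database names (one per line) on stdin and
# report whether opscode_reporting is among them.
reporting_db_state() {
  if grep -qx 'opscode_reporting'; then
    echo "present"
  else
    echo "absent"
  fi
}

# On the server that runs Postgres, as root:
#   echo "SELECT datname FROM pg_database;" | su -l opscode-pgsql -c 'psql -At' | reporting_db_state
```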
Uninstall the package:
rpm -e opscode-reporting
OR
dpkg --purge opscode-reporting
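Which of the two commands applies depends on the package manager that installed the add-on. The sketch below only prints the matching command and does not remove anything. The helper name reporting_uninstall_cmd is hypothetical; rpm -q and dpkg -s are the standard query options of those tools. Note that dpkg -s can still succeed for a package that was removed but not purged.

```shell
# Hypothetical helper: print the uninstall command that matches how the package is installed.
# It only prints the command; it does not uninstall anything.
reporting_uninstall_cmd() {
  if command -v rpm >/dev/null 2>&1 && rpm -q opscode-reporting >/dev/null 2>&1; then
    echo "rpm -e opscode-reporting"
  elif command -v dpkg >/dev/null 2>&1 && dpkg -s opscode-reporting >/dev/null 2>&1; then
    echo "dpkg --purge opscode-reporting"
  else
    echo "not-installed"
  fi
}
```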
Delete any remaining files:
rm -fr /opt/opscode-reporting
rm /etc/cron.d/refresh_reporting_matviews
rm -fr /opt/opscode/service/opscode-reporting
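A quick check that nothing was left behind can be sketched as follows. The helper name remaining_reporting_paths is hypothetical, the paths are the three from the commands above, and the optional root prefix argument exists only to make the check easy to try outside a real server.

```shell
# Hypothetical helper: print any of the reporting paths from this article that still exist.
# The optional first argument is a root prefix (empty by default).
remaining_reporting_paths() {
  root="${1:-}"
  for p in /opt/opscode-reporting \
           /etc/cron.d/refresh_reporting_matviews \
           /opt/opscode/service/opscode-reporting; do
    # -L also catches a dangling symlink, which -e alone would miss.
    { [ -e "$root$p" ] || [ -L "$root$p" ]; } && echo "$p"
  done
  return 0
}
```

Empty output from remaining_reporting_paths means all three locations have been cleaned up.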
Opscode-Reporting 1.6.6+
Please use the uninstall procedure at https://docs.chef.io/uninstall/#reporting.
If the uninstall command doesn't work as expected, fall back to the instructions for earlier versions above.
If you are experiencing disk-full issues related to the Postgres database, we recommend the follow-on article on this subject: Infra Server space usage unexpected /var
Appendix
Related Articles:
Infra Server space usage unexpected /var
Further Reading:
https://automate.chef.io/docs/reports/
https://docs.chef.io/uninstall/#reporting