Chef Server System Out of Disk Space, Services Behaving Strangely Afterwards

Sean Horn -

You may find that your Chef Server refuses to service any API requests after a disk-full condition, even after a cleanup frees up disk space. Typically, every API request is answered with a 500 or 503 response.
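
A quick way to confirm the symptom from the server itself is to request a URL and look only at the status code. The following is a minimal sketch, not an official check: the helper names and the CHEF_URL variable are hypothetical, and the example URL https://localhost/_status is an assumption about your installation.

```shell
#!/bin/sh
# Print the HTTP status code returned for URL $1 (curl prints 000 if the
# request itself fails). -k skips certificate verification, which is only
# reasonable for a check run against the local server.
http_status() {
    curl -sk -o /dev/null -w '%{http_code}' "$1"
}

# Succeed when status code $1 is one of the two codes described above.
is_outage_status() {
    case "$1" in
        500|503) return 0 ;;
        *) return 1 ;;
    esac
}

# Set CHEF_URL (assumed example: https://localhost/_status) to run the check.
if [ -n "${CHEF_URL:-}" ]; then
    code=$(http_status "$CHEF_URL")
    if is_outage_status "$code"; then
        echo "Chef Server answered $code: still failing"
    else
        echo "Chef Server answered $code"
    fi
fi
```

Run it as, for example, CHEF_URL=https://localhost/_status sh check_api.sh (the script name is arbitrary). Without CHEF_URL set, the script only defines the helpers and makes no request.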

If you also notice the redis_lb service behaving strangely in the output of chef-server-ctl status or in its /var/log/opscode/redis_lb/current logfile, please refer to Redis Troubleshooting for further detail.

If you notice statements with the phrase "rabbitmq msg_store_persistent" in the rabbitmq logs, or find that the /var/opt/ filesystem is full and nearly all of the storage usage is under /var/opt/opscode/rabbitmq, then the disk RabbitMQ stores its data on is full. The most likely reason for a Chef Server to run out of disk space this way is that it has been configured to send data to a Chef Analytics system and has filled the disk queueing data that the Analytics system should have consumed. When a queue is durable and the available memory runs out, RabbitMQ starts queueing messages to disk. Once the disk runs out as well, the system fails.
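
Both signs can be checked from a shell. This is a rough sketch under assumptions: the function names and the RABBIT_DIR and RABBIT_LOG variables are hypothetical, the defaults are simply the paths named in this article, and the log search looks for the shorter string msg_store_persistent.

```shell
#!/bin/sh
# Paths taken from this article; override them if your layout differs.
RABBIT_DIR="${RABBIT_DIR:-/var/opt/opscode/rabbitmq}"
RABBIT_LOG="${RABBIT_LOG:-/var/log/opscode/rabbitmq/current}"

# Print the percentage of space used on the filesystem holding directory $1.
fs_use_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Print how many lines of log file $1 mention the persistent message store.
count_msg_store_lines() {
    grep -c 'msg_store_persistent' "$1"
}

if [ -d "$RABBIT_DIR" ]; then
    echo "Filesystem use for $RABBIT_DIR: $(fs_use_percent "$RABBIT_DIR")%"
    # Total size of the RabbitMQ data directory, in kilobytes.
    du -sk "$RABBIT_DIR"
fi
if [ -r "$RABBIT_LOG" ]; then
    echo "msg_store_persistent log lines: $(count_msg_store_lines "$RABBIT_LOG")"
fi
```

A filesystem near 100% use, with most of that space accounted for by the du figure, matches the condition described above.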

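You can clean up by shutting down RabbitMQ, removing the persisted messages, and starting RabbitMQ again. This discards any messages still waiting in the queue: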

    chef-server-ctl stop rabbitmq

    rm -fr /var/opt/opscode/rabbitmq/db/rabbit@localhost/msg_store_persistent/*

    chef-server-ctl start rabbitmq

 

If you find opscode-erchef cycling on RabbitMQ in the /var/log/opscode/rabbitmq/current logfile, the likely culprit is a stuck wait-for-rabbit shell script. To recover, stop all services, kill the stuck script, and start the services again:

 

    chef-server-ctl stop

    pkill -9 wait-for-rabbit

    chef-server-ctl start
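
After the restart it is worth confirming that no wait-for-rabbit process survived and that every service came back. The sketch below assumes that chef-server-ctl status prints one line per service beginning with "run:" when the service is up; the services_not_running helper is hypothetical.

```shell
#!/bin/sh
# Read status output on stdin and print every line that does not start with
# "run:" (assumed to mean the service is not up). Prints nothing if all are up.
services_not_running() {
    grep -v '^run:' || true
}

# Only touch the real tools when chef-server-ctl is installed.
if command -v chef-server-ctl >/dev/null 2>&1; then
    # Any wait-for-rabbit process listed here survived the pkill.
    pgrep -fl wait-for-rabbit || echo "no wait-for-rabbit processes found"
    chef-server-ctl status | services_not_running
fi
```

If the last command prints nothing, all services report as running.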
