Chef Analytics continuously collects data about the activities performed across all of your Chef-managed nodes and your Chef server. Because of this, as your Chef environment grows, your Chef Analytics server consumes more and more disk space. Below are strategies to manage your Chef Analytics server's disk space usage.
Analytics 1.2.1 no longer stores the complete node data on every run; it records only initial runs, changes, and deletions. This significantly reduces the rate at which data accumulates in Analytics at this version.
Analytics up to 1.2.0 + chef-client with Centrify
In the presence of Centrify, you can reduce the size of the data a chef-client node sends to Analytics by either disabling the Ohai passwd plugin entirely, or by testing and implementing the passwd_min plugin if that works for your environment. See:
https://docs.chef.io/ohai.html#ohai-settings-in-client-rb
https://github.com/stevendanna/ohai-plugins/blob/master/plugins/passwd_min.rb
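As a minimal sketch, disabling the plugin is done from the node's client.rb; the plugin name and casing below are an assumption (they vary by Ohai version), so verify against the Ohai settings doc linked above:

```ruby
# client.rb on the affected node -- disables the Ohai passwd plugin so
# large Centrify-backed user lists are never collected or sent to Analytics.
# Plugin name/casing is an assumption; check the Ohai docs for your version.
Ohai::Config[:disabled_plugins] = [:Passwd]
```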
Chef Analytics 1.1.5 introduced a scheduled job to remove older Analytics data from the database. This job is configured using the opscode-analytics.rb file located in /etc/opscode-analytics/.
A separate article explains how to recreate empty copies of any partitions you have deleted, because future upgrades assume those partitions exist.
The retention setting you would modify lives in that file; its default value is 3 months.
Once this setting is configured, the scheduled job will begin deleting rows/tuples from the Analytics PostgreSQL database. Because the database runs an auto-vacuum, the freed space is reclaimed for reuse within the database; it is not released back to the underlying operating system. For more details on the PostgreSQL vacuum feature, see: http://www.postgresql.org/docs/9.2/static/sql-vacuum.html
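To see the retention job's effect, you can check the database's on-disk size before and after it runs. A sketch, assuming the database is named actions and runs under the chef-pgsql user as elsewhere in this article:

```shell
# Run as root on the Analytics server. Reports the on-disk size of the
# "actions" database; compare readings before and after the cull job runs.
# Note the reported size only shrinks after a FULL VACUUM -- a plain
# auto-vacuum reuses the space internally instead.
su -l chef-pgsql -c "psql actions -c \"SELECT pg_size_pretty(pg_database_size('actions'));\""
```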
If you have deleted a lot of data from your Chef Analytics database and want to reclaim the disk space, please see "Performing a FULL VACUUM" below.
Analytics 1.1.4 and older
By default, the Analytics service retains all data. Our Customer Engineer (and wearer of many other hats) Alex Pop has a simple, workable solution to manage storage on the Analytics system more effectively. The small cron job shown below runs daily and deletes events older than three months. Depending on your storage resources, that time range may or may not be acceptable.
This will not release space to the underlying operating system. It will release space to be reused by the Chef Analytics database. For more details on the PostgreSQL vacuum feature please see: http://www.postgresql.org/docs/9.2/static/sql-vacuum.html
Roughly 1 million node-action saves of average size per month will require very roughly 20 GB/month of disk storage. 500k node-action saves should require about half that with the latest 1.1.3 and 1.1.4 versions of the product.
In that light, you can change the interval in the cron job's SQL query to '2 weeks' for two weeks of retention, and so on. '1 week', '3 weeks', '4 days', etc. all work just fine, as PostgreSQL understands human-readable intervals via its interval syntax.
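As a small sketch, the cull query can be built for an arbitrary retention window (the variable names here are illustrative, not part of the product):

```shell
# Build the cull query for a configurable retention window. Any
# human-readable Postgres interval works: '2 weeks', '4 days', '1 week'...
retention='2 weeks'
sql="delete from public.activities where recorded_at < (now() - interval '${retention}');"
echo "$sql"
# Pipe it into psql as the chef-pgsql user, as elsewhere in this article:
#   echo "$sql" | su -l chef-pgsql -c 'psql actions'
```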
cat <<'EOF' > /etc/cron.daily/chef-analytics.cron
#!/bin/sh
log='/var/log/opscode-analytics/cull.log'
echo "[$(date)] Stopping Alaska service..." >> $log
opscode-analytics-ctl stop alaska >> $log 2>&1
echo "[$(date)] Deleting Chef Analytics events older than 3 months" >> $log
echo "delete from public.activities where recorded_at < (now() - interval '3 months');" | su -l chef-pgsql -c 'psql actions' >> $log 2>&1
echo "[$(date)] Starting Alaska service..." >> $log
opscode-analytics-ctl start alaska >> $log 2>&1
echo >> $log
exit 0
EOF
chmod +x /etc/cron.daily/chef-analytics.cron
Performing a FULL VACUUM
We have occasionally seen customers who adopted Analytics early accumulate several more months of data than they would like to keep. With the methods above alone, storage space is never returned to the operating system running your Analytics server, because a FULL VACUUM is never run.
There are a few things you should be aware of before you tackle a FULL VACUUM on your Analytics database.
This is going to require downtime on your Analytics instance, because a FULL VACUUM requires exclusive locks on the tables it's vacuuming.
You MUST have approximately 2x the size of your actions database available in disk space. While performing a FULL VACUUM, PostgreSQL creates a new copy of each table on disk, then deletes the old one when it completes. This additional space must come from outside the full filesystem. One way to provide it is to attach a new, larger filesystem to the Analytics system, mount it, copy the Analytics PostgreSQL data area to the new space, and then symlink the new space into the proper place in the original filesystem. If you do not have and cannot get the required space, please stop reading now and take a look at https://getchef.zendesk.com/hc/en-us/articles/206076926-Analytics-System-Disk-Full-Quick-Fix
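As a quick sanity check on the 2x requirement (the 20 GB figure below is only an example; substitute your actions database's actual size):

```shell
# Rough pre-flight arithmetic for the FULL VACUUM space requirement.
# db_gb is an example figure -- measure your own actions database first,
# e.g. with "du -sh" on the Analytics PostgreSQL data directory.
db_gb=20
need_gb=$((db_gb * 2))
echo "Plan for at least ${need_gb}GB free before starting the FULL VACUUM"
```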
You should run an ANALYZE command on the database immediately following the FULL VACUUM to ensure execution plans remain optimized.
Once your surplus data is removed using a FULL VACUUM, the auto-vacuum should maintain your database size. Running a FULL VACUUM is not required on a regular basis.
Because Analytics will effectively be down during the FULL VACUUM, and the FULL VACUUM and analysis of your actions database could take a significant amount of time, you should ensure your Chef server's RabbitMQ instance is capped at a maximum number of messages. While Chef Analytics is not removing messages from the queue, RabbitMQ has the potential to fill your Chef server's disk and cause an incident if it is not capped. Please see the 'prerequisites' section of the Install Analytics doc for how to cap the RabbitMQ instance: https://docs.chef.io/install_analytics.html
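As a sketch, the cap is typically set in chef-server.rb on the Chef server itself; the attribute name below is an assumption drawn from the Install Analytics doc linked above, so verify it against your Chef server version:

```ruby
# /etc/opscode/chef-server.rb on the Chef server (NOT the Analytics box).
# Caps the Analytics queue length so RabbitMQ cannot fill the disk while
# Analytics is down. Attribute name is an assumption -- verify before use.
rabbitmq['analytics_max_length'] = 10000

# Then apply the change:
#   chef-server-ctl reconfigure
```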
If you accept the points above and are prepared, you can execute your FULL VACUUM.
# Obtain and mount new space if needed. X should be the amount of space
# needed to make 210% of the current OS storage requirement.
# Log onto your Analytics server and obtain root privileges, then create
# a volume group and logical volume on the new block device (NEWDEVICE is
# a placeholder for your actual device):
vgcreate testvg NEWDEVICE
lvcreate -n testlv1 -L Xg testvg
# Create a filesystem on the new volume and mount it
mkfs -t ext4 /dev/testvg/testlv1
mkdir -p /mnt/new-analytics
mount /dev/testvg/testlv1 /mnt/new-analytics
# Check the new space
df -h /mnt/new-analytics
# Stop the running Analytics system
opscode-analytics-ctl stop
# Copy the existing data
rsync -avzh /var/opt/opscode-analytics/ /mnt/new-analytics/
# Remove the old directories and files under /var/opt/opscode-analytics/,
# then symlink the new space into place
rm -rf /var/opt/opscode-analytics/*
ln -s /mnt/new-analytics/* /var/opt/opscode-analytics
# Start the Analytics system on the new data area
opscode-analytics-ctl start
To continue with the VACUUM FULL, follow the steps below:
Stop all of your analytics services:
opscode-analytics-ctl stop
Start your analytics postgres service:
opscode-analytics-ctl start postgresql
Become the chef-pgsql user:
sudo su - chef-pgsql
Execute the following command:
nohup /opt/opscode-analytics/embedded/bin/psql -U chef-pgsql actions -c 'VACUUM (VERBOSE,FULL,ANALYZE);' > vacuum-full-analyze-`date '+%s'` 2>&1 &
This could take a significant amount of time. Wait.
When the vacuum is done, it will print "VACUUM" to a file named vacuum-full-analyze-<datetime_stamp> in the directory you ran the command from.
Start your analytics services:
opscode-analytics-ctl start
At this point you have performed a FULL VACUUM, and any excess disk space that was being held by your PostgreSQL database should be reclaimed by the OS.
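To confirm the space actually came back, compare df output from before and after the vacuum; a sketch (point it at whatever filesystem holds /var/opt/opscode-analytics on your system):

```shell
# Report free space on the filesystem holding the Analytics data.
# /var is used here as a stand-in path; substitute your actual data mount.
df -h /var
```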