How Can I Manage the Analytics Disk Space Usage?

Sean Horn

Chef Analytics continuously collects data about the activities performed across all of your Chef-managed nodes and your Chef server. As your Chef environment grows, your Chef Analytics server therefore uses more and more disk space. Below are strategies for managing your Chef Analytics server's disk space usage.

Analytics 1.2.1

Analytics 1.2.1 no longer stores the complete node data on every run, but it does still record initial runs, changes, and deletions. Running this version should greatly reduce the rate at which data is saved to Analytics.

Analytics 1.2.0 and older + chef-client with Centrify

On nodes running Centrify, you can reduce the size of the data chef-client sends to Analytics either by disabling the Ohai Passwd plugin entirely or by testing and deploying the passwd_min plugin, if it suits your environment.

See https://docs.chef.io/ohai.html#ohai-settings-in-client-rb (Ohai settings in client.rb) or https://github.com/stevendanna/ohai-plugins/blob/master/plugins/passwd_min.rb (the passwd_min plugin).
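For the first option, a minimal sketch follows. It assumes Chef 12-era client.rb syntax (ohai.disabled_plugins), that the plugin is named :Passwd, and that client.rb lives at /etc/chef/client.rb; disable_ohai_passwd is a hypothetical helper name. Check the Ohai settings page linked above for the syntax your chef-client version expects.

```shell
# Sketch: disable the Ohai Passwd plugin for chef-client.
# Assumptions: Chef 12-era client.rb syntax and a default path of
# /etc/chef/client.rb; pass a different path as the first argument.
disable_ohai_passwd() {
  client_rb="${1:-/etc/chef/client.rb}"
  setting='ohai.disabled_plugins = [:Passwd]'
  # Append the setting only if that exact line is not already present,
  # so running the function twice leaves a single copy.
  if ! grep -qxF "$setting" "$client_rb" 2>/dev/null; then
    printf '%s\n' "$setting" >> "$client_rb"
  fi
}
```

If client.rb already assigns ohai.disabled_plugins, add :Passwd to that existing list instead, because a second assignment replaces the first. Cookbooks that read node['etc']['passwd'] will no longer see that data, so test on a few nodes before rolling the change out widely.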

Analytics 1.1.5

Chef Analytics 1.1.5 introduced a scheduled job to remove older Analytics data from the database. This job is configured using the opscode-analytics.rb file located in /etc/opscode-analytics/. 

---WARNING---

If you have manually deleted any database partitions, a separate article explains how to recreate empty copies of them (here). Future upgrades assume those partitions exist, so deleting them will affect those upgrades.

The setting to modify is:

data_retention['month_interval_to_keep_activities']

The default for this setting is 3 months.
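A minimal sketch of making this change, assuming the setting takes a whole number of months and that, like other settings in this file, it is applied by running opscode-analytics-ctl reconfigure afterwards; set_analytics_retention is a hypothetical helper name:

```shell
# Sketch: set the Analytics activity retention period in months.
# Assumptions: the value is an integer number of months, and the config
# file is /etc/opscode-analytics/opscode-analytics.rb unless overridden.
set_analytics_retention() {
  months="$1"
  config="${2:-/etc/opscode-analytics/opscode-analytics.rb}"
  key="data_retention['month_interval_to_keep_activities']"
  # Remove any existing assignment of the setting, keeping all other lines
  # (writing back with cat preserves the file's ownership and permissions).
  if [ -f "$config" ]; then
    grep -vF "$key" "$config" > "$config.tmp" || true
    cat "$config.tmp" > "$config"
    rm -f "$config.tmp"
  fi
  # Append the new value.
  printf '%s = %s\n' "$key" "$months" >> "$config"
}
```

For example, set_analytics_retention 2 followed by sudo opscode-analytics-ctl reconfigure should keep roughly two months of activities.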

Once this setting is in place, the scheduled job will begin deleting rows (tuples) from the Analytics PostgreSQL database. Because the database runs autovacuum, the space those rows occupied is reclaimed for reuse within the database. This does not release space to the underlying operating system. For more details on the PostgreSQL vacuum feature, please see: http://www.postgresql.org/docs/9.2/static/sql-vacuum.html

If you have deleted a lot of data from your Chef Analytics database, and want to reclaim the disk space, please see “Performing a FULL VACUUM” below.

Analytics 1.1.4 and older

By default, the Analytics service retains all data. Our Customer Engineer Alex Pop (who wears many other hats) has a simple, workable solution for managing storage on the Analytics system more effectively. The cron job below runs daily and deletes events older than three months. Depending on your storage resources, this retention period may or may not be acceptable.

This will not release space to the underlying operating system. It will release space to be reused by the Chef Analytics database. For more details on the PostgreSQL vacuum feature please see: http://www.postgresql.org/docs/9.2/static/sql-vacuum.html

With the 1.1.3 and 1.1.4 versions of the product, about 1 million node action saves per month, at an average size, require very roughly 20 GB of disk storage per month. 500,000 node action saves per month should require roughly half that.
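Those figures work out to roughly 20 KB per node action save. As a rough planning aid, assuming that per-save size holds for your nodes, steady-state storage is about saves per month × months retained × 20 KB. A sketch of that arithmetic; estimate_analytics_gb is a hypothetical helper:

```shell
# Rough steady-state storage estimate for a retention window, using this
# article's approximation of ~20 GB per 1,000,000 node action saves
# (about 20 KB per save). Integer arithmetic; result is in whole GB.
estimate_analytics_gb() {
  saves_per_month="$1"
  months="$2"
  echo $(( saves_per_month * months * 20 / 1000000 ))
}
```

For example, 500,000 saves per month kept for three months comes to about 30 GB. Treat the result as an order-of-magnitude guide only.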

With that in mind, you can change the interval in the SQL query in the cron job below to suit your retention needs, for example interval '2 weeks' for two weeks of retention. Values such as '1 week', '3 weeks', or '4 days' all work, because PostgreSQL's interval type accepts human-readable durations.

cat<<'EOF' > /etc/cron.daily/chef-analytics-cull
#!/bin/sh
# Daily: delete Chef Analytics activities older than 3 months.
# The file name has no dot because Debian/Ubuntu run-parts skips such files.
log='/var/log/opscode-analytics/cull.log'
echo "[$(date)] Stopping Alaska service..." >> "$log"
opscode-analytics-ctl stop alaska >> "$log" 2>&1
echo "[$(date)] Deleting Chef Analytics events older than 3 months" >> "$log"
# Use ">> file 2>&1": "&>>" is bash-only and misbehaves under /bin/sh.
echo "delete from public.activities where recorded_at < (now() - interval '3 months');" | su -l chef-pgsql -c '/opt/opscode-analytics/embedded/bin/psql actions' >> "$log" 2>&1
echo "[$(date)] Starting Alaska service..." >> "$log"
opscode-analytics-ctl start alaska >> "$log" 2>&1
echo >> "$log"
exit 0
EOF

chmod +x /etc/cron.daily/chef-analytics-cull
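Before settling on a different interval, you may want to see how many rows it would remove. This sketch prints a count query that mirrors the DELETE in the cron job; cull_count_sql is a hypothetical helper name, and the interval is pasted into the SQL unescaped, so only pass values you typed yourself.

```shell
# Sketch: print a query that counts the activities a given retention
# interval would delete, mirroring the DELETE statement in the cron job.
cull_count_sql() {
  printf "select count(*) from public.activities where recorded_at < (now() - interval '%s');\n" "$1"
}
```

For example: cull_count_sql '2 weeks' | su -l chef-pgsql -c '/opt/opscode-analytics/embedded/bin/psql actions'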

Performing a FULL VACUUM

We have occasionally seen customers who adopted Analytics early accumulate several more months of data than they want to keep. The methods above never return storage space to the operating system running your Analytics server, because they never run a FULL VACUUM.

There are a few things you should be aware of before you tackle a FULL VACUUM on your Analytics database.

  1. This is going to require downtime on your Analytics instance because a FULL VACUUM requires exclusive locks on the tables it’s vacuuming.

  2. You MUST have disk space available for approximately 2x the size of your actions database. During a FULL VACUUM, PostgreSQL writes a new copy of each table to disk and deletes the old copy only after the new one is complete. If the filesystem holding the database is already full, this additional space must come from elsewhere. One way to provide it is to attach a new, larger filesystem to the Analytics system, mount it, copy the Analytics PostgreSQL data area to it, and then symlink the new space into the proper place in the original filesystem. If you do not have or cannot get the required space, please stop reading now and take a look at https://getchef.zendesk.com/hc/en-us/articles/206076926-Analytics-System-Disk-Full-Quick-Fix

  3. You should run an ANALYZE on the database immediately after the FULL VACUUM so that query execution plans remain optimized. The VACUUM command below includes the ANALYZE option, which does this in the same pass.

  4. Once your surplus of data is removed using a FULL VACUUM, the auto-vacuum should maintain your database size. Running a FULL VACUUM is not required on a regular basis.

  5. Because Analytics will effectively be down during the FULL VACUUM, and the FULL VACUUM and analysis of your actions database could take a significant amount of time, you should ensure your Chef server’s RabbitMQ instance is capped at a maximum number of messages. While Chef Analytics isn’t consuming messages from the queue, an uncapped RabbitMQ can fill the disk on your Chef server and cause an incident. Please see the ‘Prerequisites’ section of the Install Analytics documentation for how to cap the RabbitMQ instance: https://docs.chef.io/install_analytics.html.

If you accept the points above and are prepared, you can execute your FULL VACUUM.

# Obtain and mount new space if needed. X is the size of the new volume in GB, enough to provide 210% of the current storage requirement
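To pick X, you can measure how much space the Analytics data area uses now and take 210% of it. A small sketch, assuming "current storage requirement" means the size of /var/opt/opscode-analytics; new_volume_gb is a hypothetical helper that rounds up to a whole GB:

```shell
# Sketch: compute X (in GB) for the lvcreate step as 210% of the space the
# Analytics data currently uses, rounded up to a whole GB.
# Integer math: ceil(used * 2.1) == (used * 21 + 9) / 10.
new_volume_gb() {
  used_gb="$1"
  echo $(( (used_gb * 21 + 9) / 10 ))
}
```

With GNU du, used=$(du -s -BG /var/opt/opscode-analytics | cut -f1 | tr -d G) gives the current usage in whole GB, and new_volume_gb "$used" then gives X.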

  1. Log onto your Analytics server, and obtain root privileges.
  2. pvcreate NEWDEVICE
  3. vgcreate analyticsvggreen NEWDEVICE
  4. lvcreate -n analyticslvgreen -L Xg analyticsvggreen
  5. mkfs.ext4 /dev/analyticsvggreen/analyticslvgreen
  6. mkdir /mnt/new-analytics
  7. mount /dev/analyticsvggreen/analyticslvgreen /mnt/new-analytics

# Check the new space

  • df -h

# Stop the running Analytics system

  • opscode-analytics-ctl stop

# Copy the existing data

  1. rsync -avzh /var/opt/opscode-analytics/ /mnt/new-analytics/
  2. After confirming the copy is complete, remove the old directories and files under /var/opt/opscode-analytics/
  3. ln -s /mnt/new-analytics/* /var/opt/opscode-analytics
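Step 2 deletes your original data, so it is worth confirming the rsync copy first. A minimal sketch, assuming diff -rq is available (GNU and BSD diff both support it): it reports any file that differs or exists on only one side. It reads every file, so it can take a while on a large database; verify_copy is a hypothetical helper name.

```shell
# Sketch: succeed only if DST is a complete, identical copy of SRC.
# Run this after the rsync in step 1 and before removing anything in step 2.
verify_copy() {
  src="$1"
  dst="$2"
  if diff -rq "$src" "$dst" > /dev/null; then
    echo "copy verified"
  else
    echo "copy differs; do not remove the originals" >&2
    return 1
  fi
}
```

Run verify_copy /var/opt/opscode-analytics /mnt/new-analytics between steps 1 and 2, while Analytics is still stopped.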

# Start the Analytics system on the new data area

  • opscode-analytics-ctl start

To continue with the VACUUM FULL, follow the steps below:

  1. Stop all of your analytics services: opscode-analytics-ctl stop

  2. Start your analytics postgres service: opscode-analytics-ctl start postgresql

  3. Switch to the chef-pgsql user: sudo su - chef-pgsql
  4. Execute the following command: nohup /opt/opscode-analytics/embedded/bin/psql -U chef-pgsql actions -c 'VACUUM (VERBOSE,FULL,ANALYZE);' > vacuum-full-analyze-`date '+%s'` 2>&1 &

  5. This could take a significant amount of time. Wait.

  6. When the vacuum is done, psql writes “VACUUM” to the file vacuum-full-analyze-<timestamp> (a Unix epoch timestamp from date '+%s') in the directory you ran the command from.

  7. Start your analytics services: opscode-analytics-ctl start
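For step 6, instead of inspecting the output file by hand, you could check it with a small helper. This sketch assumes the completion line is exactly “VACUUM”, as described above; vacuum_finished is a hypothetical name.

```shell
# Sketch: report whether the VACUUM FULL started in step 4 has finished,
# by looking for the line psql prints on completion ("VACUUM").
# Pass the output file name, i.e. vacuum-full-analyze-<timestamp>.
vacuum_finished() {
  # -x matches the whole line, so INFO lines mentioning "vacuuming" don't count.
  grep -qx 'VACUUM' "$1"
}
```

It returns success once that line appears. If the file contains lines starting with ERROR:, the vacuum did not complete and should be investigated before you rely on the result.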

You have performed a FULL VACUUM at this point, and you should see any excess disk space that was being held in your PostgreSQL database reclaimed by the OS.
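To see how much space was actually reclaimed, you can record filesystem usage before step 1 and again after step 7. A minimal sketch using POSIX df -P output; used_kb is a hypothetical helper:

```shell
# Sketch: print the kilobytes in use on the filesystem holding PATH.
# df -P guarantees one line per filesystem; column 3 is "Used".
used_kb() {
  df -Pk "$1" | awk 'NR == 2 { print $3 }'
}
```

Pass the path where the PostgreSQL data actually lives, for example /mnt/new-analytics if you relocated it as described above: before=$(used_kb /mnt/new-analytics) before the vacuum, after=$(used_kb /mnt/new-analytics) afterwards, then echo "$((before - after)) KB reclaimed".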


Comments

    Irving Popovetsky

    Deleting rows in Postgres will not reclaim disk space at all, according to the Postgres docs. After a delete and an autovacuum (or normal vacuum) the space is still claimed on disk. Only a VACUUM FULL can reclaim disk space, but it requires making a full copy of the database.

    ref: section 23.1.2 http://www.postgresql.org/docs/9.2/static/routine-vacuuming.html
