When a message larger than 4MB is submitted to Automate through a Chef Server, the submitting client will often get a 413 (Request Entity Too Large) response, along with a message stating the maximum size and by how much the submission exceeded it.
In this article, we will look at how to investigate such a problem when it is caused by the size of the report sent at the end of a chef-client run. This is not the same report as the one sent at the end of an Audit cookbook run. The two are independent but very similar, so the process below largely applies when debugging overly large compliance reports as well.
First, write the Chef Client run report out to disk as JSON. The audit cookbook readme shows an easy way to do this at https://github.com/chef-cookbooks/audit#write-to-file-on-disk.
When a Chef Client fails trying to send a message_type of run_converge, the full message will be dumped in the debug output of the Chef Client run log. You could also grab that full JSON output there and store it in a file named request.body.json to be used as the source file below.
Then, use jq and some simple command lines to investigate the problem and narrow down the section of the report that is misbehaving.
# Look at size of top level keys on a chef-client run report previously written to disk as JSON
for ii in $(jq -r 'keys[]' request.body.json); do echo -n "$ii: "; jq --arg k "$ii" '.[$k]' request.body.json | wc -c; done
# Looks like the resources key holds 3MB+ of the total run report
jq '.resources' request.body.json | wc -c
# There are 183 resources, starting at 0
jq '.resources | length' request.body.json
# enumerate the resources, looking for the biggest
for ii in $(seq 0 182); do echo -n "$ii: "; jq ".resources[$ii]" request.body.json | wc -c ; done
# resource 176 was huge, 3MB+
jq '.resources[176]' request.body.json | less
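The enumeration loop above can also be collapsed into a single jq pass that ranks the resources by size. This is only a sketch: the function name rank_resources_by_size is made up for illustration, it assumes the report has a top-level resources array as in the example above, and it needs jq 1.6 or later for utf8bytelength. The sizes are for compact JSON, so they come out smaller than the pretty-printed wc -c counts, but the ranking should be comparable.

```shell
# Print the five largest entries of .resources as "index: bytes",
# largest first. $1 is the path to the run report JSON file.
# (Hypothetical helper; sizes are compact-JSON bytes, not pretty-printed.)
rank_resources_by_size() {
  jq -r '
    .resources as $r
    | [range(0; $r | length) as $i
       | {index: $i, bytes: ($r[$i] | tojson | utf8bytelength)}]
    | sort_by(.bytes) | reverse
    | .[:5][]
    | "\(.index): \(.bytes)"
  ' "$1"
}

# Usage: rank_resources_by_size request.body.json
```

Reading the file once avoids re-parsing a multi-megabyte report for every resource index.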
Resource 176, a template resource, was huge because it contained all of the automatic precedence level Ohai data, which is already stored in the node, repeated at a different level because of an action that a cookbook recipe took. Template resources have a variables property. It is convenient to assign EVERYTHING to this property and then read only what you need, but that causes a problem in the Chef Client run report submitted to Chef Server and proxied on to the Automate system. ALL of the data already stored under the automatic: level of the node is then stored again under the variables: property of the template, and thus in the regular default level attribute data of the node.
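To see which part of an oversized resource carries the bulk, the same size check can be applied one level down. The sketch below prints the compact JSON size of each top-level property of a single resource. The function name resource_property_sizes is hypothetical, and no particular property names are assumed; it simply lists whatever keys the resource entry contains.

```shell
# Print "property: bytes" for every top-level property of one resource.
# $1 is the path to the run report JSON file, $2 the zero-based resource index.
# (Hypothetical helper; requires jq 1.6+ for utf8bytelength.)
resource_property_sizes() {
  jq -r --argjson idx "$2" '
    .resources[$idx]
    | to_entries[]
    | "\(.key): \(.value | tojson | utf8bytelength)"
  ' "$1"
}

# Usage: resource_property_sizes request.body.json 176
```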
Instead of doing this, where the entire node is sent to the template:
template "monstah" do
Do this instead. Send only the necessary data to the template:
template "monstah" do