How do chef-client reports flow through an Automate cluster?

Sean Horn

We're assuming here that you are proxying chef-client reports through a Chef Server, which is the default. If you don't see the words "token" or "data-collector" in a node's /etc/chef/client.rb, that node's reports are either not being sent at all, or they are being proxied automatically through the Chef Server, provided the Chef Server and the Automate system have been set up for that function. More detail can be found at https://docs.chef.io/data_collection.html
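A quick way to check a given node is to grep its client config. If the grep returns nothing, the node is either not reporting or relying on the Chef Server proxy. The hostname and token below are placeholders, not values from this article; a node configured to report directly to Automate will typically show settings along these lines:

grep -E 'data_collector|token' /etc/chef/client.rb

data_collector.server_url "https://automate.example.com/data-collector/v0/"
data_collector.token "REPLACE_WITH_YOUR_DATA_COLLECTOR_TOKEN"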


Check the node's debug-level log file for data collector connections when you run chef-client manually. There should be one POST at the beginning to signify the start of the chef-client run, then another at the end carrying the data collected during the run.

chef-client -l debug -L myrun.log.txt

[2017-07-26T17:36:42-04:00] DEBUG: Initiating POST to https://chef-432.lxc/organizations/delivery/data-collector
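If the run log is long, you can filter it down to just the data collector traffic. This is an ordinary grep against the log file named in the command above; the pattern's dot matches either the hyphenated URL path or the underscored setting name:

grep -iE 'data.collector' myrun.log.txt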

Are there problems showing up in /var/log/opscode/nginx/access.log on the Chef Server?
If your Chef Server is configured to proxy data collector connections from nodes back into the Automate system, then the connections from the nodes have to show up there first. Here are the data-collector POSTs in the access.log on my Chef Server system from the chef-client run above. Both were successful, returning 204s. The first has a very small content length because it is just a start-of-run marker; the final payload's content length is large because it holds the data proxied back to the Automate system, essentially the full JSON content of the node.

10.0.3.82 - - [26/Jul/2017:17:36:42 -0400]  "POST /organizations/delivery/data-collector HTTP/1.1" 204 "0.007" 0 "-" "Chef Client/12.19.36 (ruby-2.3.1-p112; ohai-8.23.0; x86_64-linux; +https://chef.io)" "10.0.3.231:443" "204" "0.002" "12.19.36" "algorithm=sha1;version=1.1;" "runner-2-432.lxc" "2017-07-26T21:36:42Z" "XAILfUH4CddKZAX1hI6hlROtWL8=" 1457
10.0.3.82 - - [26/Jul/2017:17:36:42 -0400]  "POST /organizations/delivery/environments/_default/cookbook_versions HTTP/1.1" 200 "0.020" 2 "-" "Chef Client/12.19.36 (ruby-2.3.1-p112; ohai-8.23.0; x86_64-linux; +https://chef.io)" "127.0.0.1:8000" "200" "0.020" "12.19.36" "algorithm=sha1;version=1.1;" "runner-2-432.lxc" "2017-07-26T21:36:42Z" "cRLXPTAdvq6x5/h9QUVSIhGKjnc=" 1142
10.0.3.82 - - [26/Jul/2017:17:36:42 -0400]  "PUT /organizations/delivery/nodes/runner-2-432.lxc HTTP/1.1" 200 "0.035" 17470 "-" "Chef Client/12.19.36 (ruby-2.3.1-p112; ohai-8.23.0; x86_64-linux; +https://chef.io)" "127.0.0.1:8000" "200" "0.032" "12.19.36" "algorithm=sha1;version=1.1;" "runner-2-432.lxc" "2017-07-26T21:36:42Z" "yMhaqLRzhlw5sNmLRCtWgwdE7pc=" 84043
10.0.3.82 - - [26/Jul/2017:17:36:42 -0400]  "POST /organizations/delivery/data-collector HTTP/1.1" 204 "0.016" 0 "-" "Chef Client/12.19.36 (ruby-2.3.1-p112; ohai-8.23.0; x86_64-linux; +https://chef.io)" "10.0.3.231:443" "204" "0.002" "12.19.36" "algorithm=sha1;version=1.1;" "runner-2-432.lxc" "2017-07-26T21:36:42Z" "iSh6HXt6s6sC9TqbJ3gwQcu3wIw=" 84605
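To pull the same entries out of your own access.log, a grep like this works; the path is the standard Chef Server nginx log location named above, and anything other than a 204 on these lines points at a problem on the node-to-Chef-Server or Chef-Server-to-Automate leg:

grep 'POST /organizations/.*/data-collector' /var/log/opscode/nginx/access.log | tail -5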

From there, the requests make it back into /var/log/delivery/nginx/delivery.access.log on the Automate system.

10.0.3.230 - - [26/Jul/2017:17:36:42 -0400]  "POST /data-collector/v0/ HTTP/1.0" 204 "0.001" 0 "-" "Chef Client/12.19.36 (ruby-2.3.1-p112; ohai-8.23.0; x86_64-linux; +https://chef.io)" "127.0.0.1:9611" "204" "0.001" "12.19.36" "algorithm=sha1;version=1.1;" "runner-2-432.lxc" "2017-07-26T21:36:42Z" "XAILfUH4CddKZAX1hI6hlROtWL8=" 1509
10.0.3.230 - - [26/Jul/2017:17:36:42 -0400]  "POST /data-collector/v0/ HTTP/1.1" 204 "0.002" 0 "-" "-" "127.0.0.1:9611" "204" "0.001" "-" "-" "-" "-" "-" 83673
10.0.3.230 - - [26/Jul/2017:17:36:42 -0400]  "POST /data-collector/v0/ HTTP/1.0" 204 "0.001" 0 "-" "Chef Client/12.19.36 (ruby-2.3.1-p112; ohai-8.23.0; x86_64-linux; +https://chef.io)" "127.0.0.1:9611" "204" "0.001" "12.19.36" "algorithm=sha1;version=1.1;" "runner-2-432.lxc" "2017-07-26T21:36:42Z" "iSh6HXt6s6sC9TqbJ3gwQcu3wIw=" 84657
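The equivalent check on the Automate side:

grep 'POST /data-collector/v0/' /var/log/delivery/nginx/delivery.access.log | tail -5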

That is not quite the end of the story, however: the data-collector POST payload still has to be submitted and ingested into ES (Elasticsearch) on the Automate system. The first step in this ingestion process is that Automate's delivery process puts incoming data-collector items on the data-collector queue in the rabbitmq /insights vhost.

logstash consumes from this queue: it takes the queued items off the rabbitmq /insights/data-collector queue, transforms the messages in specific ways, and then submits them to elasticsearch.

You can check your data-collector queue with:

export PATH=/opt/delivery/embedded/bin:$PATH
rabbitmqctl list_queues -p /insights
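On a healthy system the output looks something like the following, where the number is the count of unconsumed messages. At or near zero is normal; a steadily growing count usually means logstash has stopped consuming:

Listing queues ...
data-collector	0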

The submission of items from the data-collector queue to elasticsearch by logstash shows up in one file on the Automate system:

/var/log/delivery/nginx/es_proxy.access.log

127.0.0.1 - - [26/Jul/2017:17:36:46 -0400]  "POST /elasticsearch//_bulk HTTP/1.1" 200 "0.033" 200 "-" "Manticore 0.6.0" "127.0.0.1:9200" "200" "0.033" "-" "-" "-" "-" "-" 107078
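You can also ask ES directly whether the indices exist. 127.0.0.1:9200 matches the upstream address in the log line above, and _cat/indices is a standard Elasticsearch endpoint:

curl -s 'http://127.0.0.1:9200/_cat/indices?v' | grep -E 'insights|node-state'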

Only during the first ingestion will you see something like the following in the ES logfile, as the dynamic mappings are created for the converge and node-state indices. The ES logfile is /var/log/delivery/elasticsearch/current; there are also daily logfiles with varying levels of detail. ES is otherwise silent unless something major or bad happens.

2017-07-26_21:35:54.83263 [2017-07-26 17:35:54,832][INFO ][cluster.metadata         ] [automate-432] [insights-2017.07.26] update_mapping [converge]
2017-07-26_21:35:55.18765 [2017-07-26 17:35:55,187][INFO ][cluster.metadata         ] [automate-432] [node-state-1] update_mapping [node-state]
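As a final sanity check that converge data is actually landing, a document count against the insights indices should grow with each chef-client run. The index pattern here matches the insights-2017.07.26 index created above, and _count is a standard Elasticsearch endpoint:

curl -s 'http://127.0.0.1:9200/insights-*/_count?pretty'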