Large Nodes: opscode-solr max_field_length

Sean Horn

Why are my nodes disappearing from the search index?

In pre-12.x and 12.x versions of Chef Server, the Solr max_field_length parameter does not always match the maximum potential size of the node data. This can create a situation where Solr is unable to index node data that is small enough to pass through the nginx and erchef size limits.

Search queries follow the path nginx -> opscode-erchef -> Solr.

Search indexing follows the path nginx -> opscode-erchef -> opscode-expander -> Solr.

The default sizes for objects passed through nginx and erchef, respectively, are the following. If you are not receiving 413 errors while updating or creating node, role, environment, and other objects, these sizes will not be a factor for you.

```
nginx['client_max_body_size'] = '250m'        # 250 megabytes
opscode_erchef['max_request_size'] = 1000000  # bytes
```

Rule of thumb
-------------
A working value for Solr's max_field_length is approximately `wc -w NODENAME.json` plus 30000 for overhead. (`wc -w` approximates the number of tokens in the JSON node data under Solr's parsing scheme.)

With the following setting in my /etc/opscode/private-chef.rb or /etc/opscode/chef-server.rb, my Solr index could fully ingest the full-sized customer test node data found in acc10.json:

```
opscode_solr['max_field_length'] = 154516
```

```
wc -w nodes/acc10.json
124516 nodes/acc10.json
```
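That is 124516 tokens plus the 30000-token overhead, which gives the 154516 value set above. The same arithmetic as a quick shell check (a sketch, using the example file above):

```
# Rule-of-thumb max_field_length: wc -w token count plus 30000 overhead
echo $(( $(wc -w < nodes/acc10.json) + 30000 ))
# => 154516
```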

Solr's ability to index node data depends most of all on the max_field_length setting in its /var/opt/opscode/{opscode-solr,opscode-solr4}/etc/solrconfig.xml file. That file is rendered from your chef-server.rb settings during chef-server-ctl reconfigure, so make the change in chef-server.rb rather than editing solrconfig.xml directly.
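To confirm the value Solr is actually running with, you can inspect the rendered file (a sketch; the opscode-solr4 path is shown, and the maxFieldLength element name is an assumption about the rendered template):

```
# maxFieldLength here should reflect the max_field_length setting above
grep -i maxfieldlength /var/opt/opscode/opscode-solr4/etc/solrconfig.xml
```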

Monitoring Node Size
--------------------------

You can monitor node size by including a recipe like the following on all of your nodes. If nodes appear to go missing from the search index, the data gathered by this simple recipe can help diagnose the problem.

```
# Capture the serialized node's size at first convergence and on every run
node.normal['initial_size'] = JSON.pretty_generate(node).size if node['initial_size'].nil?
node.normal['current_size'] = JSON.pretty_generate(node).size
```
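Once the recipe has converged, an ordinary search pulls the recorded sizes back out, for example (a sketch; attribute name as set above):

```
# List each node's recorded size so growth stands out over time
knife search node '*:*' -a current_size
```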

 

Good / Bad node examples. Tested in Chef Server 12.3.1
----------------------------------------------------------------

In Chef's search index implementation, Solr does not store fields from the node data. It only indexes characteristics of the nodes in a searchable blob, which resolves to an ID for each blob when searched. When we ask erchef to search on a particular characteristic, those IDs are what comes back:
https://github.com/chef/chef-server/blob/7c1df38aee7439674972a49bc157443b5c5c9cbf/src/oc_erchef/apps/chef_index/src/chef_solr.erl#L73-L79

When erchef receives a search request, it hands it to chef_wm_search:to_json, which passes the result IDs to chef_wm_search:make_search_results, which in turn calls chef_wm_search:fetch_result_rows to pull the rows with those IDs out of the nodes table in the opscode_chef database:
https://github.com/chef/chef-server/blob/7c1df38aee7439674972a49bc157443b5c5c9cbf/src/oc_erchef/apps/oc_chef_wm/src/chef_wm_search.erl#L356
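You can look at the database side of that flow directly; a minimal sketch, assuming the standard omnibus paths and the opscode-pgsql user (the column list is illustrative, the real query is at the link above):

```
# Peek at the opscode_chef nodes table that fetch_result_rows reads from
sudo -u opscode-pgsql /opt/opscode/embedded/bin/psql opscode_chef \
  -c "SELECT id, name FROM nodes LIMIT 5;"
```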

If you were to instrument the to_json, make_search_results, and fetch_result_rows functions while a node search runs through erchef, you would see output like the following. From it, you can see that erchef uses both Solr and PostgreSQL to respond to the request.

```
% 10:44:34 <0.14527.0>({proc_lib,init_p,5})
% chef_wm_search:to_json/2 -> {["{","\"total\":","1",",","\"start\":","0",",",
"\"rows\":[",
<<"{\"name\":\"ACCSLHLDNAAUT02.someone.int\",\"chef_environment\":\"ACC_ENV\"
```

```
% 10:46:27 <0.14484.0>({mochiweb_acceptor,init,3})
% chef_wm_search:make_search_results(#Fun<chef_wm_search.6.12028458>, [<<"25d0cb2c68a9603a806e028a3e980c79">>], 5, 0, 1)

% 10:46:27 <0.14484.0>({proc_lib,init_p,5})
% chef_wm_search:make_search_results/5 -> {1,
["{","\"total\":","1",",",
"\"start\":","0",",","\"rows\":[",
<<"{\"name\":\"ACCSLHLDNAAUT02.someone.int\",\"chef_environment\":\"ACC_ENV\",
```

```
% chef_wm_search:fetch_result_rows/4 -> {1,
[<<"{\"name\":\"ACCSLHLDNAAUT02.someone.int\"
```

The Chef DSL search and knife search both behave this same way, because both flow through opscode-erchef. So, although the node data is to some extent stored in the search index, it cannot be directly accessed there in its entirety.
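For example, a knife search for the node seen in the traces above exercises exactly this path (hostname taken from the trace output):

```
# erchef asks Solr for matching IDs, then fetches the rows from PostgreSQL
knife search node 'fqdn:ACCSLHLDNAAUT02.someone.int'
```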


Example raw manual search
-------------------------

```
curl -k 'http://localhost:8983/solr/select?sort=X_CHEF_id_CHEF_X+asc&indent=off&start=0&q=content:fqdn__%3D__ACCSLHLDNAAUT02.someone.int&wt=json&fq=%2BX_CHEF_database_CHEF_X:chef_0c20f912d95049078e28edceba2daa43+%2BX_CHEF_type_CHEF_X:node&rows=1000'

{"responseHeader":{"status":0,"QTime":1,"params":{"sort":"X_CHEF_id_CHEF_X asc","indent":"off","start":"0","q":"content:fqdn__=__ACCSLHLDNAAUT02.someone.int","wt":"json","fq":"+X_CHEF_database_CHEF_X:chef_0c20f912d95049078e28edceba2daa43 +X_CHEF_type_CHEF_X:node","rows":"1000"}},"response":{"numFound":1,"start":0,"docs":[{"X_CHEF_id_CHEF_X":"edceba2daa43b79b13044368b32fc8db","X_CHEF_database_CHEF_X":"chef_0c20f912d95049078e28edceba2daa43","X_CHEF_type_CHEF_X":"node","X_CHEF_timestamp_CHEF_X":"2015-12-18T08:10:12.687Z"}]}}

```

Note that the matching document carries only the X_CHEF_*_CHEF_X metadata fields and the object ID; the node data itself is returned from PostgreSQL, not from Solr.
