Troubleshoot Cookbook Dependency Issues

Sean Horn

Note: Update to the latest ChefDK before working through the following steps.

Cookbook version constraints can be defined in environments, cookbook metadata, role run_lists and node run_lists.

Many factors can make calculating the cookbook dependency graph more complicated than it seems at first glance. Sometimes this complexity leads to unexpected results, or even to a failure to complete the calculation in time, which results in an API error.

In some cases the error message you receive will have enough information to visually identify the conflicting version constraint.
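As a rough illustration of what a conflicting version constraint looks like, the following sketch checks which candidate versions of one cookbook satisfy a set of constraints. It uses Ruby's bundled Gem::Requirement only as a stand-in for Chef's own constraint handling, and the cookbook versions and constraints are hypothetical.

```ruby
require "rubygems" # Gem::Version and Gem::Requirement ship with Ruby

# Return the candidate versions that satisfy every constraint.
# NOTE: Gem::Requirement is an assumption used for illustration; it is not
# the Chef Server's dependency solver.
def satisfying_versions(candidates, constraints)
  requirement = Gem::Requirement.new(*constraints)
  candidates.select { |v| requirement.satisfied_by?(Gem::Version.new(v)) }
end

# Hypothetical versions of one cookbook available on the server.
candidates = ["1.0.0", "1.2.0", "2.0.0"]

# Hypothetical constraints: one from an environment, one from the
# metadata of a cookbook that depends on this one.
environment_constraint = "~> 1.0"
metadata_constraint    = ">= 2.0.0"

# The environment constraint alone leaves two candidates.
puts satisfying_versions(candidates, [environment_constraint]).inspect

# Both constraints together leave none, which is the conflict.
puts satisfying_versions(candidates, [environment_constraint, metadata_constraint]).inspect
```

When no version survives all of the constraints that apply to a cookbook, the dependency solver cannot produce a solution.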

If the error message doesn't contain enough information, or the conflict is too difficult to spot, the best way to troubleshoot the problem is to systematically modify the run_list, the environment version constraints and even the cookbook metadata to narrow down and identify the source of the problem. It's best to do this work in a safe, non-impacting manner, usually by testing modifications made to copies of the relevant information.
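One way to picture the narrowing-down process is a greedy reduction of the run_list: drop one item at a time and keep the drop whenever the remaining items still fail to solve. The sketch below is only an illustration. The `fails` callable is a hypothetical stand-in for a real check against a troubleshooting copy of the node, and the recipe names are invented.

```ruby
# Greedy reduction: remove one run_list item at a time and keep the removal
# whenever the remaining items still fail to solve.
# `fails` is a hypothetical callable; in practice it would wrap a real
# dependency-solving check against a copy of the node.
def minimal_failing_run_list(run_list, fails)
  remaining = run_list.dup
  run_list.each do |item|
    trial = remaining - [item]
    remaining = trial if fails.call(trial)
  end
  remaining
end

# Hypothetical stand-in: solving fails whenever both recipes are present.
fails = lambda do |items|
  items.include?("recipe[app]") && items.include?("recipe[legacy_db]")
end

run_list = ["recipe[base]", "recipe[app]", "recipe[monitoring]", "recipe[legacy_db]"]
puts minimal_failing_run_list(run_list, fails).inspect
```

With the stand-in above, the reduction ends with only the two recipes that conflict, which is the kind of small reproduction that makes the offending constraint easier to find.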

Tools like the cookbook-versions.rb script attached to the bottom of this document or the knife-solve plugin, which use the Chef Server's dependency solver API, can calculate the dependency graph of a node's run_list without modifying anything on the Chef Server or running chef-client on the node. Consider knife-depsolver as a better alternative to either of the above: https://github.com/jeremiahsnapp/knife-depsolver

Copies of the node and its Chef environment can be made so significant changes to the run_list and Chef environment version constraints can be tested without impacting the original node or Chef environment. Instructions for setting this up are found below.

Sometimes it helps to test modifications of declared cookbook dependencies and version constraints in cookbook metadata. Capturing all cookbook metadata from all versions of all cookbooks and uploading it to an empty Chef organization is a non-impacting way to allow for this kind of testing. Instructions for setting this up are found below.

A good first step is to reduce the number of cookbook versions to the minimum required to run your site. A large number of candidate cookbook versions makes a resolution timeout or failure much more likely, because the resolver has to consider them all while attempting a solution.
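To get a feel for why pruning helps, the sketch below computes a worst-case upper bound on the number of version combinations: the product of the number of candidate versions of each cookbook. A real solver prunes much of this space, so treat the number as an intuition aid only. The cookbook names and versions are hypothetical.

```ruby
# Upper bound on the version combinations a solver might have to consider:
# the product of the number of candidate versions of each cookbook.
def combination_count(versions_per_cookbook)
  versions_per_cookbook.values.reduce(1) { |product, versions| product * versions.size }
end

# Hypothetical inventory before and after pruning unused versions.
before = {
  "apache2" => %w[1.0.0 1.1.0 1.2.0 2.0.0 2.1.0],
  "mysql"   => %w[3.0.0 3.1.0 4.0.0 4.1.0],
  "base"    => %w[0.1.0 0.2.0 0.3.0],
}
after = {
  "apache2" => %w[2.1.0],
  "mysql"   => %w[4.0.0 4.1.0],
  "base"    => %w[0.3.0],
}

puts combination_count(before) # 5 * 4 * 3 = 60
puts combination_count(after)  # 1 * 2 * 1 = 2
```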

Copy Node and Chef Environment

The following knife exec command will copy a node named example-node to a node named troubleshoot-example-node.

knife exec -E 'n = nodes.show("example-node"); n.name("troubleshoot-" + n.name); n.save'

The following knife exec command will copy a Chef environment named example-env to a Chef environment named troubleshoot-example-env.

knife exec -E 'e = environments.show("example-env"); e.name("troubleshoot-" + e.name); e.save'

Use the copy of the Chef environment

Set the new node to use the copy of the Chef environment.

knife exec -E 'n = nodes.show("troubleshoot-example-node"); n.chef_environment("troubleshoot-example-env"); n.save'

Use the expanded run_list

Expand the run_list and assign the result back to the copy of the node. Because the copy has the same run_list as the original node, this is the original node's expanded run_list. In the expanded run_list, roles are replaced by their corresponding recipes, and any cookbook version constraint specified directly in a run_list is preserved.

knife exec -E 'n = nodes.show("troubleshoot-example-node"); n.run_list(n.expand!.recipes.with_version_constraints_strings); n.save'

Iteratively test cookbook dependency solving

At this point you can begin making modifications to the cookbook version constraints in the node's run_list and/or the Chef environment and check the results using either the cookbook-versions.rb script or the knife-solve plugin.

For example:

knife exec cookbook-versions.rb troubleshoot-example-node

How to use a separate Chef organization for troubleshooting

Sometimes it's desirable to troubleshoot cookbook dependency solving problems by making modifications to the dependencies and version constraints declared in cookbook metadata. This isn't always easy or safe to do in the production Chef organization. The following instructions describe how to capture all relevant data from the production Chef organization and transfer it to a separate, empty Chef organization for troubleshooting.

Create an empty directory.

Create an empty directory and cd into it to simplify the capture of all relevant data.

mkdir cookbook_metadata
cd cookbook_metadata

Configure knife for capturing from the production Chef organization.

Create a knife.rb in the cookbook_metadata directory with the required parameters.

The following are the only knife.rb parameters required for knife download to get all versions of all cookbooks from the Chef server. Be sure to set ORG_NAME to the name of the production Chef organization.

chef_server_url 'https://api.opscode.com/organizations/ORG_NAME'
node_name 'USER_NAME'
client_key '/path/to/USER_NAME.pem'
versioned_cookbooks true
knife[:chef_repo_path] = Dir.pwd

Download metadata files from all versions of all cookbooks

The following knife download command will get the metadata.rb and metadata.json files of all versions of all cookbooks from the Chef server. Both files are needed because some tools upload metadata.rb and other tools upload metadata.json.

Run the following command while in the cookbook_metadata directory so knife uses the custom knife.rb.

knife download cookbooks/*/metadata.{rb,json}

Capture a list of all versions of all cookbooks.

If some cookbooks have names that differ only in letter case, and the workstation used to download the cookbook files runs Windows, OS X or another operating system whose file system is not case sensitive, then some cookbooks won't be captured when running knife download cookbooks/*/metadata.{rb,json}.

To detect this, run the following command to capture a full list of all versions of all cookbooks for visual comparison against the downloaded cookbooks.

knife cookbook list -a > cookbook_list.json
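The visual comparison can be supplemented with a small check for names that differ only by case. The following is a minimal sketch that assumes the cookbook names have already been extracted from the captured list into an array. The names shown are hypothetical.

```ruby
# Group cookbook names that differ only by letter case. On a
# case-insensitive file system such names map to the same directory.
def case_collisions(names)
  names.uniq
       .group_by(&:downcase)
       .values
       .select { |group| group.size > 1 }
end

# Hypothetical cookbook names for illustration.
names = ["apache2", "Apache2", "mysql", "nginx", "NGINX"]
puts case_collisions(names).inspect
```

Any group it reports is a set of cookbooks where only one is likely to have survived the download on a case-insensitive workstation.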

Capture relevant information from the node.

Use the cookbook-versions.rb script to capture relevant information from the node that is failing to solve cookbook dependencies. The captured information includes the node's environment, the environment's cookbook version constraints and the node's expanded run_list.

In the expanded run_list, roles are replaced by their corresponding recipes. This is helpful when replicating the run_list for troubleshooting because the roles don't have to be created in the troubleshooting organization. The expanded run_list includes any cookbook version constraint specified directly in a run_list.

knife exec cookbook-versions.rb NODE_NAME > cookbook_versions.json
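The captured file can be inspected with a few lines of Ruby before it is used. The sketch below parses an inline sample rather than the real file. The two key names are the ones read by the knife exec commands later in this document, but the sample values and the assumed shapes (a hash of constraints and an array of run_list strings) are invented for illustration.

```ruby
require "json"

# Hypothetical sample shaped like the two keys this document reads from
# cookbook_versions.json. The values are invented for illustration.
sample = <<~JSON
  {
    "environment_cookbook_versions": { "apache2": "~> 2.0", "mysql": "= 4.1.0" },
    "expanded_run_list": ["base::default", "apache2::default@2.1.0"]
  }
JSON

captured    = JSON.parse(sample)
constraints = captured["environment_cookbook_versions"]
run_list    = captured["expanded_run_list"]

# Print each environment constraint, then each expanded run_list item.
constraints.each { |cookbook, constraint| puts "#{cookbook} #{constraint}" }
run_list.each { |item| puts item }
```

To inspect a real capture, replace the inline sample with IO.read("cookbook_versions.json").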

Prepare separate Chef organization for troubleshooting.

Use a knife.rb similar to the following to manage the separate Chef organization. Be sure to set ORG_NAME to the name of the troubleshooting Chef organization.

chef_server_url 'https://api.opscode.com/organizations/ORG_NAME'
node_name 'USER_NAME'
client_key '/path/to/USER_NAME.pem'
versioned_cookbooks true
knife[:chef_repo_path] = Dir.pwd

Delete all cookbooks from the troubleshooting Chef organization.

Be sure the Chef organization used for troubleshooting doesn't have any cookbooks in it. Any existing cookbooks can interfere with the troubleshooting process.

knife cookbook bulk delete '.*'

Prepare cookbook metadata for upload.

The cookbooks' metadata.rb files might use IO.read (or other methods) to populate parameters such as version and description from other files. Because those files don't exist in the captured data, such statements cause the upload to fail, so they need to be replaced or deleted.

# Replace all `version` statements with a correct version gleaned from the cookbook's directory name.
ruby -pi -e 'gsub(/^version.*/, "version \"#{$1}\"") if ARGF.filename =~ /-(\d+\.\d+\.\d+)\//' cookbooks/*/metadata.rb

# Delete any remaining `IO.read` statements.
# This is GNU sed syntax. With BSD sed (for example on OS X) use: sed -i '' '/IO.read/d' ...
sed -i '/IO.read/d' cookbooks/*/metadata.rb
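To see what the version-replacing one-liner does, the following sketch applies the same pattern to a couple of paths and rewrites one sample line. The paths and the metadata line are hypothetical, and the helper name version_from_path is introduced only for this illustration.

```ruby
# Same pattern as the one-liner above: a dash, three dot-separated
# numbers, then a slash.
VERSION_IN_PATH = /-(\d+\.\d+\.\d+)\//

# Return the version embedded in a versioned cookbook path, or nil
# when the path has no version suffix.
def version_from_path(path)
  match = VERSION_IN_PATH.match(path)
  match && match[1]
end

# Hypothetical paths for illustration.
puts version_from_path("cookbooks/apache2-2.1.0/metadata.rb").inspect
puts version_from_path("cookbooks/apache2/metadata.rb").inspect

# Rewrite a sample metadata line the way the one-liner does.
line = 'version IO.read(File.join(File.dirname(__FILE__), "VERSION"))'
puts line.gsub(/^version.*/, %(version "2.1.0"))
```

Paths without a version suffix are left alone by the one-liner, which is why the sed command that follows it is still needed for any remaining IO.read statements.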

Upload all versions of all cookbooks to the Chef organization.

Run the following command to upload all cookbooks to the Chef organization.

knife upload cookbooks

Create test environment with appropriate cookbook version constraints.

Use the following knife exec script to create an environment named 'troubleshoot-example-env' with the same constraints that are in the original environment.

knife exec -E 'env = Chef::Environment.new; env.name "troubleshoot-example-env"; env.cookbook_versions JSON.parse(IO.read("cookbook_versions.json"))["environment_cookbook_versions"]; env.save'

Create a test node and configure its run_list and environment.

Create a node named 'troubleshoot-example-node' and set its run_list to the original node's expanded run_list.

knife exec -E 'n = Chef::Node.new; n.name "troubleshoot-example-node"; n.run_list JSON.parse(IO.read("cookbook_versions.json"))["expanded_run_list"]; n.save'

Set the node's environment to the new 'troubleshoot-example-env' environment.

knife exec -E 'n = nodes.show("troubleshoot-example-node"); n.chef_environment("troubleshoot-example-env"); n.save'

Iteratively test cookbook dependency solving

At this point you can begin making modifications to the cookbook version constraints in the node's run_list, the Chef environment and/or the cookbook metadata and check the results using either the cookbook-versions.rb script or the knife-solve plugin.

For example:

knife exec cookbook-versions.rb troubleshoot-example-node