In addition to the general guidelines on writing helpful support tickets, keep in mind that outages of complex, multifaceted production systems often involve several teams at your organization. In these cases, direct application troubleshooting, such as accurately describing a symptom and providing logs to Chef Support, is appreciated but will not be enough to work through most production outages. As the Severity 1 requester, you should be prepared to address the following questions, which Chef Support engineers will ask:
Severity Definitions
https://www.chef.io/service-level-agreement
Personnel
- Do you have full administrator access to the Chef Environment? Do you have full administrator access to the infrastructure within which the Chef Environment resides?
- If not, who else from your organization needs to be brought on to work the issue?
- If applicable, have platform vendors been engaged? (e.g., VMware, AWS, Azure, Cisco)
History
- Has this issue, or something very similar, happened before?
- If so, did you raise a ticket with us? Can you reference it?
- Do you have metrics / monitoring in place that can describe conditions leading up to the outage?
Environment
- Do you have current backups? Do you know your backup schedule?
- What is the status of your underlying network?
- Can all of your machines ping each other?
- Were there any recent network operations center alerts?
- Are your systems bare metal or virtualized?
- If they are virtualized, is there resource contention at that layer?
- Are resources dynamically allocated?
- What is the status of your underlying storage?
- Do you depend on an external authentication system?
- Do you have recent, accurate diagrams/documentation of your deployment and supporting infrastructure?
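One way to answer the network questions above quickly is a small reachability check run from each machine in the deployment. The following is a minimal sketch, not a Chef-provided tool: it tests TCP connectivity rather than ICMP ping, the hostnames are hypothetical placeholders, and port 443 is an assumption that should be replaced with whatever ports your deployment actually uses.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def reachability_report(targets, timeout: float = 3.0) -> dict:
    """Map each (host, port) pair to True (reachable) or False (unreachable)."""
    return {
        (host, port): tcp_reachable(host, port, timeout)
        for host, port in targets
    }


if __name__ == "__main__":
    # Hypothetical hostnames and ports; substitute your own systems.
    targets = [
        ("chef-server.example.com", 443),
        ("automate.example.com", 443),
    ]
    for (host, port), ok in reachability_report(targets).items():
        print(f"{host}:{port} {'reachable' if ok else 'UNREACHABLE'}")
```

Running this from several machines and attaching the output to the ticket gives Support a rough picture of which paths are failing. A failed check only shows that a TCP connection could not be opened; it does not by itself identify the cause.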
Circumstances
- Have you recently upgraded any Chef components?
- Have any of the systems undergone recent patching?
- Are your application teams releasing? Are there deployments happening?
- Have you introduced any significant changes to your cookbooks?
- Are you aware of any recent scheduled or unscheduled maintenance?
- Is there any planned testing happening on underlying infrastructure?
- Do you have any third-party applications that interact with any Chef API?
Being able to speak to the matters above, or having access to others in your organization who can, is critical to resolving your outage.