Summary
While operating a Chef Backend HA cluster with Elasticsearch enabled, you may observe periodic high load and/or high JVM memory usage. If your Chef cluster has in excess of 20,000 nodes, it is plausible that under certain 'perfect storm' scenarios a failover can be triggered by a combination of constrained resources and a lack of system tuning.
On a leader node, a notable component of these failovers is noticeably high GC (garbage collection) times:
2020-08-10_17:40:35.72919 [2020-08-10T12:40:35,728][WARN ][o.e.m.j.JvmGcMonitorService] [42e6ba4893112e1f78fa5c07298795de] [gc][720] overhead, spent [3.1s] collecting in the last [4s]
2020-09-13_00:38:49.14630 [2020-09-12T19:38:48,986][WARN ][o.e.m.j.JvmGcMonitorService] [42e6ba4893112e1f78fa5c07298795de] [gc][young][428726][241166] duration [40.9s], collections [1]/[41.4s], total [40.9s]/[3.8h], memory [17.4gb]->[16.4gb]/[24.8gb], all_pools {[young] [1gb]->[12.5mb]/[1gb]}{[survivor] [120.6mb]->[78.4mb]/[133.7mb]}{[old] [16.3gb]->[16.3gb]/[23.6gb]}
2020-09-13_00:38:49.14890 [2020-09-12T19:38:48,986][WARN ][o.e.m.j.JvmGcMonitorService] [42e6ba4893112e1f78fa5c07298795de] [gc][428726] overhead, spent [40.9s] collecting in the last [41.4s]
These can cascade upwards to significantly higher than acceptable durations, resulting in pauses during which the leader cannot respond to its healthchecks.
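As a quick check for how often this is happening, you can grep the Elasticsearch service log directly. The path below assumes the default runit log location used by Chef Backend; adjust it if your logs are stored elsewhere:
grep -c "JvmGcMonitorService" /var/log/chef-backend/elasticsearch/current
grep "overhead, spent" /var/log/chef-backend/elasticsearch/current | tail -5
The first command counts GC monitor warnings, the second shows the five most recent overhead messages.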
These issues arise because the Chef Backend HA configuration scripts do not size the new/young garbage collection pool in a reasonable ratio to the tenured/old pool and the overall heap size. See https://docs.oracle.com/javase/8/docs/technotes/guides/vm/gctuning/sizing.html for specifics on these aspects of the JVM.
Distribution
Product | Version | Topology |
Chef Backend | 2.x+ | Cluster |
Process
Plan
Preparation: N/A
Design: N/A
Configure
Evaluation: N/A
Application: N/A
Troubleshoot
Analysis:
You should review any hard-coded settings under `/etc/chef-backend/chef-backend.rb`. It is plausible that prior tuning will need to be factored into any tuning going forward. Look for anything associated with the JVM on which Elasticsearch depends:
elasticsearch.enable = true
elasticsearch.heap_size = 248
elasticsearch.java_opts = ''
elasticsearch.new_size = 32
(See https://docs.chef.io/ctl_chef_backend/#example-output for a full list)
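A quick way to see which of these are already hard coded is to grep the configuration file mentioned above:
grep -E "elasticsearch\.(heap_size|new_size|java_opts)" /etc/chef-backend/chef-backend.rb
An empty result means none of the JVM-related settings have been set explicitly and the defaults are in effect.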
The `heap_size` and `new_size` values are specified in MB. If you observe only `heap_size` configured, especially if it has previously been set to a sizeable amount (anything above 8000 MB), and do not see any `new_size` configured, this is a concern.
To understand the actual JVM configuration at runtime you can use the following command:
java -Xms28G -Xmx28G -XX:+UseConcMarkSweepGC -XX:+UnlockDiagnosticVMOptions -XX:+PrintFlagsFinal -version |
egrep -i "( NewSize | OldSize | NewRatio | ParallelGCThreads )"
which should yield something equivalent to:
uintx NewRatio = 2 {product}
uintx NewSize := 2006515712 {product}
uintx OldSize := 28058255360 {product}
uintx ParallelGCThreads = 23 {product}
java version "1.8.0_162"
Java(TM) SE Runtime Environment (build 1.8.0_162-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.162-b12, mixed mode)
Note the ratio of OldSize to NewSize, which in this example is roughly 14:1. Such a small young generation causes GC to collect frequently. Should the host be running close to its maximum resources, these collections can end up queued and pause Elasticsearch.
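For reference, that 14:1 figure comes directly from dividing the two reported byte values; you can reproduce the arithmetic with a one-liner:
awk 'BEGIN { printf "%.1f\n", 28058255360 / 2006515712 }'
which prints 14.0, i.e. the old pool is roughly fourteen times the size of the young pool.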
Remediation:
Before proceeding further, ensure you are capturing the relevant metrics (CPU usage, JVM heap usage, memory usage) in the monitoring for your deployment. If any of these are missing, please configure them now. If you can capture a performance watermark at this point, it can be used to ascertain whether the change has had a positive performance impact.
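If you do not have full monitoring in place, even a crude before/after snapshot taken on the leader is better than nothing. The snippet below is only a sketch (the output path is arbitrary); it records load, memory, and the Java process footprint at a point in time:
(date; uptime; free -m; ps -o pid,rss,%cpu,args -C java) > /tmp/es-tuning-baseline-$(date +%F).txt
Run it once before the change and again after the cluster has settled, then compare the two files.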
If `/etc/chef-backend/chef-backend.rb` has a `heap_size` set, such as 12000 MB, we would encourage adding an accompanying `new_size` in a sensible ratio (3:1 is a good place to start). For this example that would look as below:
elasticsearch.new_size = 4000
elasticsearch.heap_size = 12000
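The 4000 value above is simply the configured heap divided by three; if your heap_size differs, the same arithmetic applies, for example:
awk 'BEGIN { heap=12000; print int(heap / 3) }'
which prints 4000.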
Then run:
chef-backend-ctl reconfigure
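After the reconfigure, if you want to confirm what the running Elasticsearch JVM has actually picked up, the nodes info API reports the heap settings. This assumes Elasticsearch is reachable on its default port 9200 from the node itself; adjust the address if your cluster binds to a specific interface:
curl -s 'http://localhost:9200/_nodes/jvm?pretty' | grep -E 'heap_(init|max)_in_bytes'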
You can also post-check the configuration by re-running the command from the Analysis section with the new values, which should then report the updated generation sizes.
NOTE: the heap size you have configured must also be passed to the command via the '-Xms' and '-Xmx' flags, as in this example:
java -Xms12G -Xmx12G -XX:+UseConcMarkSweepGC -XX:+UnlockDiagnosticVMOptions -XX:+PrintFlagsFinal -version |
egrep -i "( NewSize | OldSize | NewRatio | ParallelGCThreads )"
Using the performance watermark you established earlier, you should now be able to observe whether the cluster has benefited from the change or needs further tuning. You should see lower and more consistent overall CPU usage as a result; depending on your load and usage, other aspects may also be positively affected.
Appendix
Related Articles:
Chef-Backend Cluster: Chef Server Frontend/Backend Tuning
Further Reading:
https://docs.oracle.com/javase/8/docs/technotes/guides/vm/gctuning/sizing.html
https://www.baeldung.com/jvm-parameters