In a previous post I whined that A10 load balancers, by default, need to see only a single successful health check from a service in order to mark it as up. I argued that that’s not a good idea, especially if a service is bouncing up and down. The good news is that you can set a global default value for this, so that all your new health checks are created with a higher value (I suggest 5–10).
The bad news is that changing those global values can have some side effects, which is what I’m sharing today.
Change Behavior
Here’s the problem. When you change a global health monitor value, every health monitor whose value for that parameter matches the old global value is changed to the new global value. That sounds reasonable enough: all the health checks that were created with the default values move to the new default. But what if you had intentionally set a health check value to something that just happened to match the default? Well, that one gets changed too.
Imagine this scenario:
Health Monitor Name | Current “Retry” Value |
---|---|
tcp8080 | 3 |
tcp80 | 3 |
tcp4114 | 5 |
tcp443 | 3 |
In this case, tcp8080, tcp80 and tcp443 use the default value for the Retry parameter, but we intentionally set tcp4114 to retry 5 times. If we now go to the Global Health Monitor settings and change the global Retry value to 5, the monitors that used the default of 3 are changed to the new value of 5:
Health Monitor Name | Current “Retry” Value |
---|---|
tcp8080 | 5 |
tcp80 | 5 |
tcp4114 | 5 |
tcp443 | 5 |
So far so good. Now let’s assume that you changed the global value by mistake, and you undo your change by setting the global Retry value (currently 5) back to 3. This happens:
Health Monitor Name | Current “Retry” Value |
---|---|
tcp8080 | 3 |
tcp80 | 3 |
tcp4114 | 3 |
tcp443 | 3 |
Oops. My tcp4114 monitor appears to have changed to 3 retries even though I had changed it from the default and set it to 5! That’s quite annoying, isn’t it?
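The whole scenario boils down to a single matching rule. Here is a minimal Python model of it, assuming only what this post describes: monitors whose stored value equals the old global value are rewritten to the new one. The `change_global` function and the dictionary layout are hypothetical illustrations, not ACOS internals.

```python
from typing import Dict


def change_global(monitors: Dict[str, int], old_global: int, new_global: int) -> None:
    """Hypothetical model: rewrite every monitor whose value equals the old global."""
    for name, value in monitors.items():
        if value == old_global:
            # Assigning to an existing key does not resize the dict, so this is safe mid-loop.
            monitors[name] = new_global


retry = {"tcp8080": 3, "tcp80": 3, "tcp4114": 5, "tcp443": 3}

change_global(retry, old_global=3, new_global=5)  # the accidental change
print(retry)  # all four monitors now have 5, including tcp4114

change_global(retry, old_global=5, new_global=3)  # the "undo"
print(retry)  # all four monitors now have 3, so tcp4114 has lost its deliberate 5
```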
Why?
The reason for this behavior is pretty simple. If you look at the underlying configuration via the CLI, you’ll see a global configuration line, and then each health monitor has its own configuration stored, even when it matches the global setting. For example:
health global interval 5 timeout 5 retry 3 up-retry 3
!
health monitor tcp80 interval 5 retry 3 timeout 5 up-retry 3
method tcp port 80 halfopen
!
health monitor tcp8080 interval 5 retry 3 timeout 2 up-retry 3
method tcp port 8080 halfopen
!
health monitor tcp4114 interval 5 retry 5 timeout 2 up-retry 3
method tcp port 4114 halfopen
!
health monitor tcp443 interval 5 retry 3 timeout 2 up-retry 3
method tcp port 443 halfopen
How does the A10 know which of those health monitors should use the global values, and which have had their values set manually? Answer: it doesn’t. So when you change a global value, the only thing the A10 can do is look for the health monitors whose value for that parameter matches the old global value, and change those to the new one. After our change to Retry = 5, the config looks like this:
health global interval 5 timeout 5 retry 5 up-retry 3
!
health monitor tcp80 interval 5 retry 5 timeout 5 up-retry 3
method tcp port 80 halfopen
!
health monitor tcp8080 interval 5 retry 5 timeout 2 up-retry 3
method tcp port 8080 halfopen
!
health monitor tcp4114 interval 5 retry 5 timeout 2 up-retry 3
method tcp port 4114 halfopen
!
health monitor tcp443 interval 5 retry 5 timeout 2 up-retry 3
method tcp port 443 halfopen
When you change the global back to 3, the A10 looks for all monitors that match its current setting (5), finds that all four of them match, and changes every one of them to “retry 3”.
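To see ahead of time which monitors a global change would sweep up, you could scan a saved copy of the configuration. The Python sketch below is hypothetical: it assumes the one-line `health global` and `health monitor NAME ...` formats shown in this post (keyword/value pairs after the prefix) and simply applies the matching rule described here. It is not an A10 tool, and real ACOS output may be formatted differently between releases.

```python
from typing import Dict, List, Optional, Tuple

# Sample saved config, copied from the "before" example in this post.
SAMPLE_CONFIG = """\
health global interval 5 timeout 5 retry 3 up-retry 3
!
health monitor tcp80 interval 5 retry 3 timeout 5 up-retry 3
 method tcp port 80 halfopen
!
health monitor tcp8080 interval 5 retry 3 timeout 2 up-retry 3
 method tcp port 8080 halfopen
!
health monitor tcp4114 interval 5 retry 5 timeout 2 up-retry 3
 method tcp port 4114 halfopen
!
health monitor tcp443 interval 5 retry 3 timeout 2 up-retry 3
 method tcp port 443 halfopen
"""


def keyword_pairs(tokens: List[str]) -> Dict[str, str]:
    """Turn ['interval', '5', 'retry', '3'] into {'interval': '5', 'retry': '3'}."""
    return dict(zip(tokens[0::2], tokens[1::2]))


def parse_health_config(text: str, param: str) -> Tuple[Optional[int], Dict[str, int]]:
    """Return (global value, {monitor name: value}) for one parameter, e.g. "retry"."""
    global_value: Optional[int] = None
    monitors: Dict[str, int] = {}
    for line in text.splitlines():
        tokens = line.split()
        if tokens[:2] == ["health", "global"]:
            pairs = keyword_pairs(tokens[2:])
            if param in pairs:
                global_value = int(pairs[param])
        elif tokens[:2] == ["health", "monitor"] and len(tokens) > 2:
            pairs = keyword_pairs(tokens[3:])  # tokens[2] is the monitor name
            if param in pairs:
                monitors[tokens[2]] = int(pairs[param])
    return global_value, monitors


def monitors_a_global_change_would_touch(text: str, param: str) -> List[str]:
    """Monitors whose stored value equals the current global value for param."""
    global_value, monitors = parse_health_config(text, param)
    return sorted(name for name, value in monitors.items() if value == global_value)


for param in ("retry", "timeout"):
    print(param, monitors_a_global_change_would_touch(SAMPLE_CONFIG, param))
```

For this sample, a global Retry change would touch tcp443, tcp80 and tcp8080 (the intended defaults), and by the same rule a global Timeout change would also rewrite tcp80, whose timeout of 5 happens to equal the global timeout.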
Workarounds
I haven’t yet found a workaround for this (in my opinion) wacky behavior. Surely, from a coding perspective, an absent parameter should mean “use the global default”? So why doesn’t the configuration look like this before the change?
health global interval 5 timeout 5 retry 3 up-retry 3
!
health monitor tcp80 interval 5 timeout 5 up-retry 3
method tcp port 80 halfopen
!
health monitor tcp8080 interval 5 timeout 2 up-retry 3
method tcp port 8080 halfopen
!
health monitor tcp4114 interval 5 retry 5 timeout 2 up-retry 3
method tcp port 4114 halfopen
!
health monitor tcp443 interval 5 timeout 2 up-retry 3
method tcp port 443 halfopen
The missing “retry” parameter on three of the health monitors would simply tell ACOS (the A10 operating system) to refer back to whatever the current global value is, and you’re done. Problem solved.
If you change the value manually, the setting should appear explicitly, as it does for tcp4114.
What if you want to return to the default settings? Just add a “Use Global Setting” checkbox next to each text box in the GUI: ticking it would remove the configured parameter and drop you back to the default (global) setting.
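The proposal above amounts to storing “not set” instead of a copy of the default, and resolving the global value only when it is read. A minimal Python sketch of that idea follows; the `HealthMonitor` class and its methods are hypothetical and describe the suggested design, not how ACOS behaves today.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HealthMonitor:
    """Hypothetical monitor whose retry is None unless explicitly configured."""
    name: str
    retry: Optional[int] = None  # None means "Use Global Setting"

    def effective_retry(self, global_retry: int) -> int:
        # Resolve the default at read time instead of copying it at creation time.
        return self.retry if self.retry is not None else global_retry

    def use_global_setting(self) -> None:
        # Equivalent of ticking the proposed "Use Global Setting" checkbox.
        self.retry = None


monitors = [
    HealthMonitor("tcp8080"),
    HealthMonitor("tcp80"),
    HealthMonitor("tcp4114", retry=5),  # explicitly configured
    HealthMonitor("tcp443"),
]

# Replay the change-and-undo sequence: tcp4114 stays at 5, the rest follow the global.
for global_retry in (3, 5, 3):
    print(global_retry, {m.name: m.effective_retry(global_retry) for m in monitors})
```

The trade-off of this design is that a monitor’s effective value is no longer visible from its own config line alone; you also need to read the global line.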
Maybe this will change in a future release. If A10 wants to comment, please fire away! In the meantime, be wary of changing the globals on an existing A10 load balancer; changing those settings a few times could have very undesirable results if you didn’t build all of your monitors using the default values.
30 Blogs in 30 Days
This post is part of my participation in Etherealmind’s 30 Blogs in 30 Days challenge.