keepalived-vrrp_script
weight vs. priority
Let’s first look at the code to see how the priority of a vrrp_instance is calculated:
```c
/* Update VRRP effective priority based on multiple checkers.
 * This is a thread which is executed every adver_int.
 */
static int
vrrp_update_priority(thread_t * thread)
{
    vrrp_rt *vrrp = THREAD_ARG(thread);
    int prio_offset, new_prio;

    /* compute prio_offset right here */
    prio_offset = 0;

    /* Now we will sum the weights of all interfaces which are tracked. */
    if ((!vrrp->sync || vrrp->sync->global_tracking) && !LIST_ISEMPTY(vrrp->track_ifp))
        prio_offset += vrrp_tracked_weight(vrrp->track_ifp);

    /* Now we will sum the weights of all scripts which are tracked. */
    if ((!vrrp->sync || vrrp->sync->global_tracking) && !LIST_ISEMPTY(vrrp->track_script))
        prio_offset += vrrp_script_weight(vrrp->track_script);

    if (vrrp->base_priority == VRRP_PRIO_OWNER) {
        /* we will not run a PRIO_OWNER into a non-PRIO_OWNER */
        vrrp->effective_priority = VRRP_PRIO_OWNER;
    } else {
        /* WARNING! we must compute new_prio on a signed int in order
           to detect overflows and avoid wrapping. */
        new_prio = vrrp->base_priority + prio_offset;
        if (new_prio < 1)
            new_prio = 1;
        else if (new_prio > 254)
            new_prio = 254;
        vrrp->effective_priority = new_prio;
    }

    /* Register next priority update thread */
    thread_add_timer(master, vrrp_update_priority, vrrp, vrrp->adver_int);
    return 0;
}
```
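As the code shows, a timer thread recalculates the effective priority of each vrrp_instance every adver_int. The result is the configured priority (vrrp->base_priority) plus the summed weights of all tracked interfaces and all tracked scripts, clamped to the range 1-254. Tracking only applies when the instance is not in a sync group, or when the group has global_tracking set. The one exception is an instance configured with priority 255 (VRRP_PRIO_OWNER), which always keeps 255.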
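To make the clamping rule concrete, here is a minimal standalone C sketch of the arithmetic only. It is not keepalived code. compute_effective_priority and PRIO_OWNER are hypothetical names, and the function takes the already-summed offset (interfaces plus scripts) as a plain int instead of walking keepalived’s lists:

```c
/* Hypothetical stand-in for keepalived's VRRP_PRIO_OWNER (255). */
#define PRIO_OWNER 255

/* Simplified model of the clamping in vrrp_update_priority():
 * base_priority is the configured priority, prio_offset the summed
 * weights of tracked interfaces and scripts. */
int compute_effective_priority(int base_priority, int prio_offset)
{
    int new_prio;

    /* An address owner (priority 255) is never demoted. */
    if (base_priority == PRIO_OWNER)
        return PRIO_OWNER;

    /* Signed arithmetic, so a large negative offset cannot wrap around. */
    new_prio = base_priority + prio_offset;
    if (new_prio < 1)
        new_prio = 1;
    else if (new_prio > 254)
        new_prio = 254;
    return new_prio;
}
```

For example, a base priority of 250 with +10 from a script yields 254, not 260. An offset of -200 on a base of 100 yields 1, not a negative number.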
Let’s take a look at how the “sum of the weight values of all scripts” is calculated:
```c
/* Returns total weights of all tracked scripts :
 * - a positive weight adds to the global weight when the result is OK
 * - a negative weight subtracts from the global weight when the result is bad
 *
 */
int
vrrp_script_weight(list l)
{
    element e;
    tracked_sc *tsc;
    int weight = 0;

    for (e = LIST_HEAD(l); e; ELEMENT_NEXT(e)) {
        tsc = ELEMENT_DATA(e);
        if (tsc->scr->result == VRRP_SCRIPT_STATUS_DISABLED)
            continue;
        if (tsc->scr->result >= tsc->scr->rise) {
            if (tsc->weight > 0)
                weight += tsc->weight;
        } else if (tsc->scr->result < tsc->scr->rise) {
            if (tsc->weight < 0)
                weight += tsc->weight;
        }
    }

    return weight;
}
```
Wait, what does “result” mean?
result
```c
static int
vrrp_script_child_thread(thread_t * thread)
{
    ....
    wait_status = THREAD_CHILD_STATUS(thread);

    if (WIFEXITED(wait_status)) {
        int status;
        status = WEXITSTATUS(wait_status);
        if (status == 0) {
            /* success */
            if (vscript->result < vscript->rise - 1) {
                vscript->result++;
            } else {
                if (vscript->result < vscript->rise)
                    log_message(LOG_INFO, "VRRP_Script(%s) succeeded", vscript->sname);
                vscript->result = vscript->rise + vscript->fall - 1;
            }
        } else {
            /* failure */
            if (vscript->result > vscript->rise) {
                vscript->result--;
            } else {
                if (vscript->result >= vscript->rise)
                    log_message(LOG_INFO, "VRRP_Script(%s) failed", vscript->sname);
                vscript->result = 0;
            }
        }
    }

    return 0;
}
```
In the documentation, rise means that a vrrp_script is considered healthy (UP) only after rise consecutive successful checks. fall works the same way in the other direction: a vrrp_script is considered failed (DOWN) only after fall consecutive failed checks.
Let’s look at a comment from vrrp_track.h:45:
```c
/* VRRP script tracking results.
 * The result is an integer between 0 and rise-1 to indicate a DOWN state,
 * or between rise-1 and rise+fall-1 to indicate an UP state. Upon failure,
 * we decrease result and set it to zero when we pass below rise. Upon
 * success, we increase result and set it to rise+fall-1 when we pass above
 * rise-1.
 */
```

```
                     rise       rise+fall-1
+-------------------++--------------------+
0       DOWN   rise-1         UP
```
The comment and diagram show the range of result and which values mean DOWN or UP for the vrrp_script. Note that vrrp_script_weight treats only result >= rise as UP, so the boundary value rise-1 still counts as DOWN.
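The counter logic can be modelled outside keepalived to see why rise and fall count consecutive checks. Below is a minimal C sketch under that assumption. script_state, script_is_up and script_record are hypothetical names. They mirror only the success/failure branches of vrrp_script_child_thread and the UP test in vrrp_script_weight, without logging or the elided parts of the thread:

```c
#include <stdbool.h>

/* Simplified model of one tracked script's counter (not keepalived's
 * struct): result ranges over 0 .. rise+fall-1. */
typedef struct {
    int rise;    /* consecutive successes needed to go UP */
    int fall;    /* consecutive failures needed to go DOWN */
    int result;  /* current counter value */
} script_state;

/* UP iff result >= rise, matching the test in vrrp_script_weight(). */
bool script_is_up(const script_state *s)
{
    return s->result >= s->rise;
}

/* Mirrors the exit-status branches of vrrp_script_child_thread(). */
void script_record(script_state *s, bool success)
{
    if (success) {
        if (s->result < s->rise - 1)
            s->result++;                       /* still DOWN: count up */
        else
            s->result = s->rise + s->fall - 1; /* UP: jump to the top */
    } else {
        if (s->result > s->rise)
            s->result--;                       /* still UP: count down */
        else
            s->result = 0;                     /* DOWN (or partial count reset): back to 0 */
    }
}
```

With rise = 2 and fall = 3, starting from 0, two successes move the script to UP (result 4). It then takes three failures in a row to bring it back to DOWN. A single success in between jumps result back to 4, which is why the failures must be consecutive.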
The initial value of result is set in vrrp_init_script:
```c
/* if run after vrrp_init_state(), it will be able to detect scripts that
 * have been disabled because of a sync group and will avoid to start them.
 */
static void
vrrp_init_script(list l)
{
    vrrp_script *vscript;
    element e;

    for (e = LIST_HEAD(l); e; ELEMENT_NEXT(e)) {
        vscript = ELEMENT_DATA(e);
        if (vscript->inuse == 0)
            vscript->result = VRRP_SCRIPT_STATUS_DISABLED;

        if (vscript->result == VRRP_SCRIPT_STATUS_INIT) {
            vscript->result = vscript->rise - 1; /* one success is enough */
            thread_add_event(master, vrrp_script_thread, vscript, vscript->interval);
        } else if (vscript->result == VRRP_SCRIPT_STATUS_INIT_GOOD) {
            vscript->result = vscript->rise; /* one failure is enough */
            thread_add_event(master, vrrp_script_thread, vscript, vscript->interval);
        }
    }
}
```
inuse starts at 0 and is incremented each time the script is referenced in a track_script block. Any script that is actually tracked therefore has a non-zero inuse. Unreferenced scripts are marked DISABLED and never started.

For tracked scripts, result starts at one of two values:
- rise-1 ("one success is enough") under VRRP_SCRIPT_STATUS_INIT, the normal start-up case.
- rise ("one failure is enough") under VRRP_SCRIPT_STATUS_INIT_GOOD, which applies when keepalived is restarted/reloaded and the script was already UP.

Combined with vrrp_script_child_thread above, this means the very first check is enough to put the vrrp_script into the UP or DOWN state.
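The claim that one check decides the state can be checked with a small C sketch. This is a model, not keepalived code. start_mode, initial_result and apply_check are hypothetical names, and apply_check restates the same success/failure branches as vrrp_script_child_thread in compact form:

```c
#include <stdbool.h>

/* Hypothetical labels for the two start-up cases in vrrp_init_script(). */
typedef enum { START_INIT, START_INIT_GOOD } start_mode;

/* Starting counter: rise-1 (top of DOWN) or rise (bottom of UP). */
int initial_result(start_mode mode, int rise)
{
    return mode == START_INIT ? rise - 1 : rise;
}

/* One check, using the same branches as vrrp_script_child_thread(). */
int apply_check(int result, int rise, int fall, bool success)
{
    if (success)
        return result < rise - 1 ? result + 1 : rise + fall - 1;
    return result > rise ? result - 1 : 0;
}
```

With rise = 3 and fall = 2, START_INIT begins at 2, just below UP, so a single success lands on 4 (UP). START_INIT_GOOD begins at 3, the lowest UP value, so a single failure drops it to 0 (DOWN).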
Back to the beginning: now that we understand how result relates to the vrrp_script’s state, let’s revisit how vrrp_script_weight computes the weight on each check:
```c
if (tsc->scr->result >= tsc->scr->rise) {
    if (tsc->weight > 0)
        weight += tsc->weight;
} else if (tsc->scr->result < tsc->scr->rise) {
    if (tsc->weight < 0)
        weight += tsc->weight;
}
```
Conclusion
If the vrrp_script is in a normal (UP) state (tsc->scr->result >= tsc->scr->rise) and its weight is positive, the weight is added to the script weight sum and therefore to the vrrp_instance’s priority. If the weight is negative, it is ignored and does not affect the priority.
If the vrrp_script is in an abnormal (DOWN) state (tsc->scr->result < tsc->scr->rise) and its weight is negative, the weight is added to the sum, which lowers the vrrp_instance’s priority. If the weight is positive, it is ignored and does not affect the priority.
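Here is the weight rule as a standalone C sketch, under stated assumptions. tracked_script and script_weight_sum are hypothetical names, and each tracked script is flattened to its weight plus an "up" flag (result >= rise). Disabled scripts, which vrrp_script_weight skips, are not modelled:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flattened view of one track_script entry. */
typedef struct {
    int weight;  /* configured weight, may be negative */
    bool up;     /* true when result >= rise */
} tracked_script;

/* Same rule as vrrp_script_weight(): positive weights count only while
 * the script is UP, negative weights only while it is DOWN. */
int script_weight_sum(const tracked_script *scripts, size_t n)
{
    int weight = 0;

    for (size_t i = 0; i < n; i++) {
        if (scripts[i].up && scripts[i].weight > 0)
            weight += scripts[i].weight;
        else if (!scripts[i].up && scripts[i].weight < 0)
            weight += scripts[i].weight;
    }
    return weight;
}
```

Plugging in the test setup described next gives these results: A = -10 failing on the MASTER, B = +10 succeeding on both nodes, and A assumed to still succeed on the SLAVE. The MASTER’s sum is 0 (priority 100), while the SLAVE’s sum is +10 (priority 109).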
In the test example, the MASTER’s configured priority is 100 and the SLAVE’s is 99. Two vrrp_scripts, A and B, are configured, each with a weight of 10.
Based on the analysis above, a script with a positive weight adds nothing to the priority when its check fails. Now take A = -10 and B = 10, with A failing on the MASTER. By the analysis above, the MASTER’s priority should be 100 + 10 (B) - 10 (A) = 100. In theory this should not trigger a master-slave switch, but the logs showed the opposite.
Debugging keepalived revealed the cause. On the SLAVE, its priority (99) plus the +10 from its script B gives a final priority of 109. That is higher than the MASTER’s 100 + 10 (B) - 10 (A) = 100, so the MASTER state switched over.
In The End
It starts with one thing
I don’t know why
It doesn’t even matter how hard you try
Setting the weight in Keepalived’s vrrp_script is quite tricky. The analysis above shows that when using Keepalived’s VRRP for master-slave failover, it is crucial to keep the settings on both nodes consistent and to choose the priority values carefully.
In the very end: a vrrp_script check counts as successful when the script exits with status 0, and any other exit status counts as a failure (verified in vrrp_script_child_thread above).
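The exit-status rule can be reproduced with plain POSIX calls. keepalived forks and reaps scripts through its own thread machinery; this standalone sketch uses fork, execl and waitpid directly instead, and run_check is a hypothetical helper. Like the excerpt above, which only updates result when WIFEXITED() is true, it gives no verdict (-1) for a child that did not exit normally:

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs cmd through /bin/sh. Returns 1 for a successful check (normal
 * exit, status 0), 0 for a failed check (normal exit, non-zero status),
 * and -1 if the child could not be run or did not exit normally. */
int run_check(const char *cmd)
{
    int wait_status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127); /* exec failed */
    }
    if (waitpid(pid, &wait_status, 0) < 0)
        return -1;

    /* Same test as vrrp_script_child_thread(): only a normal exit counts,
     * and only exit status 0 is a success. */
    if (!WIFEXITED(wait_status))
        return -1;
    return WEXITSTATUS(wait_status) == 0;
}
```

For example, run_check("exit 0") returns 1, and run_check("exit 3") returns 0.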
- When weight is positive, it is added to the priority if the script check succeeds, and not if the check fails. Assuming the same weight on both nodes and a successful check on the SLAVE:
  - Master failure: a switch occurs when master priority < slave priority + weight.
  - Master success: master priority + weight > slave priority + weight, so the master remains the master.
- When weight is negative, it does not affect the priority if the script check succeeds, but the priority is reduced to priority - abs(weight) if the check fails. Again assuming the SLAVE’s check succeeds:
  - Master failure: a switch occurs when master priority - abs(weight) < slave priority.
  - Master success: master priority > slave priority, so the master remains the master.
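The four cases in the list can be checked with a small C sketch. The assumptions are the same as the list’s: one script with the same weight on both nodes, and a check that always succeeds on the SLAVE. effective and slave_takes_over are hypothetical names. Clamping to 1-254 is ignored, and so is the way VRRP breaks ties on equal priorities:

```c
#include <stdbool.h>

/* Priority after one tracked script with weight w: a positive w counts
 * only on success, a negative w only on failure. */
static int effective(int priority, int w, bool ok)
{
    if (w > 0 && ok)
        return priority + w;
    if (w < 0 && !ok)
        return priority + w; /* i.e. priority - abs(w) */
    return priority;
}

/* True when the SLAVE's effective priority beats the MASTER's. */
bool slave_takes_over(int master_prio, int slave_prio, int w, bool master_ok)
{
    return effective(slave_prio, w, true) > effective(master_prio, w, master_ok);
}
```

With priorities 100 and 99, a weight of +10 triggers a switch when the MASTER’s check fails (100 < 109), but a weight of +1 does not (100 vs 100). A weight of -10 also triggers a switch on failure (90 < 99). In both signs, a successful MASTER check keeps the MASTER in place.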

