keepalived-vrrp_script

weight vs. priority
Let’s first analyze how the priority value of vrrp_instance is calculated from the code:

  /* Update VRRP effective priority based on multiple checkers.
   * This is a thread which is executed every adver_int.
   */
  static int
  vrrp_update_priority(thread_t * thread)
  {
      vrrp_rt *vrrp = THREAD_ARG(thread);
      int prio_offset, new_prio;

      /* compute prio_offset right here */
      prio_offset = 0;

      /* Now we will sum the weights of all interfaces which are tracked. */
      if ((!vrrp->sync || vrrp->sync->global_tracking) && !LIST_ISEMPTY(vrrp->track_ifp))
          prio_offset += vrrp_tracked_weight(vrrp->track_ifp);

      /* Now we will sum the weights of all scripts which are tracked. */
      if ((!vrrp->sync || vrrp->sync->global_tracking) && !LIST_ISEMPTY(vrrp->track_script))
          prio_offset += vrrp_script_weight(vrrp->track_script);

      if (vrrp->base_priority == VRRP_PRIO_OWNER) {
          /* we will not run a PRIO_OWNER into a non-PRIO_OWNER */
          vrrp->effective_priority = VRRP_PRIO_OWNER;
      } else {
          /* WARNING! we must compute new_prio on a signed int in order
             to detect overflows and avoid wrapping. */
          new_prio = vrrp->base_priority + prio_offset;
          if (new_prio < 1)
              new_prio = 1;
          else if (new_prio > 254)
              new_prio = 254;
          vrrp->effective_priority = new_prio;
      }

      /* Register next priority update thread */
      thread_add_timer(master, vrrp_update_priority, vrrp, vrrp->adver_int);
      return 0;
  }

As the code above shows, the priority of each vrrp_instance is recalculated by a thread that runs every adver_int. The effective priority is the configured value (vrrp->base_priority) plus the sum of the weights of all tracked interfaces and all tracked scripts, clamped to the range 1-254. The one exception is an instance whose base priority is VRRP_PRIO_OWNER, which always stays at VRRP_PRIO_OWNER.
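The clamping step can be tried in isolation. The following is a minimal sketch, not keepalived's own code: the function name effective_priority and the PRIO_* constants are hypothetical, and it assumes VRRP_PRIO_OWNER is 255, a value the excerpt above does not show.

```c
#include <assert.h>

/* Hypothetical constants; PRIO_OWNER = 255 is an assumption, the excerpt
 * above only uses the symbolic name VRRP_PRIO_OWNER. */
enum { PRIO_MIN = 1, PRIO_MAX = 254, PRIO_OWNER = 255 };

/* Returns the effective priority for a base priority and a summed
 * tracking offset. An owner priority is never adjusted. */
int effective_priority(int base_priority, int prio_offset)
{
    int new_prio;

    assert(base_priority >= PRIO_MIN && base_priority <= PRIO_OWNER);

    if (base_priority == PRIO_OWNER)
        return PRIO_OWNER;

    /* Signed arithmetic, so a large negative offset cannot wrap. */
    new_prio = base_priority + prio_offset;
    if (new_prio < PRIO_MIN)
        new_prio = PRIO_MIN;
    else if (new_prio > PRIO_MAX)
        new_prio = PRIO_MAX;
    return new_prio;
}
```

With these assumptions, effective_priority(100, 10) gives 110, while effective_priority(100, -200) is clamped to 1 and effective_priority(250, 10) is clamped to 254.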

Let’s take a look at how the “sum of the weight values of all scripts” is calculated:

  /* Returns total weights of all tracked scripts :
   * - a positive weight adds to the global weight when the result is OK
   * - a negative weight subtracts from the global weight when the result is bad
   *
   */
  int
  vrrp_script_weight(list l)
  {
      element e;
      tracked_sc *tsc;
      int weight = 0;

      for (e = LIST_HEAD(l); e; ELEMENT_NEXT(e)) {
          tsc = ELEMENT_DATA(e);
          if (tsc->scr->result == VRRP_SCRIPT_STATUS_DISABLED)
              continue;
          if (tsc->scr->result >= tsc->scr->rise) {
              if (tsc->weight > 0)
                  weight += tsc->weight;
          } else if (tsc->scr->result < tsc->scr->rise) {
              if (tsc->weight < 0)
                  weight += tsc->weight;
          }
      }

      return weight;
  }

Wait, what does “result” mean?

result

  static int
  vrrp_script_child_thread(thread_t * thread)
  {
  ....
      wait_status = THREAD_CHILD_STATUS(thread);

      if (WIFEXITED(wait_status)) {
          int status;
          status = WEXITSTATUS(wait_status);
          if (status == 0) {
              /* success */
              if (vscript->result < vscript->rise - 1) {
                  vscript->result++;
              } else {
                  if (vscript->result < vscript->rise)
                      log_message(LOG_INFO, "VRRP_Script(%s) succeeded", vscript->sname);
                  vscript->result = vscript->rise + vscript->fall - 1;
              }
          } else {
              /* failure */
              if (vscript->result > vscript->rise) {
                  vscript->result--;
              } else {
                  if (vscript->result >= vscript->rise)
                      log_message(LOG_INFO, "VRRP_Script(%s) failed", vscript->sname);
                  vscript->result = 0;
              }
          }
      }

      return 0;
  }

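According to the documentation, rise is the number of successful checks required before a vrrp_script is considered to be in a normal state. fall is the counterpart: the number of failed checks required before a vrrp_script is considered to be in an abnormal state. As the code above shows, these checks must be consecutive: a failure while the script is DOWN resets result to 0, and a success while it is UP resets result to rise+fall-1.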

Let’s look at a comment from vrrp_track.h:45:

/* VRRP script tracking results.
 * The result is an integer between 0 and rise-1 to indicate a DOWN state,
 * or between rise-1 and rise+fall-1 to indicate an UP state. Upon failure,
 * we decrease result and set it to zero when we pass below rise. Upon
 * success, we increase result and set it to rise+fall-1 when we pass above
 * rise-1.
 */
0             rise-1 rise   rise+fall-1
+------------------++----------------+
        DOWN                UP

The above explanation and diagram illustrate the range of the result value and the corresponding vrrp_script states.
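To see how result moves through that range, here is a small sketch of the counter as a pure function. The names script_next_result and script_is_up are hypothetical; the transition rules are copied from the vrrp_script_child_thread excerpt above, without the logging.

```c
#include <assert.h>
#include <stdbool.h>

/* Applies one check outcome to the counter and returns its new value.
 * Mirrors the success/failure branches of the excerpt above. */
int script_next_result(int result, int rise, int fall, bool check_ok)
{
    assert(rise >= 1 && fall >= 1);

    if (check_ok) {
        if (result < rise - 1)
            return result + 1;      /* still DOWN, one step closer to UP */
        return rise + fall - 1;     /* UP: jump to the top of the range */
    }
    if (result > rise)
        return result - 1;          /* still UP, but with less margin */
    return 0;                       /* DOWN: reset to the bottom */
}

/* A script counts as UP once its counter has reached rise. */
bool script_is_up(int result, int rise)
{
    return result >= rise;
}
```

With rise = 2 and fall = 3, starting from 0, two successes take the counter to 1 and then to 4 (UP). Three failures in a row then take it to 3, 2, and finally 0 (DOWN); a single success anywhere in between puts it back at 4.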

The initial value of result is set in vrrp_init_script:

  /* if run after vrrp_init_state(), it will be able to detect scripts that
   * have been disabled because of a sync group and will avoid to start them.
   */
  static void
  vrrp_init_script(list l)
  {
      vrrp_script *vscript;
      element e;

      for (e = LIST_HEAD(l); e; ELEMENT_NEXT(e)) {
          vscript = ELEMENT_DATA(e);
          if (vscript->inuse == 0)
              vscript->result = VRRP_SCRIPT_STATUS_DISABLED;

          if (vscript->result == VRRP_SCRIPT_STATUS_INIT) {
              vscript->result = vscript->rise - 1; /* one success is enough */
              thread_add_event(master, vrrp_script_thread, vscript, vscript->interval);
          } else if (vscript->result == VRRP_SCRIPT_STATUS_INIT_GOOD) {
              vscript->result = vscript->rise; /* one failure is enough */
              thread_add_event(master, vrrp_script_thread, vscript, vscript->interval);
          }
      }
  }

The initial value of inuse is 0; each reference from track_script increments it (inuse++), so a script that is actually tracked has inuse of at least 1 and is not disabled. The initial value of result is then set to rise-1 (a failure state, but one success is enough) when keepalived starts (VRRP_SCRIPT_STATUS_INIT), and to rise (a success state, but one failure is enough) when keepalived restarts (VRRP_SCRIPT_STATUS_INIT_GOOD). Combined with the code in vrrp_script_child_thread above, this means the very first check determines whether the vrrp_script is in a normal or abnormal state.

Back to the beginning: Now that we understand the relationship between result and the state of vrrp_script, let’s look back at the calculation process of the weight value in vrrp_script_weight during each check:

      if (tsc->scr->result >= tsc->scr->rise) {
          if (tsc->weight > 0)
              weight += tsc->weight;
      } else if (tsc->scr->result < tsc->scr->rise) {
          if (tsc->weight < 0)
              weight += tsc->weight;
      }

Conclusion

If the vrrp_script is in a normal state (tsc->scr->result >= tsc->scr->rise) and its weight is positive, the weight is added to the sum of script weights and therefore to the vrrp_instance’s priority. If the weight is negative, it is ignored and does not affect the priority.

If the vrrp_script is in an abnormal state (tsc->scr->result < tsc->scr->rise) and its weight is negative, the (negative) weight is added to the sum of script weights, which lowers the vrrp_instance’s priority. If the weight is positive, it is ignored and does not affect the priority.
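The two rules above can be condensed into a few lines. This is an illustrative sketch only: struct tracked_script, script_contribution and script_weight_sum are hypothetical names, and the UP/DOWN state is passed in as a boolean instead of being derived from result and rise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of one tracked script. */
struct tracked_script {
    int  weight;   /* configured weight, may be negative */
    bool up;       /* true when result >= rise */
};

/* Contribution of a single script to the priority offset. */
int script_contribution(int weight, bool up)
{
    if (up && weight > 0)
        return weight;      /* positive weight counts only while UP */
    if (!up && weight < 0)
        return weight;      /* negative weight counts only while DOWN */
    return 0;
}

/* Sum of the contributions of all tracked scripts. */
int script_weight_sum(const struct tracked_script *scripts, size_t count)
{
    int sum = 0;

    assert(scripts != NULL || count == 0);

    for (size_t i = 0; i < count; i++)
        sum += script_contribution(scripts[i].weight, scripts[i].up);
    return sum;
}
```

For instance, a passing script with weight 10 combined with a failing script with weight -10 yields an offset of 0, while the same pair with both scripts passing yields 10.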

In the test example, the MASTER’s configured priority is 100 and the SLAVE’s is 99. Two vrrp_scripts, A and B, are defined, each with a weight of 10.

Based on the analysis above, when a vrrp_script’s weight is positive and its check fails, the weight is simply not added to the priority. However, when A’s weight is -10 and B’s weight is 10, and A fails on the MASTER while B succeeds, the analysis above gives a MASTER priority of 100 + 10 - 10 = 100. In theory this should not trigger a master-slave switch, since the SLAVE is configured with 99, but the logs show the opposite.

Debugging keepalived revealed the cause: the SLAVE’s priority (99) plus its vrrp_script weight (10) gives a final SLAVE priority of 109, which is higher than the MASTER’s 100 + 10 (B) - 10 (A) = 100. This is what causes the MASTER state to switch.

In The End

It starts with one thing

I don’t know why

It doesn’t even matter how hard you try

Setting the weight in Keepalived’s vrrp_script is quite tricky. The analysis above concludes that when using Keepalived’s VRRP for master-slave failover, maintaining consistent settings on both sides and choosing an appropriate priority value are crucial.

One last note: an exit status of 0 from the vrrp_script indicates a successful check; any other value is treated as a failure (verified in the code).

  • When weight is positive, it is added to the priority while the script check succeeds, and not added while the check fails.

    Master failure: a switch occurs when master priority < slave priority + weight.

    Master success: master priority + weight > slave priority + weight; the master remains the master.

  • When weight is negative, it does not affect the priority while the script check succeeds, but the priority drops to priority - abs(weight) while the check fails.

    Master failure: a switch occurs when master priority - abs(weight) < slave priority.

    Master success: master priority > slave priority; the master remains the master.
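The switching conditions in the list above can be checked for a given pair of priorities before deploying a configuration. The sketch below assumes a single tracked script with the same weight on both nodes and a non-owner base priority; node_priority and backup_wins are hypothetical helper names, and ties are deliberately not handled.

```c
#include <assert.h>
#include <stdbool.h>

/* Effective priority of a node that tracks one script (non-owner case). */
int node_priority(int base, int weight, bool script_ok)
{
    int prio = base;

    assert(base >= 1 && base <= 254);

    if (script_ok && weight > 0)
        prio += weight;         /* positive weight: bonus while passing */
    else if (!script_ok && weight < 0)
        prio += weight;         /* negative weight: penalty while failing */

    if (prio < 1)
        prio = 1;
    else if (prio > 254)
        prio = 254;
    return prio;
}

/* True when the backup ends up with the strictly higher priority.
 * Assumption: equal priorities are treated as "no switch" here. */
bool backup_wins(int master_base, int backup_base, int weight,
                 bool master_ok, bool backup_ok)
{
    return node_priority(backup_base, weight, backup_ok) >
           node_priority(master_base, weight, master_ok);
}
```

With base priorities 100 and 99, backup_wins(100, 99, 10, false, true) and backup_wins(100, 99, -10, false, true) are both true, so either weight produces a failover when the master's check fails. With a master base priority of 120, backup_wins(120, 99, 10, false, true) is false: the weight is too small to bridge the gap, which is why the priority values and the weight have to be chosen together.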
