Nginx Source Code Notes – URI Matching

Nginx Source Code Notes – URI Matching

In the Nginx configuration file, server {}multiple virtual hosts can be configured within a single virtual host location {}locationWhat is their purpose?

location

This directive allows different configurations depending on the URI. It can be configured using both literal strings and regular expressions.

locationThis involves differentiating data at different URIlayers HTTP requestand processing it accordingly. For example, some data might URIreturn static content, while others URImight be distributed to backend application servers and returned with dynamic content generated by those servers.

In summary, locationit achieves HTTP requestthe subdivision of processing.

locationmatch

If a request server {}can be configured with multiple locationoptions, how do locationyou select the appropriate option from among them location?

location order

The order in which locationdirectives are checked as follows:

  1. Directives with the “=” prefix that match the query exactly (literal string). If found, searching stops.
  2. All remaining directives with conventional strings. If this match used the `^~” prefix, searching stops.
  3. Regular expressions, in the order they are defined in the configuration file.
  4. If #3 yielded a match, that result is used. Otherwise, the match from #2 is used.

To determine which locationdirective matches a particular query, the literal strings are checked first. Literal strings match the beginning portion of the query – the most specific match will be used. Afterwards, regular expressions are checked in the order defined in the configuration file. The first regular expression to match the query will stop the search. If no regular expression matches are found, the result from the literal search is used.

In summary:

  • The order of regular locationexpressions in the configuration file does not affect the matching order. However, regular expressions locationwill be matched in the order they are defined in the configuration file.
  • If set to exact match (with =prefix) , this is used immediately locationif the request URI is matched, and the matching process ends.location
  • Among other strings containing only regular characters location, find the longest match to the request URI. If this string server {}does not contain regular expressions locationor if `regex.regex` locationis enabled ^~, this longest match locationwill be used. If this server {}string contains regular expressions location, first match among these regular expressions location; if a match is found, use the matching regular expression location; otherwise, use the longest match location.
  • ^~These modifiers , =~~*do not need to be separated from the following URI string by spaces. @ Modifiers must be directly concatenated with the URI string (actually, it should be called “naming”).
  • locationNested definitions are allowed. However, they must conform to the following rules:
    • =The modified locationtext cannot contain other elements.location
    • @The modified locationtext cannot contain other elements.location
    • @Modifications locationcannot be nested within other elements location.
    • The parent location‘s URI string must be locationa prefix of the child’s URI string ( locationunless the child has regular expressions enabled).

The following sections will analyze how Nginx organizes storage locationand how the matching rules mentioned above are implemented.

Storage structure

locationThe configuration parsing of commands has been briefly described in the article “Configuration Parsing “. Let’s first introduce the main data types involved in this part.

  • ngx_http_core_loc_conf_t– locationScope structure. This structure is also responsible for storing locationconfiguration items that can be defined in multiple scopes simultaneously in other higher-level scopes, and for connecting higher-level scopes and locationother scopes.
struct  ngx_http_core_loc_conf_s  { 
    ngx_str_t            name ;    /* URI part string */ 
    ngx_http_regex_t     * regex ;  /* Regular expression object compiled by the regular expression engine */ 
    ... 
    unsigned             named : 1 ;         /* @ modifier */ 
    unsigned             noname : 1 ;        /* if () {} */ 
    unsigned             exact_match : 1 ;   /* = modifier */ 
    unsigned             noregex : 1 ;       /* ^= modifier */

    ... 
    ngx_http_location_tree_node_t    * static_location ; 
    ngx_http_core_loc_conf_t         ** regex_location ; 
    void                 ** loc_conf ; 
    ... 
    ngx_queue_t          * locations ;  /* Connects the `location` scope, 
                                       forcibly cast from ngx_http_location_queue_t 
                                       */ 
} ;

ngx_http_location_queue_t– A packaging structure used for temporary storage locationngx_queue_tIt serves both as a storage device queue headand as a functional component queue node.

typedef struct {
    ngx_queue_t queue;
    ngx_http_core_loc_conf_t *exact; /* exact_match, regex, named, noname */
    ngx_http_core_loc_conf_t *inclusive; /* non-exact location */
    ngx_str_t *name;
    u_char *file_name;
    ngx_uint_t *line;
    ngx_queue_t list;
} ngx_http_location_queue_t;

ngx_http_location_tree_node_tlocation– A tree-like storage structure of nodes at runtime . Facilitates locationfast lookup.

struct ngx_http_location_tree_node_s {
    ngx_http_location_tree_node_t *left;
    ngx_http_location_tree_node_t *right;
    ngx_http_location_tree_node_t *tree;

    ngx_http_core_loc_conf_t *exact;
    ngx_http_core_loc_conf_t *inclusive;

    u_char auto_redirect;
    u_char len;
    u_char name[1];
};

After the initial configuration parsing is completed, server {} locationthe temporary storage structure of an instruction can be found in the diagram in the configuration parsing .

ngx_http_blockAfter completing the configuration reading and storage, the functions `read` ngx_init_locationsand ` reorganize` are called to reorganize the configuration defined in ngx_http_init_static_location_treeeach virtual host .server {}location

    ------------http/ngx_http.c:114------------
    static char *
    ngx_http_block(ngx_conf_t *cf, ngx_combined_t *cmd, void *conf)
    {
        ...
        cmcf = ctx->main_conf[ngx_http_core_module.ctx_index];
        cscfp = cmcf->servers.elts;
        ...
        for (s = 0; s < cmcf->servers.nelts; s++) {
            clcf = cscfp[s]->ctx->loc_conf[ngx_http_core_module.ctx_index];

            if (ngx_http_init_locations(cf, cscfp[s], clcf) != NGX_OK) {
                return NGX_CONF_ERROR;
            }

            if (ngx_http_init_static_location_trees(cf, clcf) != NGX_OK) {
                return NGX_CONF_ERROR;
            }
        }
        ...
    }

Additional notes on the above code:

  • clcfUltimately, it points to a structure used to connect each element to the elements defined within it (see Configuration server {}Resolution ) .locationngx_http_loc_conf_t

This function ngx_http_init_locationsis responsible for locationsorting and storing the data in categories. After processing, locationsthe queue only contains data of type ` Integer` exactand `Integer` . Then, the `Integer` and ` Integer` data types are further processed to construct a more easily accessible data structure.inclusivelocationngx_http_init_static_location_treesexactinclusivelocation

Let’s first take a look ngx_http_init_locationsat how locationthe categories are stored.

    ----------http/ngx_http.c:626-------------
    static ngx_int_t
    ngx_http_init_locations(ngx_conf_t *cf, ngx_http_core_srv_conf_t *cscf,
        ngx_http_core_loc_conf_t *pclcf)
    {
        ...
        locations = pclcf->locations;
        ...
        ngx_queue_sort(locations, ngx_http_cmp_locations);
        named = NULL;
        n = 0;
        regex = NULL;
        r = 0;

        for (q = ngx_queue_head(locations);
            q != ngx_queue_sentinel(locations);
            q = ngx_queue_next(q))
        {
            clcf = lq->exact ? lq->exact : lq->inclusive;

            if (ngx_http_init_locations(cf, NULL, clcf) != NGX_OK) {
                return NGX_ERROR;
            }

            if (clcf->regex) {
                r++;
                if (regex == NULL) {
                    regex = q;
                }

                continue?
            }

            if (clcf->named) {
                n++

                if (named == NULL) {
                    named = q;
                }

                continue?
            }

            if (clcf->noname) {
                break
            }
        }

        if (q != ngx_queue_sentinel(locations)) {
            ngx_queue_split(locations, q, &tail);
        }

        if (named) {
            clcfp = ngx_palloc(cf->pool,
                               (n + 1) * sizeof(ngx_http_core_loc_conf_t **));
            ...
            cscf->named_locations = clcfp;

            for (q = named;
                 q != ngx_queue_sentinel(locations);
                 q = ngx_queue_next(q))
            {
                lq = (ngx_http_location_queue_t *) q;
                *(clcfp++) = lq->exact;
            }

            *clcfp = NULL;

            ngx_queue_split(locations, named, &tail);
        }

        if (regex) {
            ...
            pclcf->regex_locations = clcfp;
            ...
        }

        return NGX_OK;
    }

Additional notes on the code above:

  • ngx_queue_sortSort all locationby type. After the operation, locationthe order of the categories is as follows ( inclusivemeaning that if locationa configuration URIappears request URIbefore another category, it is considered locationa match):
exact(sorted) -> inclusive(sorted) -> regex -> named -> noname
  • ngx_http_init_locationsIt calls itself recursively. This is to handle locationnested cases.
  • nonameThe type locationis discarded.
  • namedThe type locationis added to the array pointed to by ngx_http_srv_conf_tthe member . Since the type cannot be nested within other types or be nested within other types , it is stored in a struct.named_locationsnamedlocationlocationlocationngx_http_srv_cont_t
  • regexThe type locationwas added to the array pointed to by the associated ngx_http_loc_conf_tmember regex_locations.
  • exactinclusiveTypes and locationremain in locationsthe queue. They both belong to static location.

Next, the type exactand are further processed by the function:inclusivelocationngx_http_init_static_location_trees

    ---------------http/ngx_http.c:755---------------
    static ngx_int_t
    ngx_http_init_static_location_trees(ngx_conf_t *cf,
        ngx_http_core_loc_conf_t *pclcf)
    {
        locations = pclcf->locations;
        ...
        for (q = ngx_queue_head(locations);
             q != ngx_queue_sentinal(locations);
             q = ngx_queue_next(q))
        {
            lq = (ngx_http_location_queue_t *) q;

            clcf = lq->exact ? lq->exact : lq->inclusive;

            ngx_http_init_static_location_trees(cf, clcf);
        }

        ngx_http_join_exact_locations(cf, locations);

        ngx_http_create_locations_list(locations, ngx_queue_head(locations));

        pclcf->static_locations = ngx_http_create_locations_tree(cf, locations, 0);
        ...
    }

Additional notes on the above code:

  • ngx_http_init_static_location_treesFunctions can be nested, in order to handle location nested definitions.
  • ngx_http_join_exact_locationsMerge URI strings that are exactly the same in the current virtual host exact with inclusivestrings of type .location
  • ngx_http_create_locations_list`and` ngx_http_create_locations_treeare used to construct relatively balanced binary search trees. Detailed implementation code will be analyzed below.

Since inclusivethe types have already been sorted lexicographically within the function, locationtypes with a common prefix will be stored together. If a type’s prefix is ​​a prefix of one or more subsequent types , then these types of nodes will be merged and consolidated.ngx_http_init_locationsinclusivelocationlocationURIlocation URIlocationngx_http_create_locations_list

To make the analysis more concrete, the code is illustrated below using examples locations(only URIthe node nameparts are listed) ./a /ab /abc /abd /b /bc

exactNodes of this type remain in locationsthe queue.

    ---------------http/ngx_http.c:959--------------------------
    static void
    ngx_http_create_locations_list(ngx_queue_t *locations, ngx_queue_t *q)
    {
        if (q == ngx_queue_last(locations)) {
            return;
        }

        lq = (ngx_http_location_queue_t *) q;

        if (lq->inclusive == NULL) {
            ngx_http_create_locations_list(locations, ngx_queue_next(q));
            return;
        }

        len = lq->name->len;
        name = lq->name->data;

        for (x = ngx_queue_next(q);
             x != ngx_queue_sentinel(locations);
             x = ngx_queue_next(x))
        {
            lx = (ngx_http_location_queue_t *) x;

            if (len > lx->name->len
                || (ngx_strncmp(name, lx->name->data, len) != 0))
            {
                break
            }
        }
        ...
        ngx_queue_split(locations, q, &tail);
        ngx_queue_add(&lq->list, &tail);
        ...
        ngx_queue_split(&lq->list, x, &tail);
        ngx_queue_add(locations, &tail);

        ngx_http_create_locations_list(&lq->list, ngx_queue_head(&lq->list));

        ngx_http_create_locations_list(locations, x);
    }

Additional notes on the code above:

  • Some boundary operations have been omitted from the excerpt above.
  • The parameter locationspoints to the first-level locationnode (nodes at other levels locationhave been appended to the variables locationof the parent node list), and points to the nodes in qthe list locationsthat have not yet been organized location.
  • Because this function involves recursive calls, q == ngx_queue_last(locations)it serves as the exit condition for the recursion.
  • For nodes that are not of inclusivetype (at this time locations, the queue only contains nodes of type exactand inclusive type locationlocation, skip them directly without any organization.
if (lq->inclusive == NULL) {
    ngx_http_create_locations_list(locations, ngx_queue_next(q));
    return;
}
  • lqPoints to the first node to be processed locationlxPoints to the first node URIthat is not prefixed with lqthe node it points to URI. For example, in the example above locations, if lqit points to /a, then lxit will eventually point to /b. Therefore, /a /ab /abc /abdit is the node that can be processed and merged into the first-level node /a.
  • Nodes that can be merged are appended to the node’s variable using the ngx_queue_split`add` and ` ngx_queue_addreplace` operations . The remaining nodes starting from the current node are returned to the queue using the `add` and ` replace` operations ./alistlxngx_queue_listngx_queue_addlocations
  • Finally, perform the same operation on lq->listthe nodes in the array and the remaining nodes starting from the array.lx
  • ngx_http_create_locations_listAfter the function completes, the example locations‘s structure is as follows:

ngx_http_create_location_listBy merging a locationgiven node and one or more nodes that URIfollow it with a prefix of , and then converting it into a binary search tree ( where nodes are arranged in an ordered manner), the speed of searching and matching for a given can be accelerated .URIngx_http_create_location_treeslocationrequest URIlocation

The construction of the binary search tree ngx_http_create_location_treesis completed in the function.

    ----------http/ngx_http.c:1023------------------
    static ngx_http_location_tree_node_t *
    ngx_http_create_locations_tree(ngx_conf_t *cf, ngx_queue_t *locations)
    {
        ...
        q = ngx_queue_middle(location);

        lq = (ngx_http_location_queue_t *) q;
        len = lq->name->len - prefix;

        node = ngx_palloc(cf->pool,
                          offsetof(ngx_http_location_tree_node_t, name) + len);
        ...
        node->len = (u_char) len;
        ngx_memcpy(node->name, &lq->name->data[prefix], len);

        ngx_queue_split(locations, q, &tail);
        ...
        node->left = ngx_http_create_locations_tree(cf, locations, prefix);
        ...
        ngx_queue_remove(q);
        ...
        node->right = ngx_http_create_locations_tree(cf, &tail, prefix);
        ...
    inclusive:

        if (ngx_queue_empty(&lq->list)) {
            return node;
        }

        node->tree = ngx_http_create_locations_tree(cf, &lq->list, prefix + len);
        ...
        return node;
    }

Additional notes on the above code:

  • Some boundary operations have been omitted from the excerpt above.
  • It’s worth noting that ngx_queue_middlethe function uses fast and slow pointers, determining the position of the middle node in a single traversal. Although there’s no increase in complexity, the technique is excellent. queueWhen the number of middle nodes is odd, the function returns the node at the exact middle position; when the number of nodes is even, the function returns the first node of the latter half.
  • locationsIn the queue, qthe elements before the middle node become qthe left subtree nodes of the queue; q the elements after the middle node become qthe right subtree nodes of the queue. Then, the process of building left and right subtrees is repeated for these two groups of nodes.
  • Then, a binary search tree is recursively constructed from the nodes merged by the function contained in the list q. It’s important to note that in the generated hierarchical linked list, the parent node’s is always a prefix of the child node’s . This way, the child node doesn’t need to store the complete , it only needs to store the portion exceeding the parent node’s . This saves space and improves search efficiency (prefixes that have already been compared don’t need to be compared again when searching the child node).listngx_http_create_locations_listngx_http_create_locations_listURIURIURI
  • qThe binary search tree constructed for the lower-level nodes of node->treeis recorded by the member variable.
  • Ultimately, locationsall queue nodes were transformed into binary tree nodes. Nodes in the temporary storage structure were also transformed into ngx_http_location_queue_tnodes in the structure members.listngx_http_location_tree_node_ttree
  • locationsAfter calling the queue in the example above ngx_http_create_locations_tree, the structure diagram is as follows:

At this point, locationthe relevant data structures are complete. The remaining task is to determine locationthe scope corresponding to each request.

Request matching

Before starting to determine requestwhich process locationshould be handled, Nginx has already determined requestthe virtual host to which it belongs.

The matching of requests is completed in the corresponding checker function locationof the request processing FIND_CONFIGphase (see the module category for module introduction) . The function call performs the actual matching work.handlerngx_http_core_find_config_phasengx_http_core_find_config_phasengx_http_core_find_location

    ----------http/ngx_http_core_module.c:1427----------
    static ngx_int_t
    ngx_http_core_find_location(ngx_http_request_t *r)
    {
        ...
        pclcf = ngx_http_get_module_loc_conf(r, ngx_http_core_module);

        rc = ngx_http_core_find_static_location(r, pclcf->static_locations);

        if (rc == NGX_AGAIN) {
            ...
            rc = ngx_http_core_find_location(r);
        }

        if (rc == NGX_OK || rc == NGX_DONE) {
            return rc;
        }

        /* rc == NGX_DECLINED or rc == NGX_AGAIN in nested location */

        if (noregex == 0 && pclcf->regex_locations) {

            for (clcfp = pclcf->regex_locations; *clcfp; clcfp++) {
                ...
                n = ngx_http_regex_exec(r, (*clcfp)->regex, &r->uri);
                ...
            }
        }

        return rc;
    }

Additional notes on the code above:

  • ngx_http_core_find_static_locationThe function’s return value determines whether to retry the regular expression search. The allowed return values ​​are as follows:
    • NGX_OK– exact match
    • NGX_DONE– auto redirect
    • NGX_AGAIN– inclusive match
    • NGX_DECLINED– no match
  • When the return value is NGX_AGAIN`true`, that is, inclusive match`false`, according to the pre-defined logic, the next step is to perform regular expression matching. However, due to the locationexistence of nested `find` statements, it is necessary to decide whether to start regular expression matching only after completing the nested search.
  • However, I have a question regarding nested locationsearches : If a match of type ` <int>` is found, the first function call returns a value . Next, it checks for nested matches . If there are no nested matches, the recursive calls to the function used to detect nesting return . Furthermore, if no parallel regular expression calls are set, the entire search process returns . This seems inconsistent with the definition of a function’s return value.inclusivelocation Angx_http_core_find_static_locationNGX_AGAINAlocatonAngx_http_core_find_locationNGX_DECLINEDANGX_DECLINEDngx_http_core_find_static_location

The code logic for the function ngx_http_core_find_locationis closely related to the tree structure in the previous section. The main logic will not be analyzed further. The only point to note is that the type checking logic 0.8.34in the code I used does not match the description in the current Nginx documentation:exactlocation

    if (len == (size_t) node->len) {
        r->loc_conf = (node->exact) ? node->exact->loc_conf:
                                      node->inclusive->loc_conf;
        return NGX_OK;
    }

Based on the function’s return value definition and what we see in the code, Nginx considers an exact match to have occurred, regardless of locationwhether the type is true or false .exactexact match

Fortunately, in newer versions 1.4.1, this code has been changed to:

    if (len == (size_t) node->len) {

        if (node->exact) {
            r->loc_conf = node->exact->loc_conf;
            return NGX_OK;

        } else {
            r->loc_conf = node->inclusive->loc_conf;
            return NGX_AGAIN;
        }
    }

Furthermore, CHANGESthe version number and time of this behavior change were found in the file:

1353 Changes with nginx 0.8.42 21 Jun 2010 1354 1355 *) Change: now nginx tests locations given by regular expressions, if 1356 request was matched exactly by a location given by a prefix string. 1357 The previous behavior has been introduced in 0.7.1.

Don’t leave me so easily, please leave something behind…


Comments

Leave a Reply

Your email address will not be published. Required fields are marked *