Enabling gzip in Nginx causes garbled characters.

Question

To optimize the backend structure, an nginx scheduling server was added in front of the business server layer. Both the business server and the scheduling server had gzip and caching enabled. Afterwards, a small number of phone models started receiving garbled characters, and the program errors caused by this garbled data appeared to occur at random. After disabling caching on the scheduling server, all the problems disappeared.

The initial suspicion was that the frontend cache was corrupted by improper gzip settings, because the nginx documentation for gzip_http_version mentions proxy cache corruption: “When HTTP version 1.0 is used, the Vary: Accept-Encoding header is not set. As this can lead to proxy cache corruption, consider adding it with add_header.”

Because I didn’t fully understand several gzip configuration directives (the documentation was terrible), I couldn’t pinpoint the root cause for a long time. After further study and repeated experiments with these directives, I finally figured it out.

gzip_http_version

Turns gzip compression on or off depending on the HTTP version of the request. Compression is allowed only if the request's HTTP version is at least the configured value.

The nginx gzip code shows that nginx decides whether to compress a response based on several conditions:

/* http/modules/ngx_http_gzip_filter_module.c: 261 */
if (!r->gzip_tested) {
    if (ngx_http_gzip_ok(r) != NGX_OK) {
        return ngx_http_next_header_filter(r);
    }

} else if (!r->gzip_ok) {
    return ngx_http_next_header_filter(r);
}

/* http/ngx_http_core_module.c: 1915 */
if (r->http_version < clcf->gzip_http_version) {
    return NGX_DECLINED;
}

As can be seen from the code above, if the HTTP version of a request is lower than the value set by gzip_http_version, nginx will not compress the response to that request.
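As a hedged illustration of this check, the fragment below lowers the threshold so that HTTP/1.0 requests remain eligible for compression. Placing it in the http block and choosing 1.0 are assumptions for the sake of the example, not the configuration used in the setup described here.

```nginx
# Sketch only: the values are illustrative assumptions.
http {
    gzip              on;
    # Allow compression for requests using HTTP/1.0 or newer.
    # The default is 1.1, which leaves HTTP/1.0 requests uncompressed.
    gzip_http_version 1.0;
}
```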

Vary: Accept-Encoding

The gzip_vary directive:

Enables the “Vary: Accept-Encoding” response header.

So what exactly does Vary: Accept-Encoding do?

Looking at the source code, this directive has little effect on the logic of request and response processing. It only determines whether “Vary: Accept-Encoding” is added in the final header filter.

/* ngx_http_header_filter_module.c: 397 */
#if (NGX_HTTP_GZIP)
    if (r->gzip_vary) {
        if (clcf->gzip_vary) {
            len += sizeof("Vary: Accept-Encoding" CRLF) - 1;

        } else {
            r->gzip_vary = 0;
        }
    }
#endif
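A minimal configuration sketch of the directive discussed above, assuming it is placed in the http block:

```nginx
http {
    gzip      on;
    # Add "Vary: Accept-Encoding" to responses that are eligible for gzip,
    # so downstream caches know the body depends on Accept-Encoding.
    gzip_vary on;
}
```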

From RFC 2616:

An HTTP/1.1 server SHOULD include a Vary header field with any cacheable response that is subject to server-driven negotiation. Doing so allows a cache to properly interpret future requests on that resource and informs the user agent about the presence of negotiation on that resource. […] A Vary field value consisting of a list of field-names signals that the representation selected for the response is based on a selection algorithm which considers ONLY the listed request-header field values in selecting the most appropriate representation. A cache MAY assume that the same selection will be made for future requests with the same values for the listed field names, for the duration of time for which the response is fresh.

From Stack Overflow:

In other words, Vary: Accept-Encoding tells the browser that two cacheable responses of the same resource will be the same even if the Accept-Encoding request is different (“varies”).

GET /js/somefile.js HTTP/1.1
Accept-Encoding: gzip

HTTP/1.1 200 OK
Vary: Accept-Encoding
Content-Encoding: gzip

This means that you’ll get the same script, no matter if you request compression or not.

From Stack Overflow:

It informs the behavior of the server with respect to caching the representation of the requested resource. If a new request for a previously cached resource is received, it will be served from the cache unless the Accept-Encoding header of the new request is different from the previously cached representation, at which point the request will be treated as a new request and will not be served from cache.

**If you’re serving a compressed file from cache and the client doesn’t accept your compression mechanism they’ll get a page of junk, so it’s necessary.**

**If Gzipped version is in cache and a client does not accept GZIP, they’ll be served gobbledegook.**

From Stack Overflow:

It is allowing the cache to serve up different cached versions of the page depending on whether or not the browser requests GZIP encoding or not. The Vary header instructs the cache to store a different version of the page if there is any variation in the indicated header.

As things stand, there will be one (possibly compressed) copy of the page in cache. Say it is the compressed version: if somebody requested the resource but does not support gzip encoding, they’ll be served the wrong content.

Squid’s handling of Vary: Accept-Encoding

From the Taobao Core System Team blog, on how Squid handles origin-server responses with and without the Vary header:

  1. The response header returned by the origin server does not include “Vary: Accept-Encoding”.
    • Regardless of whether the client request header contains “Accept-Encoding: gzip, deflate”, Squid caches only one copy of the object.
    • If the first request that results in a Squid MISS contains “Accept-Encoding: gzip, deflate” in its header, the origin server returns a gzip-compressed object and Squid caches it. Subsequent client requests receive this compressed object, regardless of whether they contain “Accept-Encoding: gzip, deflate” in their headers.
    • If the first request that results in a Squid MISS does not include “Accept-Encoding: gzip, deflate” in its header, the origin server returns an uncompressed object and Squid caches it. Thereafter, Squid returns this uncompressed object regardless of whether other client requests include “Accept-Encoding: gzip, deflate” in their headers.
  2. The response header returned by the origin server includes “Vary: Accept-Encoding”.
    • Squid caches multiple objects based on the “Accept-Encoding” value in the client request.
    • Squid uses the “Accept-Encoding” value in the client request to request the corresponding data (compressed or uncompressed) from the origin server. After returning the response to the client, it stores a separate copy of the object in the local cache for each distinct “Accept-Encoding” value.
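The idea of keeping one cached copy per Accept-Encoding value can be approximated in an nginx proxy cache by adding the request's Accept-Encoding to the cache key. This is a sketch of an alternative, not the approach this article ends up using; the upstream name `business_backend` and the zone name `app_cache` are hypothetical.

```nginx
location / {
    proxy_pass      http://business_backend;   # hypothetical upstream
    proxy_cache     app_cache;                 # hypothetical zone from proxy_cache_path
    # The default key is $scheme$proxy_host$request_uri. Appending the raw
    # Accept-Encoding value stores compressed and uncompressed variants
    # as separate cache objects.
    proxy_cache_key "$scheme$proxy_host$request_uri$http_accept_encoding";
}
```

One trade-off of this sketch: clients send many different Accept-Encoding strings, so keying on the raw value can split the cache into more variants than strictly needed.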

Conclusion

Under normal circumstances, to support “Vary: Accept-Encoding”, nginx needs to cache uncompressed data objects locally so that it can serve both requests with “Accept-Encoding: gzip, deflate” (nginx only supports the gzip compression algorithm) and requests without this header.

In the configuration where the problem described at the beginning occurred, both the scheduling server and the business server had compression enabled. If the first request to reach the scheduling server did not include “Accept-Encoding”, the scheduling server requested uncompressed data from the business server and cached that. Subsequent requests were then served normally.

However, if the first request included “Accept-Encoding”, the scheduling server retrieved compressed data from the business server and cached it locally. Afterward, every client received the compressed data, regardless of whether it sent “Accept-Encoding”. All response headers then looked something like this:

HTTP/1.1 200 OK
Server: nginx/1.1.19
Date: Mon, 16 Jul 2012 18:31:47 GMT
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked
Connection: keep-alive
Vary: Accept-Encoding
X-Powered-By: PHP/5.3.13
Expires: Mon, 16 Jul 2012 19:31:47 GMT
Cache-Control: max-age=3600
Content-Encoding: gzip

This compressed data causes no problems for browsers or SDKs that can handle gzip. However, a browser or SDK that cannot process gzip (which is why it does not send “Accept-Encoding: gzip, deflate” in the first place) will get “garbled text”.

Solution

Objective: ensure that the scheduling server caches only uncompressed data.

  • Disable compression on the business server.
  • Add “Via: xxx”, or set “Accept-Encoding” to an empty value, in the proxy request headers of the scheduling server.

If value is empty string, then header will not be sent to upstream. For example this setting can be used to disable gzip compression on upstream:

proxy_set_header Accept-Encoding "";
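Putting the pieces together, a scheduling server that caches only uncompressed data and compresses for clients itself might look roughly like the sketch below. The cache path, zone name `app_cache`, upstream name `business_backend`, and the address are all hypothetical.

```nginx
http {
    # Hypothetical cache location and zone name.
    proxy_cache_path /var/cache/nginx/app keys_zone=app_cache:10m;

    # The scheduling server compresses responses for clients itself.
    gzip      on;
    gzip_vary on;

    upstream business_backend {          # hypothetical upstream
        server 127.0.0.1:8080;
    }

    server {
        listen 80;

        location / {
            proxy_pass  http://business_backend;
            proxy_cache app_cache;
            # Empty value: the header is not sent upstream, so the business
            # server returns (and the cache stores) uncompressed data.
            proxy_set_header Accept-Encoding "";
        }
    }
}
```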

Additional questions

Q. If the scheduling server has already cached a compressed object and has compression enabled, will a client that sends “Accept-Encoding: gzip, deflate” receive data that has been compressed twice?

A. To answer this question, we need to go back to the code block in `gzip_module` that checks whether to compress. The earlier discussion of gzip_http_version did not show the full code segment that checks the compression conditions.

if (!conf->enable
    || (r->headers_out.status != NGX_HTTP_OK
        && r->headers_out.status != NGX_HTTP_FORBIDDEN
        && r->headers_out.status != NGX_HTTP_NOT_FOUND)
    || (r->headers_out.content_encoding
        && r->headers_out.content_encoding->value.len)
    || (r->headers_out.content_length_n != -1
        && r->headers_out.content_length_n < conf->min_length)
    || ngx_http_test_content_type(r, &conf->types) == NULL
    || r->header_only)
{
    return ngx_http_next_header_filter(r);
}

r->gzip_vary = 1;

if (!r->gzip_tested) {
    if (ngx_http_gzip_ok(r) != NGX_OK) {
        return ngx_http_next_header_filter(r);
    }

} else if (!r->gzip_ok) {
    return ngx_http_next_header_filter(r);
}

When using cached data to respond to client requests, nginx reads the entire upstream response header from the cache file into the ngx_http_request_t::headers_out structure. Since the upstream returned compressed data, the header will definitely contain “Content-Encoding: gzip”. Therefore the r->headers_out.content_encoding condition in the code above is true, and nginx skips the processing of gzip_module, so the data is not compressed a second time.

Q. Why does Via prevent the application server from compressing the response?

A. In fact, “Via” is the header that the gzip_proxied directive checks; the documentation describes this ambiguously (or is my English too poor?).

gzip_proxied off | expired | no-cache | no-store | private | no_last_modified | no_etag | auth | any

It allows or disallows the compression of the response for the proxy request in the dependence on the request and the response. The fact that, request proxy, is determined on the basis of line “Via” in the headers of request.

Its default value is off. This is also the setting I use on the scheduling server and the business server.

Another explanation

As long as a client request is identified as one that came from a proxy server (“Via” header present), nginx is able to disable or enable its own gzip depending on various conditions. These conditions are controlled via the gzip_proxied directive.

This explanation is very clear: requests containing a “Via” field in their header are treated by nginx as requests coming from another proxy. Whether gzip is enabled for such a request, and under what conditions, is controlled by `gzip_proxied`. With the default value, off, responses to all proxied requests are left uncompressed.

Looking back at the official documentation, the phrase “request proxy” would be more appropriately written as “proxy request”, that is, a request that arrived through a proxy.

So, what does the code that controls compression via “Via” look like?

/* ngx_http_core_module.c:1919, ngx_http_gzip_ok */
if (r->headers_in.via == NULL) {
    goto ok;
}

p = clcf->gzip_proxied;

if (p & NGX_HTTP_GZIP_PROXIED_OFF) {
    return NGX_DECLINED;
}

if (p & NGX_HTTP_GZIP_PROXIED_ANY) {
    goto ok;
}
/* ... */

ngx_http_request_t::headers_in and ngx_http_request_t::headers_out store the header fields of the request and of the response, respectively. If the “Via” field is present in the request headers, nginx consults the gzip_proxied configuration to decide whether the response to this request should be compressed.
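The two sides of this mechanism can be sketched as follows. The Via value `scheduler` and the upstream name `business_backend` are placeholders, and the two fragments belong to two different nginx instances.

```nginx
# --- Scheduling server (fragment) ---
location / {
    proxy_pass       http://business_backend;   # hypothetical upstream
    # Mark the request as proxied; the value itself is a placeholder.
    proxy_set_header Via "scheduler";
}

# --- Business server (fragment, separate nginx instance) ---
gzip         on;
# off (default): never compress when the request carries a "Via" header.
# any:           compress responses to all proxied requests.
gzip_proxied off;
```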

Q. Which Android SDKs do not support gzip decompression?

A. ???

Q. What is the difference between the inactive time of proxy_cache_path and the time of proxy_cache_valid?

A. The directive proxy_cache_valid specifies how long a response will be considered valid (and will be returned without any requests to the backend). After this time, the response will be considered “stale” and will either not be returned or will be returned, depending on the proxy_cache_use_stale setting.

The inactive argument of proxy_cache_path specifies how long a response will be stored in the cache after its last use. Note that even stale responses will be considered recently used if there are requests to them.
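The two timers can be seen side by side in a sketch like the following; the path, zone name, upstream name, and all durations are illustrative assumptions.

```nginx
http {
    # Entries not accessed for 60 minutes are removed from the cache,
    # whether they are still fresh or not.
    proxy_cache_path /var/cache/nginx/app keys_zone=app_cache:10m inactive=60m;

    server {
        location / {
            proxy_pass            http://business_backend;   # hypothetical upstream
            proxy_cache           app_cache;
            # 200 and 302 responses are considered fresh for 10 minutes.
            proxy_cache_valid     200 302 10m;
            # Once stale, a cached copy may still be served in these cases.
            proxy_cache_use_stale error timeout updating;
        }
    }
}
```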

Q. What is the difference between Cache-Control and Expires?

A. Expires is defined by HTTP/1.0, and Cache-Control is defined by HTTP/1.1. max-age is just a straight integer number of seconds, while Expires has a somewhat complex date format. Even small errors in generating the Expires values can cause downstream caches to misinterpret it. It happens more often than you think.

Cache-Control was introduced in HTTP/1.1 and offers more options than Expires. They can be used to accomplish the same thing, but the data value for Expires is an HTTP date whereas Cache-Control max-age lets you specify a relative amount of time, so you could specify “X hours after the page was requested”.

To sum up though, Expires is recommended for static resources like images, and Cache-Control when you need more control over how caching is done.

Q. What does proxy_ignore_headers do and how is it used?

A. Upstream cache-related headers have priority over the proxy_cache_valid value; in particular, the order is:

  1. X-Accel-Expires
  2. Expires/Cache-Control
  3. proxy_cache_valid

The order in which your backend returns HTTP headers changes the cache behavior. You may ignore these headers using

proxy_ignore_headers X-Accel-Expires Expires Cache-Control;

proxy_ignore_headers determines whether the upstream headers or proxy_cache_valid is used by nginx to decide for how long it will cache the response.
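A small sketch of how the two directives combine, assuming a hypothetical upstream `business_backend` and cache zone `app_cache`:

```nginx
location / {
    proxy_pass           http://business_backend;   # hypothetical upstream
    proxy_cache          app_cache;                 # hypothetical zone
    # Disregard the upstream's own caching headers...
    proxy_ignore_headers X-Accel-Expires Expires Cache-Control;
    # ...so that this value alone decides how long responses are cached.
    proxy_cache_valid    200 30m;
}
```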

Separate from that, you can use proxy_hide_header to tell nginx not to send some headers that came from upstream to the client.

Separate from that (mostly), you can use expires to tell nginx how to set the Expires and Cache-Control headers in the response to the client. The “mostly” is there because nginx will not send more than one Expires header, so if you use expires to set one, then the one from upstream will not go to the client, even if it isn’t in proxy_hide_header.

To explicitly set the Cache-Control and Expires headers, use the expires directive.
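As a rough sketch of these two client-facing directives used together (the upstream name and the one-hour lifetime are assumptions):

```nginx
location / {
    proxy_pass        http://business_backend;   # hypothetical upstream
    # Do not forward this upstream header to the client.
    proxy_hide_header X-Powered-By;
    # Set Expires and Cache-Control: max-age=3600 on the client response,
    # replacing any Expires header that came from upstream.
    expires           1h;
}
```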

Q. expires

A. Controls whether the response should be marked with an expiry time, and if so, what time that is.

  • off prevents changes to the Expires and Cache-Control headers.
  • epoch sets the Expires header to 1 January, 1970 00:00:01 GMT.
  • max sets the Expires header to 31 December 2037 23:55:55 GMT, and the Cache-Control max-age to 10 years.
  • A time without an @ prefix specifies an expiry time relative to either the response time (if the time is not preceded with “modified”) or the file’s modification time (when “modified” is present). A negative time can be specified, which sets the Cache-Control header to no-cache.
  • Times written with an @ prefix represent an absolute time-of-day expiry, written in either the form Hh or Hh:Mm, where H ranges from 0 to 24, and M ranges from 0 to 59.

A non-negative time or time-of-day sets the Cache-Control header to max-age=#, where # is the appropriate time in seconds.

Note: expires works only for 200, 204, 301, 302, and 304 responses.
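The forms listed above could be written as in the following sketch; the location paths and durations are made up for illustration.

```nginx
location /static/ {
    expires 30d;             # relative to response time; max-age=2592000
}

location /api/ {
    expires -1;              # negative time; Cache-Control: no-cache
}

location /daily/ {
    expires @15h30m;         # absolute time of day: 15:30
}

location /files/ {
    expires modified +24h;   # relative to the file's modification time
}
```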

Q. Is it truly effective to use different cache times for the scheduling server and the business server?

A. See the answers above.

misc

  • Expires/Cache-Control controls whether the browser retrieves data directly from the browser cache or sends a new request to the server. Cache-Control offers more control than Expires.
  • Last-Modified, If-Modified-Since, and ETag/If-None-Match are used by the browser to determine whether a file has been modified after sending a request to the server. If the file has not been modified, the server will return a 304 status. If it has been modified, the server will resend the data to the browser.
