Nginx upstream keepalive connections are one of the most impactful performance optimizations for reverse proxy deployments. By reusing TCP connections to upstream servers, you eliminate the latency and CPU cost of TCP handshakes on every request. But the configuration is non-obvious, the defaults are misleading, and there are several production pitfalls that only appear under load. This post covers the correct configuration, the common mistakes, and how to fix them.
The Basic Configuration
Upstream keepalive requires three directives working together. The keepalive directive sets the maximum number of idle keepalive connections preserved in each worker’s cache. The proxy_http_version must be 1.1 (HTTP/1.0 does not keep connections open by default). And the Connection header must be cleared, because nginx otherwise sends Connection: close to the upstream on every proxied request.
upstream backend {
    server 127.0.0.1:8080;
    keepalive 32;
}

server {
    location / {
        proxy_pass http://backend;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }
}
Pitfall 1: keepalive Is Per Worker, Not Global
The keepalive 32 directive means each nginx worker process caches up to 32 idle connections to the upstream group. With 8 worker processes, you can have up to 256 total idle connections to the upstream. If your upstream server has a hard connection limit (for example, a fixed worker or thread pool in the application server), multiply keepalive by worker_processes to understand your actual idle pool size. Note that keepalive caps only idle connections; it does not limit how many connections a worker can open under load.
Under-sizing keepalive causes connection pool thrashing: workers exhaust their idle connections, close some, and immediately need to open new ones. Over-sizing wastes upstream file descriptors and may hit upstream connection limits.
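To make the arithmetic concrete, here is a minimal sketch of a complete configuration. The worker count, pool size, and listen port are illustrative assumptions, not recommendations.

```nginx
# Illustrative sizing only; worker count and pool size are assumptions.
worker_processes 8;

events {}

http {
    upstream backend {
        server 127.0.0.1:8080;

        # Up to 32 idle connections are cached per worker process,
        # so 8 workers x 32 = up to 256 idle connections in total.
        keepalive 32;
    }

    server {
        listen 80;

        location / {
            proxy_pass http://backend;
            proxy_http_version 1.1;
            proxy_set_header Connection "";
        }
    }
}
```

If the upstream can only accept, say, 200 connections, this combination could already exceed it with idle connections alone, so either the keepalive value or the worker count would need to come down.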
Pitfall 2: keepalive_requests and keepalive_time
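By default, nginx closes an upstream keepalive connection after it has served keepalive_requests requests (default 1000) or has been open for keepalive_time (default 1h). Separately, idle connections are closed after keepalive_timeout in the upstream context (default 60s, not to be confused with the client-facing directive of the same name). If your upstream has its own, shorter idle timeout (Node.js defaults to 5 seconds; Go’s net/http server has no idle timeout unless IdleTimeout or ReadTimeout is set), nginx will eventually try to reuse a connection the upstream has already closed.
This produces upstream connection reset errors. Fix: ensure nginx’s upstream keepalive_timeout is shorter than the upstream’s server-side idle timeout, or raise the upstream’s timeout above nginx’s. Note that proxy_socket_keepalive on only enables TCP keepalive probes (SO_KEEPALIVE) on the upstream socket; it helps detect dead peers but does not stop the upstream from closing idle connections.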
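upstream backend {
    server 127.0.0.1:8080;
    keepalive 32;
    keepalive_requests 10000;
    # Idle timeout for cached connections; must be shorter than the
    # upstream's own idle timeout. (keepalive_time, by contrast, caps
    # the total lifetime of a connection and defaults to 1h.)
    keepalive_timeout 30s;
}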
Pitfall 3: Missing proxy_next_upstream Configuration
When nginx gets a connection reset from an upstream (because the upstream closed a keepalive connection), proxy_next_upstream determines whether nginx retries the request on another upstream or returns 502. The default includes error and timeout but not non_idempotent.
For GET requests, nginx will retry on connection reset. For POST requests (non-idempotent), it will not by default once the request has been sent to the upstream; it returns 502 instead. If your POST endpoints are safe to retry (idempotent despite the method), add non_idempotent to proxy_next_upstream alongside the error conditions you want to retry, and cap the number of attempts with proxy_next_upstream_tries 2.
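The retry settings described above might look like the following sketch. The /api/ location is a hypothetical path, and the configuration assumes every POST handler behind it really is safe to execute twice.

```nginx
location /api/ {
    proxy_pass http://backend;
    proxy_http_version 1.1;
    proxy_set_header Connection "";

    # "error timeout" is the default; non_idempotent additionally allows
    # retrying POST, LOCK, and PATCH requests. Only enable this when a
    # duplicate request cannot cause harm.
    proxy_next_upstream error timeout non_idempotent;

    # At most two attempts in total (the original try plus one retry).
    proxy_next_upstream_tries 2;
}
```

Scoping non_idempotent to a single location rather than the whole server keeps the riskier retry behavior limited to endpoints you have checked.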
Pitfall 4: Health Checks and Pool Contamination
Nginx open-source does not have active upstream health checks (that is an NGINX Plus feature). Without health checks, a failed upstream stays in the pool until a request fails with it. With keepalive connections, this failure may not be detected until the idle connection is used, at which point a real request fails.
Workaround: use keepalive_time to expire connections periodically, use proxy_connect_timeout and proxy_read_timeout to fail fast, and configure upstream max_fails and fail_timeout to temporarily remove failed upstreams from the pool after repeated errors.
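A minimal sketch of this workaround follows. All values are illustrative assumptions, and the second server on port 8081 is hypothetical; it is included because nginx ignores max_fails and fail_timeout when a group contains only one server.

```nginx
upstream backend {
    # After 3 failed attempts within 10s, skip the server for 10s.
    server 127.0.0.1:8080 max_fails=3 fail_timeout=10s;
    server 127.0.0.1:8081 max_fails=3 fail_timeout=10s;  # hypothetical second backend

    keepalive 32;
    keepalive_time 10m;   # recycle connections periodically
}

server {
    location / {
        proxy_pass http://backend;
        proxy_http_version 1.1;
        proxy_set_header Connection "";

        # Fail fast so a dead upstream is noticed quickly.
        proxy_connect_timeout 2s;
        proxy_read_timeout 10s;
    }
}
```

This is passive detection only: a real request still has to fail before a server is marked unavailable, so the timeouts determine how long that request waits.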
Pitfall 5: SSL Upstream Keepalive
Keepalive connections to SSL upstreams (proxy_pass https://…) work the same way, and the savings are larger: a reused keepalive connection skips both the TCP handshake and the TLS handshake entirely. The full TLS negotiation happens only when a new connection is opened, and even then nginx can resume an earlier session (via session tickets or session IDs) instead of negotiating from scratch, provided proxy_ssl_session_reuse is on (the default). Also, proxy_ssl_server_name on is required if the upstream uses SNI; it is off by default, so without it nginx sends no SNI extension, and some upstreams reject the connection.
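As a sketch of the TLS case, assuming a hypothetical upstream reachable as backend.example.com on port 443:

```nginx
upstream secure_backend {
    server backend.example.com:443;   # hypothetical hostname
    keepalive 32;
}

server {
    location / {
        proxy_pass https://secure_backend;
        proxy_http_version 1.1;
        proxy_set_header Connection "";

        # Send the SNI extension (off by default).
        proxy_ssl_server_name on;

        # proxy_ssl_name defaults to the host part of proxy_pass, which
        # here would be the group name "secure_backend", so set the real
        # hostname explicitly.
        proxy_ssl_name backend.example.com;

        # On by default; shown for clarity.
        proxy_ssl_session_reuse on;
    }
}
```

The explicit proxy_ssl_name matters mainly when proxy_pass refers to an upstream group by name, as it does here.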
Measuring the Impact
The stub_status module reports nginx’s own client-facing connection counts, so it is useful context but does not show upstream reuse directly. To see the effect, compare upstream connection rates before and after enabling keepalive: if new TCP connections to the upstream drop by 80-90%, keepalive is working. Use netstat -tn | grep ESTABLISHED | grep :8080 | wc -l to count live connections per upstream port.
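Another way to observe reuse is to log the upstream connect time per request. The sketch below belongs in the http context; the format name, log path, and status location are assumptions chosen for illustration.

```nginx
# Requests served over a reused keepalive connection generally show a
# connect time of zero, since no new connection had to be established.
log_format upstream_timing '$remote_addr "$request" $status '
                           'upstream=$upstream_addr '
                           'connect=$upstream_connect_time '
                           'response=$upstream_response_time';

server {
    access_log /var/log/nginx/upstream_timing.log upstream_timing;

    # Basic connection counters, restricted to local access.
    location = /nginx_status {
        stub_status;
        allow 127.0.0.1;
        deny all;
    }
}
```

The share of requests logging a zero connect time gives a rough reuse rate that can be compared before and after the change.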
Conclusion
Nginx reverse proxy upstream keepalive is essential for high-traffic deployments but requires careful sizing. The key points: keepalive is per-worker, set the upstream keepalive_timeout shorter than the upstream’s idle timeout, configure proxy_next_upstream for your retry strategy, and compensate for missing active health checks with max_fails and fail_timeout. With these in place, keepalive connections can substantially reduce the rate of new upstream TCP connections; measure the actual gain in your own deployment.

