Reading other people’s code is a skill. It is different from writing code, different from debugging your own code, and different from reading documentation. Most developers never develop a systematic approach — they open a repository, feel overwhelmed by the file count, and give up. This post describes a methodology I have refined over years of reading codebases from nginx to CPython to keepalived, with a concrete walkthrough to make each step tangible.
Step 1: Define Your Question Before You Read
Never open a codebase without a specific question. ‘Understand nginx’ is not a question. ‘How does nginx decide which location block matches a request?’ is. Your question determines what to read, what to skip, and when to stop. Every time you feel lost in a codebase, return to your question. If you cannot articulate what you are looking for, you will wander.
For this walkthrough, our question is: how does nginx’s keepalive_requests directive work — where is it checked, and what happens when the limit is reached?
Step 2: Get the Code and Orient Yourself
Clone the repository. Do not use GitHub’s web interface for serious reading: a local checkout allows grep, ctags, and fast navigation. Run git log --oneline -20 to see recent history. Read the top-level README and any ARCHITECTURE or DESIGN documents. Look at the directory structure: not every file, just the top level.
git clone https://github.com/nginx/nginx.git
cd nginx
ls src/    # core/ event/ http/ mail/ misc/ os/ stream/
wc -l src/http/*.c | sort -rn | head -20    # find largest files
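The same orientation pass can be scripted when a tree is too large to eyeball. The following is a minimal sketch, not part of nginx or its tooling: summarize_tree is a hypothetical helper that counts C source files and lines under each top-level subdirectory, so the largest subsystems surface first.

```python
from pathlib import Path


def summarize_tree(src_root, suffixes=(".c", ".h")):
    """Return [(subdir, file_count, line_count)], largest line count first.

    Hypothetical helper for orientation only. Files directly under
    src_root are grouped under ".".
    """
    root = Path(src_root)
    totals = {}
    for path in root.rglob("*"):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        rel = path.relative_to(root)
        top = rel.parts[0] if len(rel.parts) > 1 else "."
        # Count lines in binary mode so odd encodings cannot raise.
        with path.open("rb") as handle:
            lines = sum(1 for _ in handle)
        files, total = totals.get(top, (0, 0))
        totals[top] = (files + 1, total + lines)
    rows = [(name, files, total) for name, (files, total) in totals.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)
```

Run from the checkout, summarize_tree("src") gives one row per directory such as core/ or http/, which is usually enough to decide where to start reading.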
Step 3: Find the Entry Point for Your Question
For configuration-driven behavior like keepalive_requests, the entry point is the directive definition. In nginx, each module registers its directives in an array of ngx_command_t structs. Use grep to find it: grep -r 'keepalive_requests' src/ --include='*.c'. This immediately shows that the directive is defined in ngx_http_core_module.c, stored in ngx_http_core_loc_conf_t, and checked in ngx_http_set_keepalive().
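When the raw grep output gets noisy, it can help to group hits by file and match whole identifiers only. Below is a minimal sketch of that idea. find_identifier is a hypothetical helper, not an nginx or grep feature, and the word-boundary match is an assumption about what counts as a relevant hit.

```python
import re
from pathlib import Path


def find_identifier(root, name, suffixes=(".c", ".h")):
    """Return {relative_path: [(line_number, stripped_line), ...]}.

    Matches `name` only as a whole identifier, so a longer name that
    merely contains it (for example with a prefix) is not reported.
    """
    pattern = re.compile(r"\b" + re.escape(name) + r"\b")
    base = Path(root)
    hits = {}
    for path in sorted(base.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                key = path.relative_to(base).as_posix()
                hits.setdefault(key, []).append((lineno, line.strip()))
    return hits
```

Calling find_identifier("src", "keepalive_requests") from the checkout returns one entry per file. Files with a single hit are often the definition, and files with several hits are where the value is used.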
Step 4: Trace the Call Graph Outward
Once you have the entry point, trace calls in both directions: who calls this function, and what does this function call. Use ctags or cscope for jump-to-definition, or use grep systematically. ngx_http_set_keepalive() is called from ngx_http_finalize_connection(), which in turn is called from ngx_http_finalize_request(). Now you have a call chain.
Do not read every function you encounter. Read the ones that relate to your question; skim or note the others. Build a call graph on paper or in a text file as you go.
/* ngx_http_set_keepalive() checks (paraphrased, not verbatim nginx source; confirm against your checkout): */
if (clcf->keepalive_requests == 0
    || c->requests >= clcf->keepalive_requests)
{
    ngx_http_close_connection(c);
    return;
}
Step 5: Form and Test Hypotheses
After reading a function or module, write down your hypothesis about what it does. Then verify it. For keepalive_requests: hypothesis — nginx increments c->requests on each request and closes the connection when it reaches keepalive_requests. To verify: find where c->requests is incremented (ngx_http_request.c, ngx_http_create_request()), and confirm the comparison in ngx_http_set_keepalive().
Forming hypotheses and verifying them is faster than reading every line. It forces you to predict behavior, which deepens understanding and surfaces contradictions.
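One way to make a hypothesis concrete is to write it down as a tiny model and read off the prediction it makes. The sketch below encodes only the hypothesis stated above (a per-connection counter compared against the limit). It is not nginx code, and Connection, serve, and requests_until_close are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Connection:
    requests: int = 0   # stands in for c->requests
    is_open: bool = True


def serve(conn, keepalive_requests):
    """Model one request: bump the counter, then apply the limit check."""
    conn.requests += 1
    if conn.requests >= keepalive_requests:
        conn.is_open = False


def requests_until_close(keepalive_requests, max_attempts=1000):
    """How many requests one connection serves before the model closes it."""
    conn = Connection()
    # max_attempts only guards against looping forever on a huge limit.
    while conn.is_open and conn.requests < max_attempts:
        serve(conn, keepalive_requests)
    return conn.requests
```

The model predicts that with a limit of N, a client gets exactly N responses on one connection before it is closed. That prediction is what to check against the source and against a running server. If the observed count is off by one, the hypothesis about where the increment happens is the first thing to revisit.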
Step 6: Use Tests and Comments as Documentation
Well-maintained projects have test suites that describe expected behavior precisely. Read the tests for the feature you are investigating. In nginx’s test suite (nginx/nginx-tests), find keepalive-related tests: they show exactly what inputs produce what outputs, which is often clearer than the code itself.
Comments in the code are the author’s intent — sometimes outdated, but often invaluable. Pay special attention to comments that say ‘this is subtle’ or ‘we do X because of Y’ — these are the parts worth reading slowly.
Step 7: Read Diffs, Not Just the Current State
Use git log -p --all -S 'keepalive_requests' to find every commit that added or removed an occurrence of the directive name. Diffs show why code exists: the bug it fixed, the feature it added, the tradeoff it resolved. Reading the commit message and diff for a feature often teaches more in 5 minutes than reading the current code for an hour.
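For a long history it can be easier to list the matching commits first and open the diffs one at a time. The following is a small sketch that wraps the same pickaxe search. pickaxe and parse_log are hypothetical helper names, and the tab-separated output format is a choice made here for easy parsing.

```python
import subprocess


def parse_log(output):
    """Turn 'hash<TAB>subject' lines into (hash, subject) tuples."""
    commits = []
    for line in output.splitlines():
        if not line.strip():
            continue
        commit, _, subject = line.partition("\t")
        commits.append((commit, subject))
    return commits


def pickaxe(repo, term):
    """Commits, oldest first, that changed how often `term` occurs."""
    result = subprocess.run(
        ["git", "-C", repo, "log", "--all", "--reverse",
         "--format=%h%x09%s", "-S" + term],
        capture_output=True, text=True, check=True,
    )
    return parse_log(result.stdout)
```

Reading the result oldest first shows the order in which the feature was introduced and then adjusted. Each hash can be passed to git show to read the full message and diff.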
Conclusion
Reading open-source code is a discipline: start with a specific question, find the entry point, trace the call graph, form hypotheses and verify them, use tests and diffs. Applied consistently, this methodology makes any codebase approachable, whether it is nginx, CPython, keepalived, or Twisted. The posts in this series were all written using exactly this process.

