-.- --. .-. --..

Rewrites outside location blocks in Nginx are bad!

During recent performance tests on one of our nginx clusters I noticed something odd: the CPU was choking at a request rate far too low for a system like that. It’s a static proxy server running vanilla nginx, and the downstream servers were doing okay in terms of latency. CPU on this system shouldn’t choke before saturating those downstream systems, but it did.

perf reports on the process showed most of the samples occupied by symbols related to rewrite and ngx_http_regex_exec. While we have a lot of location rules in this codebase, and many of them are regular-expression matchers, it seemed like way too much time was spent in these routines. What’s worse, this happened even when we ran the benchmarks with known wrong URLs or triggered the block/rate-limiting configurations, which should bypass most of the location matching anyway.

At one point I noticed that Nginx (depending on compilation flags) supports a PCRE JIT engine setting that promises faster regular expression matching. Turning this on did improve the situation quite a bit, but it wasn’t anything spectacular. The regex symbols still made up a large part of the perf samples for every cut of URL type. Debugging further pointed to a combination of the following two issues:

  1. We have lots of rewrite rules within a server block outside of location blocks.
  2. The bypass routines in cases of known errors and/or rate limiting used relative paths in the config (internal redirects of sorts)
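
For reference, the PCRE JIT setting mentioned earlier is a single directive in the main (top-level) context. This is only a minimal sketch: it assumes nginx and its PCRE library were built with JIT support, and the surrounding blocks are placeholders rather than our real configuration.

```nginx
# Main context, outside of the http block.
# Only has an effect if nginx and PCRE were built with JIT support.
pcre_jit on;

events { }

http {
    # ... rest of the configuration
}
```

This speeds up each regex evaluation, but it does nothing about how many evaluations happen per request, which is why it only helped us partially.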

Our config had a bit of this shape:

http {
  server {
    server_name x.example.com localhost;
    listen 80;

    error_page  404 /404.html;
    error_page  403 /403.html;

    # hundreds of rewrite rules go here, intended to "normalize" matching URLs
    # rewrite {regex} ...
    # rewrite {regex} ...
    # rewrite {regex} ...
    # ... and so on

    location /404.html {
      root /srv/www/html;
    }

    location /403.html {
      root /srv/www/html;
    }

    location / {
      proxy_pass http://upstream;
    }
    # ... and so on
  }
}

In our case the configuration is spread over hundreds of files, and the rewrite rules shown here all live in a separate file which gets included right after the main server-level settings and before any location block definitions.

A long time ago on the project a decision was made to add a file that contained a few URL matching rules that had to run in-order for every URL, before any location matching runs, sort of a pre-processor to “normalize” URLs.

This is okay as long as it doesn’t get abused. But slowly over time there were additions that should’ve simply been location blocks instead. For the uninitiated, location matching tends to be more efficient at matching URLs because nginx builds a tree of the prefix locations at startup (regex locations are still checked in order) rather than trying every rule serially, one by one, which is what our problem effectively became. This set of rules kept growing as it became a kitchen sink of sorts for every “wrong” URL; at the time of debugging there were hundreds of such rules. These rules get executed again whenever there is an internal redirect; rewrite rules themselves can be internal redirects if they don’t use one of the bypassing flags like break, redirect, permanent, which further exacerbates the problem. This was true in our case since the intent was to run these in order, which means last and break can’t be used by definition.
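
To make the difference concrete, here is a rough sketch of moving one such rule into a location block. The /old-docs/ and /docs/ paths are made up for illustration and are not from our configuration.

```nginx
server {
    listen      80;
    server_name localhost;

    # Before (server level): this regex is evaluated for every request,
    # including internally redirected ones.
    # rewrite ^/old-docs/(.*)$ /docs/$1;

    # After: the prefix location is selected first, so the regex only runs
    # for URIs that actually start with /old-docs/.
    location /old-docs/ {
        rewrite ^/old-docs/(.*)$ /docs/$1 last;
    }

    location /docs/ {
        root /srv/www/html;
    }
}
```

The last flag restarts location matching with the rewritten URI. If the server also has regex locations, one of those could still win over the plain prefix match; the ^~ modifier on the prefix location is one way to prevent that, although whether it is appropriate depends on the rest of the config.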

Secondly, we used relative paths for all the error_page configurations, mostly as a carry-over from nginx configurations that are documented pretty much everywhere[1]. So when an error status is triggered, nginx redoes the matching from the beginning. In isolation this is not a problem, and I can understand why the default documentation snippets use this pattern. In our case these two problems in combination created a cascading effect: when testing our rate limiting and error handling checks, which should’ve bypassed the relatively costly location matching, every rewrite rule got run twice, which made the performance pathologically worse!

So here’s a PSA of sorts:

  1. Try not to have rewrite rules outside of a location block.
  2. Prefer named routes (named locations, in nginx terms) over normal URLs/paths as targets for “jumps” or bypass-style internal redirects. This would’ve avoided the second execution of the rewrite rules. Something like the snippet below:

error_page 404 @notfound;

location @notfound {
  root /srv/www/html;
  try_files /404.html =500;
}
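
The same idea applies to the rate-limiting bypass. The following is only a sketch under assumptions: the zone name, rate, burst and the @ratelimited location are all hypothetical, and our real limits looked different.

```nginx
http {
    # Hypothetical zone name and limits, for illustration only.
    limit_req_zone $binary_remote_addr zone=per_ip:10m rate=10r/s;

    server {
        listen      80;
        server_name localhost;

        limit_req        zone=per_ip burst=20 nodelay;
        limit_req_status 429;

        # Named location as the target: rejected requests are answered
        # without a second pass through the server-level rewrite rules.
        error_page 429 @ratelimited;

        location @ratelimited {
            return 429 "Too many requests\n";
        }

        # ... regular location blocks go here
    }
}
```

With a path such as /429.html as the error_page target instead, every rejected request would go through the server-level rewrite rules a second time, which is exactly the cost a rate limiter is supposed to avoid.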

Example

Just to demonstrate this with an example, I’m going to use this configuration in nginx, which is deliberately close to what we had structurally:

events {  }

error_log   /dev/stderr notice;

http {
    include         /etc/nginx/mime.types;
    default_type    application/octet-stream;
    access_log      /dev/stdout combined;
    rewrite_log     on;

    # The 404 page definition is copied verbatim from the example config that
    # every debian Nginx package ships with.
    #
    # Culprit 1:
    error_page 404 /404.html;
    error_page 403 /403.html;

    server {
        listen      80;
        server_name localhost;
        root /usr/share/nginx/html;

        # Culprit 2: Naked rewrite rules that try to match for every route, even
        # internal redirects
        rewrite /unknown1/(.*)  /unknown/$1;
        rewrite /unknown2/(.*)  /unknown/$1;
        rewrite /unknown3/(.*)  /unknown/$1;
        rewrite /unknown4/(.*)  /unknown/$1;

        # This will try to send out the file /usr/share/nginx/html/index.html or
        # else respond with a 404 error page.
        location = /main {
            try_files index.html /index.html =404;
        }

        location /notauthorized {
            return 403;
        }

        location /nonexistent {
            return 404;
        }

        # Only the 50x.html template is present in the latest nginx container,
        # so using that as a generic error page to keep things simple.
        location /404.html { try_files 50x.html /50x.html =500; }
        location /403.html { try_files 50x.html /50x.html =500; }

        # Always go to /main by issuing an internal rewrite
        location / {
            rewrite ^/.*$ /main;
        }
    }
}

This sets up four main user-facing routes: /, /main, /notauthorized, /nonexistent. / redirects to /main internally, although the user won’t see any 3xx, while /notauthorized returns a 403 response and /nonexistent a 404. The latter two (return, as used in this case, together with the error_page setting) end up as internal redirects within nginx, so the routing and rule execution behaviour is going to be similar between the 404 case and the 403 case.

For the uninitiated, try_files (as used here) checks the paths given to it within the root path, or else returns the status code mentioned at the end. error_page allows configuring extra routes for when nginx has to respond with a particular status code. Effectively, there is an internal redirect on both ends: once when a location block triggers the error status, and again when the error_page rule itself has a path as the target location.

I’ll try the following four routes:

curl localhost/
curl localhost/main
curl localhost/notauthorized
curl localhost/nonexistent

The / and /main runs are just to demonstrate the extra rewrite between them. rewrite_log on; does what it says on the tin, and here’s a filtered snippet from the logs:

GET /
"/unknown1/(.*)" does not match "/"
"/unknown2/(.*)" does not match "/"
"/unknown3/(.*)" does not match "/"
"/unknown4/(.*)" does not match "/"
"^/.*$" matches "/"
rewritten data: "/main", args: ""

GET /main
"/unknown1/(.*)" does not match "/main"
"/unknown2/(.*)" does not match "/main"
"/unknown3/(.*)" does not match "/main"
"/unknown4/(.*)" does not match "/main"

GET /notauthorized
"/unknown1/(.*)" does not match "/notauthorized"
"/unknown2/(.*)" does not match "/notauthorized"
"/unknown3/(.*)" does not match "/notauthorized"
"/unknown4/(.*)" does not match "/notauthorized"
"/unknown1/(.*)" does not match "/403.html"
"/unknown2/(.*)" does not match "/403.html"
"/unknown3/(.*)" does not match "/403.html"
"/unknown4/(.*)" does not match "/403.html"

GET /nonexistent
"/unknown1/(.*)" does not match "/nonexistent"
"/unknown2/(.*)" does not match "/nonexistent"
"/unknown3/(.*)" does not match "/nonexistent"
"/unknown4/(.*)" does not match "/nonexistent"
"/unknown1/(.*)" does not match "/404.html"
"/unknown2/(.*)" does not match "/404.html"
"/unknown3/(.*)" does not match "/404.html"
"/unknown4/(.*)" does not match "/404.html"

Both the / route and /main work as expected: the naked rewrite rules run once. For the other two, however, they get executed twice. With the current config it’s a bit hard to demonstrate, but the pathological case also happens when those 403 and 404 statuses occur naturally, for example on a request for an undefined location. Using named routes, this is the rewritten (no pun intended) config:

events {  }

error_log   /dev/stderr notice;

http {
    include         /etc/nginx/mime.types;
    default_type    application/octet-stream;
    access_log      /dev/stdout combined;
    rewrite_log     on;

    error_page 404 @404.html;
    error_page 403 @403.html;

    server {
        listen      80;
        server_name localhost;
        root /usr/share/nginx/html;

        # Culprit 2: Naked rewrite rules that try to match for every route, even
        # internal redirects
        rewrite /unknown1/(.*)  /unknown/$1;
        rewrite /unknown2/(.*)  /unknown/$1;
        rewrite /unknown3/(.*)  /unknown/$1;
        rewrite /unknown4/(.*)  /unknown/$1;

        # This will try to send out the file /usr/share/nginx/html/index.html or
        # else respond with a 404 error page.
        location = /main {
            try_files index.html /index.html =404;
        }

        location /notauthorized {
            return 403;
        }

        location /nonexistent {
            return 404;
        }

        # Only the 50x.html template is present in the latest nginx container,
        # so using that as a generic error page to keep things simple.
        location @404.html { try_files 50x.html /50x.html =500; }
        location @403.html { try_files 50x.html /50x.html =500; }

        # Always go to /main by issuing an internal rewrite
        location / {
            rewrite ^/.*$ /main;
        }
    }
}

Running the same four requests against this config gives the following filtered log:

GET /
"/unknown1/(.*)" does not match "/"
"/unknown2/(.*)" does not match "/"
"/unknown3/(.*)" does not match "/"
"/unknown4/(.*)" does not match "/"
"^/.*$" matches "/"
rewritten data: "/main", args: ""

GET /main
"/unknown1/(.*)" does not match "/main"
"/unknown2/(.*)" does not match "/main"
"/unknown3/(.*)" does not match "/main"
"/unknown4/(.*)" does not match "/main"

GET /notauthorized
"/unknown1/(.*)" does not match "/notauthorized"
"/unknown2/(.*)" does not match "/notauthorized"
"/unknown3/(.*)" does not match "/notauthorized"
"/unknown4/(.*)" does not match "/notauthorized"

GET /nonexistent
"/unknown1/(.*)" does not match "/nonexistent"
"/unknown2/(.*)" does not match "/nonexistent"
"/unknown3/(.*)" does not match "/nonexistent"
"/unknown4/(.*)" does not match "/nonexistent"

As expected, the rewrite rules now run only once per request. That said, the actual fix would be to refactor the rewrites into location blocks to improve the matching performance a little further.


Footnotes

  1. Similar configuration is also shipped with the default Debian package, at least as of Debian 11, and with the official Nginx container, at least as of 1.27.0.
