
Is it worth it to try to optimize away Nginx regexes?

In an Nginx virtual host, I have added these locations, two of which use regular expressions:

    location ~ /-/pubsub/subscribe/(.*)$ {
      # subscribe to websocket
    }
    location ~ /-/pubsub/publish/(.*)$ {
      # websocket publish endpoint
    }
    location / {
      # reverse proxy to the application server
    }

But could I instead do something like this, to "hide" the regexes?

    location /-/pubsub/ {       # can be tested without any regex matching
      location ~ subscribe/(.*)$ { ... }
      location ~ publish/(.*)$ { ... }
    }

    location / {
      # reverse proxy
    }

It seems to me that this would avoid evaluating any regex for requests that match location /, because they'll be compared against location /-/pubsub/ (no regex) instead of location ~ /-/pubsub/whatever/(.*)$ (with regex), right?

In the same way, I've separated my video uploads from other uploads, because the video uploads make use of a regex:

    location /-/uploads/public/video/ {
      location ~ \.(mp4|m4v|m4a)$ {   # regex matching for videos only
        mp4;
      }
    }

    location /-/uploads/public/ {
      # all other files: no regex matching needed
    }

But I'm not sure whether this slightly more complicated configuration, which means saving videos in a different folder just to avoid regexes, makes sense. Should it actually be faster? And is it worth the trouble?

Asked by KajMagnus, Jan 21 '26 08:01


1 Answer

It depends on your use case. Since nginx uses PCRE, ultimately you're asking "does PCRE use CPU", and the answer is obviously yes. However, if the majority of the work nginx does is serving files and managing remote TCP sockets, you won't see an impact. If you're doing extremely fast proxying with nginx (think 50k+ connections), my experience is that you will ABSOLUTELY see a major performance impact from regex optimizations.
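
For reference, the nested layout from the question can be sketched as below. It leans on nginx's documented location selection: the longest matching prefix location is found first, and regex locations are only tried at the level being searched. With no regex locations at the top level of the server block, a request that ends up in location / never runs one. The upstream name app_server is a placeholder and not part of the question:

    # Sketch only; "app_server" is a hypothetical upstream name.
    location /-/pubsub/ {
        # These regexes are evaluated only for URIs starting with /-/pubsub/.
        location ~ /subscribe/(.*)$ {
            # subscribe to websocket
        }
        location ~ /publish/(.*)$ {
            # websocket publish endpoint
        }
    }

    location / {
        # No regex locations exist at this level, so none are tested here.
        proxy_pass http://app_server;
    }

If the server block does have regex locations at the top level for other reasons, the ^~ modifier on a prefix location (for example location ^~ /-/uploads/public/) tells nginx to skip the regex check for URIs whose best prefix match is that location.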

To know for certain what your bottleneck is, you need to use a tool like tcpdump and look at time-to-first-data-byte. Also, Apache Bench (ab) is single-threaded and thus of very limited use for benchmarking high-capacity web servers. If you have to use that tool, I recommend running a cluster of ab instances and adding the results up ;)
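
As a rough first look before reaching for tcpdump, nginx's own access log can record per-request timings. This is a different technique from the packet capture suggested above, and it measures time as nginx sees it rather than on the wire. A minimal sketch, assuming nginx 1.9.1 or later (needed for $upstream_connect_time); the format name timing and the log path are made up for illustration:

    # Goes in the http {} context. "timing" and the log path are hypothetical.
    log_format timing '$remote_addr "$request" $status '
                      'req=$request_time '
                      'up_connect=$upstream_connect_time '
                      'up_header=$upstream_header_time '
                      'up_total=$upstream_response_time';

    access_log /var/log/nginx/timing.log timing;

Here $upstream_header_time is the closest thing to time-to-first-byte from the application server. If it accounts for most of $request_time, location matching is unlikely to be the bottleneck.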

Answered by pozcircuitboy, Jan 22 '26 23:01


