Thursday, May 27, 2010

Nginx "how to" - Fast and Secure Web Server

Nginx is a fast and efficient web server. It can be configured to serve out files or be a reverse proxy depending on your application. What makes this web server different from Apache, Lighttpd or thttpd is the overall efficiency of the daemon, the number of configuration options and how easy it is to set up.
Nginx ("engine x") is a high-performance HTTP server and reverse proxy server. Nginx was written by Igor Sysoev for rambler.ru, Russia's second-most visited website, where it has been running in production for over two and a half years. Igor has released the source code under a BSD-like license. Although still in beta, Nginx is known for its stability, rich feature set, simple configuration, and low resource consumption.


The methodology behind our configuration - Trust No One

In the following example we are going to set up a simple web server to serve our static web pages to explain the basics. The daemon will load a few mime include files, compress outgoing data in real time and set the expires header to reduce bandwidth of client cached traffic. Full logging is on, in the default Apache format with the addition of compressed file size and the amount of time the server took to fulfill the request. Finally, we are going to set up restriction filters by IP address to limit access to the "/secure" directory structure where you might put more sensitive non-public data.
The security mindset of the configuration is very paranoid. There are a significant number of bots, scanners and broken clients that will abuse your site if given the opportunity. These clients will waste your bandwidth and system resources. In response, we will not trust any client to access our server without first making sure that all of the request parameters are met. This means that the remote client must be asking for our site by the proper host name and must request any support files, like pictures and css, with the referrer headers properly set. Any deviation from these rules will lead to Nginx dropping the client's connection with a return code 444. Even though Nginx does not have a module like mod_security we can still make our own access rules. Note that even though these rules are strict, normal web traffic and bots like Google can access the site without issue.


Option 1: Nginx webserver to serve static files

Our goal is to set up a fast serving and CPU/disk efficient web server, but most importantly a _very secure_ web server. This configuration will work for the latest version of Nginx as well as the development versions. For the purpose of this example we built the latest development version of Nginx 0.8.x from source.
Below you will find the calomel.org example nginx.conf file. This example is a fully working config file with the exception of setting up a few variables for your environment.
Before using the config file, take a look at it below and review the options.

#######################################################
###  Calomel.org  /etc/nginx.conf  BEGIN
#######################################################
#
pid               /var/run/nginx.pid;
user              nginx nginx;
worker_processes  2;

events {
    worker_connections  1024;
}

http {
 ## MIME types
  include         mime.types;
# types {
#   image/gif     gif;
#   image/jpeg    jpg;
#   image/png     png;
#   image/bmp     bmp;
#   image/x-icon  ico;
#   text/css      css;
#   text/html    html;
#   text/plain    bob;
#   text/plain    txt;
# }
  default_type       application/octet-stream;

 ## Size Limits
  client_body_buffer_size   8k;
  client_header_buffer_size 1k;
  client_max_body_size      1k;
  large_client_header_buffers 1 1k;

 ## Timeouts 
  client_body_timeout   5;
  client_header_timeout 5;
  keepalive_timeout     5 5;
  send_timeout          5;

 ## General Options
  ignore_invalid_headers   on;
  limit_zone gulag $binary_remote_addr 1m;
  recursive_error_pages    on;
  sendfile                 on;
  server_name_in_redirect off;
  server_tokens           off;

 ## TCP options  
  tcp_nodelay on;
  tcp_nopush  on;

 ## Compression
  gzip              on;
  gzip_static       on;
  gzip_buffers      16 8k;
  gzip_comp_level   9;
  gzip_http_version 1.0;
  gzip_min_length   0;
  gzip_types        text/plain text/html text/css image/x-icon image/bmp;
  gzip_vary         on;

 ## Log Format
  log_format  main  '$remote_addr $host $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" "$http_user_agent" '
                    '$request_time "$gzip_ratio"';

 ## Deny access to any host other than (www.)mydomain.com
    server {
         server_name  _;  #default
         return 444;
     }

 ## Server (www.)mydomain.com
  server {
      access_log  /var/log/nginx/access.log main buffer=32k;
      error_log   /var/log/nginx/error.log info;
      expires     31d;
      limit_conn  gulag 5;
      listen      127.0.0.1:8080 rcvbuf=64k backlog=128;
      root        /disk01/htdocs;
      server_name mydomain.com www.mydomain.com;

     ## SSL Options (only enable if you use a SSL certificate)
    # ssl on;
    # ssl_certificate /ssl_keys/mydomain.com_ssl.crt;
    # ssl_certificate_key /ssl_keys/mydomain_ssl.key;
    # ssl_ciphers HIGH:!ADH:!MD5;
    # ssl_prefer_server_ciphers on;
    # ssl_protocols SSLv3;
    # ssl_session_cache shared:SSL:1m;
    # ssl_session_timeout 5m;

     ## Only allow GET and HEAD request methods
      if ($request_method !~ ^(GET|HEAD)$ ) {
         return 444;
      }

     ## Deny illegal Host headers
      if ($host !~* ^(mydomain.com|www.mydomain.com)$ ) {
        return 444;
      }

     ## Deny certain User-Agents (case insensitive)
     ## The ~* makes it case insensitive as opposed to just a ~
     if ($http_user_agent ~* (Baiduspider|Jullo) ) {
        return 444;
     }

     ## Deny certain Referers (case insensitive)
     ## The ~* makes it case insensitive as opposed to just a ~
     if ($http_referer ~* (babes|click|diamond|forsale|girl|jewelry|love|nudit|organic|poker|porn|poweroversoftware|sex|teen|video|webcam|zippo) ) {
        return 444;
     }

     ## Redirect from www to non-www
      if ($host = 'www.mydomain.com' ) {
        rewrite  ^/(.*)$  http://mydomain.com/$1  permanent;
      }

     ## Stop Image and Document Hijacking
      location ~* (\.jpg|\.png|\.css)$ {
        if ($http_referer !~ ^(http://mydomain.com) ) {
          return 444;
        }
      }

     ## Restricted Access directory
      location ^~ /secure/ {
            allow 127.0.0.1/32;
            allow 10.10.10.0/24;
            deny all;
            auth_basic "RESTRICTED ACCESS";
            auth_basic_user_file /var/www/htdocs/secure/access_list;
        }

     ## Only allow these file types to document root
      location / {
        if ($request_uri ~* (^\/|\.html|\.jpg|\.org|\.png|\.css|favicon\.ico|robots\.txt)$ ) {
          break;
        }
        return 444;
      }

     ## Serve an empty 1x1 gif _OR_ an error 204 (No Content) for favicon.ico
      location = /favicon.ico {
       #empty_gif;
        return 204;
      }

      ## System Maintenance (Service Unavailable) 
      if (-f $document_root/system_maintenance.html ) {
        error_page 503 /system_maintenance.html;
        return 503;
      }

     ## All other errors get the generic error page
      error_page 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417
                 500 501 502 503 504 505 /error_page.html;
      location  /error_page.html {
          internal;
      }
  }
}
#
#######################################################
###  Calomel.org  /etc/nginx.conf  END
#######################################################


Option 2: Nginx reverse proxy for backend web servers

This config is for a reverse proxy server in front of three backend web servers. One is for web content, one is a forum and the last is a file server.
As requests come in the Nginx Proxy server will look at the URL path and direct requests to the proper backend server. This config will also cache requests to the one web content server, but not the forum or data download servers. We also configured Nginx to compress http calls back to the client in real time, thus saving bandwidth.
#######################################################
###  Calomel.org  /etc/nginx.conf  BEGIN
#######################################################
pid               /var/run/nginx.pid;
user              nginx nginx;
worker_processes  10;

events {
    worker_connections  1024;
}

http {
 ## MIME types
 #include            /etc/nginx_mime.types;
  default_type       application/octet-stream;

 ## Size Limits
  client_body_buffer_size     128K;
  client_header_buffer_size   128K;
  client_max_body_size          1M;
  large_client_header_buffers 1 1k;

 ## Timeouts
  client_body_timeout   60;
  client_header_timeout 60;
  expires               24h;
  keepalive_timeout     60 60;
  send_timeout          60;

 ## General Options
  ignore_invalid_headers   on;
  keepalive_requests      100;
  limit_zone gulag $binary_remote_addr 5m;
  recursive_error_pages    on;
  sendfile                 on;
  server_name_in_redirect off;
  server_tokens           off;

 ## TCP options
  tcp_nodelay on;
  tcp_nopush  on;

 ## Compression
  gzip              on;
  gzip_buffers      16 8k;
  gzip_comp_level   6;
  gzip_http_version 1.0;
  gzip_min_length   0;
  gzip_types        text/plain text/css image/x-icon application/x-perl application/x-httpd-cgi;
  gzip_vary         on;

 ## Log Format
  log_format  main  '$remote_addr $host $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" "$http_user_agent" '
                    '"$gzip_ratio"';

 ## Proxy options
  proxy_buffering           on;
  proxy_cache_min_uses       3;
  proxy_cache_path          /usr/local/nginx/proxy_temp/ levels=1:2 keys_zone=cache:10m inactive=10m max_size=1000M;
  proxy_cache_valid         any 10m;
  proxy_ignore_client_abort off;
  proxy_intercept_errors    on;
  proxy_next_upstream       error timeout invalid_header;
  proxy_redirect            off;
  proxy_set_header          X-Forwarded-For $remote_addr;
  proxy_connect_timeout     60;
  proxy_send_timeout        60;
  proxy_read_timeout        60;

 ## Backend servers (web1 is the primary and web2 will come up if web1 is down)
    upstream webbackend  {
      server web1.domain.lan weight=10 max_fails=3 fail_timeout=30s;
      server web2.domain.lan weight=1 backup;
    }

  server {
      access_log  /var/log/nginx/access.log main;
      error_log   /var/log/nginx/error.log;
      index       index.html;
      limit_conn  gulag 50;
      listen      127.0.0.1:80 default;
      root        /usr/local/nginx/html;
      server_name _;

     ## Only requests to our Host are allowed
      if ($host !~ ^(mydomain.com|www.mydomain.com)$ ) {
         return 444;
      }

     ## Only allow these request methods
      if ($request_method !~ ^(GET|HEAD|POST)$ ) {
         return 444;
      }

     ## PROXY - Forum 
      location /forum/ {
        proxy_pass http://forum.domain.lan/forum/;
      }

     ## PROXY - Data
      location /files/ {
        proxy_pass http://data.domain.lan/;
      }

     ## PROXY - Web
     ## Only allow these file types to document root, drop everything else
      location / {
        if ($request_uri !~* (^\/|\.html|\.jpg|\.pl|\.png|\.css|\.ico|robots\.txt)$ ) {
          return 444;
        }
        proxy_pass  http://webbackend;
        proxy_cache            cache;
        proxy_cache_valid      200 24h;
        proxy_cache_use_stale  error timeout invalid_header updating http_500 http_502 http_503 http_504;
        proxy_ignore_headers   Expires Cache-Control;
      }

     ## All other errors get the generic error page
      error_page 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417
                 500 501 502 503 504 505 506 507 /error_page.html;
      location  /error_page.html {
          internal;
      }
  }
}
#
#######################################################
###  Calomel.org  /etc/nginx.conf  END
#######################################################




Building nginx from source

To get started, you need to first install nginx on your machine. The source code is available from the nginx home page and practically every distribution has pre-made packages if you prefer those. The install is very easy and it will not take you much time.
We highly recommend you build Nginx from source. This way you can modify the code if you need to and make sure you apply the latest patches when they come out.
IMPORTANT: note for OpenBSD installs
When building from source on OpenBSD it _may_ fail if you try to use the default "malloc.h" include files. This problem should be fixed in the latest development version, but if the build fails for you then simply modify the source to have nginx build with the stdlib.h includes. To do this open up the file src/os/unix/ngx_posix_config.h and edit line 74.
## vi src/os/unix/ngx_posix_config.h (line 74)
OLD LINE:   #include <malloc.h>
NEW LINE:   #include <stdlib.h> /* was #include <malloc.h> */
Secondly, you need to make sure that the package for PCRE is installed. Use the command "pkg_add -i pcre" to install from your chosen PKG_PATH repository. BTW, you may want to also look at the Perl script pkg_find for OpenBSD package management.




OPTIONAL: Change the Server: string of your host
The Server: string is the header which is sent back to the client to tell them what type of http server you are running and possibly what version. This string is used by places like Alexa and Netcraft to collect statistics about how many and what type of web servers are live on the Internet. To support the author and statistics for Nginx we recommend keeping this string as is. But, for security you may not want people to know what you are running and you can change this in the source code. Edit the source file src/http/ngx_http_header_filter_module.c and look at lines 48 and 49. You can change the string to anything you want.
## vi src/http/ngx_http_header_filter_module.c (lines 48 and 49)
static char ngx_http_server_string[] = "Server: MyDomain.com" CRLF;
static char ngx_http_server_full_string[] = "Server: MyDomain.com" CRLF;
OPTIONAL: anonymize your server string in the auto generated error pages
When nginx sends an error back to the client it can auto generate the error page. This error page has the error code at the top, a single horizontal line and then the string "nginx" and possibly the version number. If you want to, you can take out the server string in the error page by editing the source code in the file src/http/ngx_http_special_response.c on lines 21 and 28. The following line, for example, would make the nginx generated error pages show your domain name.
## vi src/http/ngx_http_special_response.c (lines 21 and 28)
"<hr><center>http://mydomain.org</center>" CRLF

## You can also change all of the built in error
## messages with just a carriage return.
static char ngx_http_error_301_page[] = CRLF;
static char ngx_http_error_302_page[] = CRLF;
static char ngx_http_error_400_page[] = CRLF;
static char ngx_http_error_401_page[] = CRLF;
static char ngx_http_error_402_page[] = CRLF;
static char ngx_http_error_403_page[] = CRLF;
static char ngx_http_error_404_page[] = CRLF;
static char ngx_http_error_405_page[] = CRLF;
static char ngx_http_error_406_page[] = CRLF;
static char ngx_http_error_408_page[] = CRLF;
static char ngx_http_error_409_page[] = CRLF;
static char ngx_http_error_410_page[] = CRLF;
static char ngx_http_error_411_page[] = CRLF;
static char ngx_http_error_412_page[] = CRLF;
static char ngx_http_error_413_page[] = CRLF;
static char ngx_http_error_414_page[] = CRLF;
static char ngx_http_error_415_page[] = CRLF;
static char ngx_http_error_416_page[] = CRLF;
static char ngx_http_error_495_page[] = CRLF;
static char ngx_http_error_496_page[] = CRLF;
static char ngx_http_error_497_page[] = CRLF;
static char ngx_http_error_500_page[] = CRLF;
static char ngx_http_error_501_page[] = CRLF;
static char ngx_http_error_502_page[] = CRLF;
static char ngx_http_error_503_page[] = CRLF;
static char ngx_http_error_504_page[] = CRLF;
static char ngx_http_error_507_page[] = CRLF;
This same file contains all of the default HTML error pages Nginx will send to the client if there is an error. Look for the character arrays declared with static char ngx_http_error_ and make any changes you find necessary. Note that the HTML text is only shown to the user and that all errors sent by Nginx will have the proper error code in the HTTP headers. This means you can put anything you want into the HTML code.
OPTIONAL: change any of the default error codes
Normally you DO NOT want to change any of the standard error codes specified by RFC. But, in case you really need to you can edit the file src/http/ngx_http_request.h and look for the variables starting with NGX_HTTP_REQUEST. For example, if we wanted to change the default error code for REQUEST_URI_TOO_LARGE from 414 to 999 we could:
## vi src/http/ngx_http_request.h (line 83)
OLD LINE:  #define NGX_HTTP_REQUEST_URI_TOO_LARGE     414
NEW LINE:  #define NGX_HTTP_REQUEST_URI_TOO_LARGE     999


Building from source

Building nginx for the AMD64 architecture on OpenBSD (running as user/group "nginx")
For the purpose of this example, Nginx was built with the following arguments. Make sure to check if you need to use one of the modules that we omit during the build. Our example nginx.conf works fine with the following:
make clean

./configure --with-http_gzip_static_module --without-http_autoindex_module \
 --prefix=/usr/local/nginx --sbin-path=/usr/local/sbin/nginx --conf-path=/etc/nginx.conf \
 --pid-path=/var/run/nginx.pid --http-log-path=/var/log/nginx/access.log \
 --error-log-path=/var/log/nginx/error.log --user=nginx --group=nginx

make && make install
Once Nginx is built and installed in place it is time to take a look at the config file.




Explaining the directives in nginx.conf

Now we need to edit the config file for your environment. Let's take a look at each of the directives that need attention.
pid /var/run/nginx.pid :   This is the location of the process id file that holds the pid number of the master Nginx process. If you wanted to re-read the nginx.conf file without restarting the daemon you could cat this file and send a HUP like so: "kill -HUP `cat /var/run/nginx.pid`".
user nginx nginx :   Is the user and group the child processes will run as. You may need to make this user and group if you install Nginx from source. Make sure this user is completely unprivileged or at least runs with the least privileges necessary to make the server work.
worker_processes 2 :   Is the number of worker processes to spawn. A worker is similar to a child process in Apache. Nginx has the ability to use more than one worker process for several reasons: to use multiple processors on SMP machines, to decrease latency when workers are blocked by disk I/O, or to limit the number of connections per process when select() or poll() is used. The general rule of thumb is to set the number of nginx workers to two(2) or the number of CPUs your server has, whichever is greater. But, on most servers you will find out that two(2) workers serve pages quickly and put less load on the server. The exception to this rule is if you use ssl and/or compress all of your content. If you use ssl and compression then we suggest testing your site with double the number of workers. Our example nginx.conf has 2 workers so we would set it to 4.
For testing, we suggest using the Apache benchmark binary (ab) to stress your server and see how many connections your machine can handle. "ab" can be found in any Apache install. To calculate how many total concurrent connections nginx can support, multiply "worker_processes" times "worker_connections". Our example is set up to handle 2*1024=2048 total concurrent connections. Clients who attempt to connect after 2048 clients are already connected will be denied access. It is better to deny clients than to overload the machine, possibly causing a DoS.
worker_connections 1024 :   This is the number of client connections a single worker process will handle by itself at any one time. (default: 1024) Note: Multiply worker_processes times worker_connections for the total number of connections Nginx will handle. Our example is set up to handle 2*1024=2048 connections in total. Clients who connect after the max has been reached will be denied access.
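The connection ceiling described above is simple arithmetic, and a short shell sketch can pull the two values out of a config file and multiply them. The helper name max_connections and the throwaway config file are illustrative assumptions, not part of Nginx, and the sketch only looks at the first uncommented occurrence of each directive.

```shell
#!/bin/sh
# max_connections is a hypothetical helper, not an nginx tool.
# It multiplies worker_processes by worker_connections from a config file.
max_connections() {
    conf="$1"
    # First uncommented occurrence of each directive, with the ";" stripped.
    workers=$(awk '$1 == "worker_processes"   { sub(/;/, "", $2); print $2; exit }' "$conf")
    conns=$(awk '$1 == "worker_connections" { sub(/;/, "", $2); print $2; exit }' "$conf")
    echo $((workers * conns))
}

# Demo with a throwaway config that mirrors the example values.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
worker_processes  2;
events {
    worker_connections  1024;
}
EOF
max_connections "$tmp"
rm -f "$tmp"
```

Pointing the helper at your real nginx.conf gives a quick sanity check before a benchmark run with "ab".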
MIME types :   This section allows nginx to identify files by extension. For example, if we serve out a .txt file then the mime type would be defined as text/plain.
include mime.types is the definition file nginx loads to identify all of the mime types. This directive simply allows our server to send the proper file type and application type to the clients. Alternatively you can take out this line and instead define your own mime types by using the following "types" directive.
types {...} Instead of using the "include mime.types" directive you can define your own mime types. This is an especially useful option if you want to use the same mime types on many different systems or do not want to rely on a secondary definition file. You also have the option of defining a mime type for a non-standard extension. In our example we define the extension "bob" as text/plain.
default_type application/octet-stream is the default type if a file extension has not already been defined in the mime.types file. This is useful if you serve out files with no extension or with a non-standard extension. Either way, clients will be able to retrieve the file unobstructed.
Size Limits :   These directives specify the buffer size limitations on the amount of data we will consider to be valid for a request. If the client sends too much data in one request, for example in a buffer overflow attack, then the request will be denied.
client_body_buffer_size 8k If the request body is larger than the buffer, then the entire request body or some part of it is written to a temporary file.
client_header_buffer_size 1k is the limit on the size of all of the http headers the client can send to the server. For the overwhelming majority of requests a buffer size of 1K is sufficient. The only time you would need to increase this is if you have a custom header or a large cookie sent from the client.
client_max_body_size 1k is the maximum accepted body size of client request, indicated by the line "Content-Length" in the header of request. If size exceeds this value the client gets sent the error "Request Entity Too Large" (413). If you expect to receive files uploaded to your server through the POST request method you should increase this value.
large_client_header_buffers 1 1k is the limit of the URI request line, which can not be larger than the buffer size multiplied by the number of buffers. In our example we accept a buffer size of 1 kilobyte and there is only one(1) buffer. So, we will not accept a URI which is larger than (1x1K=1K) 1 kilobyte of data. If the client sends a bigger request then Nginx will return an error "Request URI too large" (414). The longest header line of the request must also be less than the size of (1x1K=1K) 1 kilobyte, otherwise the client gets the error "Bad request" (400). Limiting the client URI is important to keep a scanner or broken client from sending large requests and possibly causing a denial of service (DoS) or buffer overflow.
Timeouts :   These values specify the amount of time in seconds that Nginx will wait for the client to complete the specified action.
client_body_timeout 5 is the read timeout for the request body from the client. If after this time the client sends nothing, nginx returns error "Request time out" (408).
client_header_timeout 5 is the timeout for reading the request header from the client. If after this time the client sends nothing, nginx returns error "Request time out" (408).
keepalive_timeout 5 5 the first value is the timeout for keep-alive connections with the client. The second parameter sets the value "Keep-Alive: timeout=time" in the response header.
send_timeout 5 is the response timeout to the client. The timeout is not applied to the entire transfer of the response, but only between two successive write operations. If the client accepts nothing within this time, nginx shuts down the connection.





General Options :  
ignore_invalid_headers on throws away non-standard headers in the client request. If you do not expect to receive any custom made headers then make sure to enable this option.
limit_zone gulag $binary_remote_addr 1m sets up a table we will call "gulag" which uses no more than 1 megabyte of ram to store session information keyed by remote ip address. This directive is used in conjunction with limit_conn gulag 5. The ngx_http_limit_zone_module only restricts the amount of connections from a single ip address which are currently being processed by the Nginx daemon. An error 503 will be returned to the client if request processing is being blocked at the socket level and new requests from same ip starts. limit_zone will _NOT_ help if your workload is CPU or disk bound. With several workers enabled an error 503 will be returned if two workers process requests from the same ip at the same time. But this is unlikely to happen with small requests.
This is _NOT_ a directive to limit the total number of open, "established" connections to the server per ip address!! You could use your iptables or PF firewall to limit the total amount of connections. The OpenBSD PF firewall (pf.conf) uses max-src-conn or max-src-states to limit the amount of established connections to your server.
You can increase the size of the "gulag" table from 1 megabyte if you need to. A zone size of 1M can handle 32000 sessions at a default size of 32 bytes/session. You can also change the name of the table we called "gulag" to any string you want. We thought this was a good name due to Nginx's country of origin combined with the purpose of this directive.
The HTTP 1.1 specification, circa 1999, recommends that browsers and servers limit parallel requests to the same hostname to two. Most browsers comply with the multi-threading recommendation of the specification, although downgrading to HTTP 1.0 boosts parallel downloads to four. So most web browsers are effectively throttled by this limit on parallel downloads if the objects in the web page they download are hosted on one hostname. We set this limit to 5 so browsers can open 4 connections with one slot left over as a buffer. Download accelerators can open many hundreds of connections to download a file so this directive will help to alleviate abuses.
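The zone sizing mentioned above can be checked with a line of shell arithmetic. This is only a rough sketch: zone_sessions is a made-up helper name, and it assumes every session record takes exactly the quoted 32 bytes with no bookkeeping overhead inside the zone, so the real capacity is presumably somewhat lower (hence the rounder figure of 32000 quoted above).

```shell
#!/bin/sh
# zone_sessions is a hypothetical helper: how many fixed-size session
# records fit in a zone of the given size in megabytes.
zone_sessions() {
    zone_mb="$1"
    bytes_per_session="$2"
    echo $((zone_mb * 1024 * 1024 / bytes_per_session))
}

zone_sessions 1 32    # the 1m "gulag" zone from Option 1
zone_sessions 5 32    # the 5m "gulag" zone from Option 2
```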
recursive_error_pages on allows the use of the error_page directive specified later in the config.
sendfile on enables the use of sendfile(). This function can greatly increase overall system performance since sendfile() can do an entire data transfer without switching context. Enable if you allow downloads of large to medium sized files. You need to be careful about using sendfile if the file being sent has any possibility of being modified ( especially truncated ) while the operation is in progress since some very odd things ( like the process crashing ) can happen on some platforms.
server_name_in_redirect off turns off the server's ability to substitute the client supplied "Host" header with the virtual server variable "server_name" when a client is redirected.
server_tokens off turns off the nginx version numbers in the auto generated error pages. We do not want to display this information for security purposes.
TCP options :   These options say how we should use the TCP stack.
tcp_nodelay on TCP_NODELAY is for a specific purpose; to disable the Nagle buffering algorithm. It should only be set for applications that send frequent small bursts of information without getting an immediate response, where timely delivery of data is required (the canonical example is mouse movements).
tcp_nopush on If set, don't send out partial frames. All queued partial frames are sent when the option is cleared again. This is useful for pre-pending headers before calling sendfile(2), or for throughput optimization. As currently implemented, there is a 200 millisecond ceiling on the time for which output is corked by TCP_CORK. If this ceiling is reached, then queued data is automatically transmitted.
Compression :   These values tell nginx how to compress outgoing data. Remember that all files of the specified MIME types (gzip_types) are compressed in real time. On a P3 500MHz a 100KB HTML file takes 0.05 seconds (5 hundredths of a second) to gzip at level 9 compression (highest).
gzip on turn compression on.
gzip_static on; allows one to have pre-compressed .gz files served instead of compressing files on the fly. This is the most efficient method of serving compressed data. To use this option simply have a compressed copy of the same .html file in document root. For example, if we have the index.html file in place we will also have a pre-compressed index.html.gz file.
The following script will publish a compressed gzip file from a given html file. When you are done editing the html file execute this script to make a compressed copy ready for distribution. As soon as it is in place Nginx will serve it out. Also, make sure the date on the compressed .gz is always newer or equal to the original as Nginx will always serve out the most recent copy:
#!/bin/sh
#
## Calomel.org  publish_html2gz.sh
## usage: ./publish_html2gz.sh index.html
#
## Make a tmp copy of the original HTML file
cp "$1" "$1.tmp"

## Remove the old gz if there is one
rm -f "$1.gz"

## Compress the tmp HTML copy. Use the highest level 9
## compression and do not store dates or file names
## in the gzip header. BTW, if the compressed gz is
## larger than the original file a gzip will NOT be made.
## The -o (output file name) option is assumed here to be the
## OpenBSD gzip one. GNU gzip has no -o, so there use:
##   gzip -9 -n -c "$1.tmp" > "$1.gz"
gzip -9 -n -o "$1.gz" "$1.tmp"

## Clean up any tmp files
rm -f "$1.tmp"

echo ""
echo "Verify files"
ls -al "$1"*

echo ""
echo "Compression statistics"
gzip -vl "$1.gz"
When Nginx sees the .gz file it will send this out to clients who accept compression instead of compressing the file in real time. Make sure you have built your nginx binary with the argument "--with-http_gzip_static_module". Execute "nginx -V" to see the compiled options.
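Since the pre-compressed copy should never be older than its original, a small check can flag stale .gz files before they are published. The sketch below is an assumption-laden illustration: gz_is_fresh is a hypothetical helper, and it relies only on the standard "find -newer" test to compare modification times.

```shell
#!/bin/sh
# gz_is_fresh is a hypothetical helper: it succeeds when FILE.gz exists
# and FILE has not been modified more recently than FILE.gz.
gz_is_fresh() {
    [ -f "$1.gz" ] && [ -z "$(find "$1" -newer "$1.gz")" ]
}

# Demo in a scratch directory with explicit timestamps.
dir=$(mktemp -d)
touch -t 201005270900 "$dir/index.html"
touch -t 201005271000 "$dir/index.html.gz"
if gz_is_fresh "$dir/index.html"; then
    echo "index.html.gz is up to date"
else
    echo "index.html.gz is stale, re-run publish_html2gz.sh"
fi
rm -rf "$dir"
```

Running a check like this over the document root after each edit is one way to catch an HTML file that was changed without re-running the publish script.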
gzip_buffers 16 8k allows 16 slots of 8k buffers used to respond to clients with a gzip'd response. This means the max size of our compressed responses can be no larger than 16*8k = 128 kilobytes. By default Nginx limits compressed responses to 4*8k = 32 kilobytes. If you expect to return responses whose compressed size is more than 32KB then increase the number of buffers (e.g. 16). The single buffer size of 8K can not be increased.
gzip_comp_level 9 compresses files to the highest compression level. Level 1 is the fastest/lowest compression and level 9 is the slowest/best compression. During testing the time difference between level 1 and 9 was around 2 hundredths of a second per file on a P3 500MHz.
Which compression ratio is right for your server? As a test we took a standard 68.3 kilobyte HTML file and compressed it on a AMD64 1GHz machine using gzip levels 1, 6, and 9. Level 1 compressed the file 61.5%, but Level 9 took twice as long to compress the file to 67.1%. Level 1 has the best compression to time ratio. Realistically, the times are so short we still suggest using level 9, or at least level 6 compression to save overall bandwidth. Today's computers are fast enough that a user is unlikely to notice slightly more CPU usage compared to longer download times.
gzip     ratio  time      compressed  uncompressed
level 1  61.5%  0m0.009s  26320       68372
level 6  67.0%  0m0.016s  22560       68372
level 9  67.1%  0m0.018s  22525       68372
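The ratio column is simply the share of bytes saved, 1 - compressed/uncompressed. A minimal sketch for checking your own files follows; the function name gzip_savings is an assumption made for this example and awk is only used for the floating point math.

```shell
#!/bin/sh
# gzip_savings COMPRESSED_BYTES UNCOMPRESSED_BYTES
# Prints the percentage of bytes saved, to one decimal place.
gzip_savings() {
    awk -v c="$1" -v u="$2" 'BEGIN { printf "%.1f\n", (1 - c / u) * 100 }'
}

# Usage, with the level 1 row from the table above:
#   gzip_savings 26320 68372
```

Feeding it the byte counts from "ls -l" of a file and its .gz copy gives the same figure that "gzip -vl" reports.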
gzip_http_version 1.0 allows the server to send compressed data to HTTP/1.0 clients. HTTP/1.1 clients use the proper headers so they can always ask for compressed data.
gzip_min_length 0 means that nginx should compress all files no matter what the size. The value is the size in bytes. You can always set this value to something higher if you do not wish to compress small files.
gzip_types text/plain text/html text/css image/bmp are the only file types to be compressed. For example, JPG's are already compressed so it would be useless for us to try to compress them again. TXT and BMP files on the other hand compress very well, on average to about 2.5 times smaller than the original. Smaller files mean less bandwidth used and less time to transmit the same amount of data. This makes your site "feel" significantly faster.
gzip_vary on enables the response header "Vary: Accept-Encoding". This tells proxies and caches that the response differs depending on whether the client accepts compressed data, so a compressed copy is not handed to a client that cannot read it.
log_format main :   is the log format of the web logs. This format is assigned to the variable "main" and can be used later in the http section. This format is fully compatible with standard log analyzing tools like Awstats, Webalizer and custom tools like the Calomel.org Web Log Sentry. We also have added two more fields at the end of each log line. "$request_time" logs how much time the server took to generate the content and "$gzip_ratio" shows what X-factor the file was compressed by. A value of 2.50 means the original was 2.5 times larger than the compressed data that was sent, i.e. the response shrank to 40% of its original size.
access_log and error_log :   are the locations you want the logs to be placed in. In the access_log directive you can also use the buffer parameter. This will buffer the access log activity into ram and once the limit has been reached Nginx will then write the logs. This can save I/O stress and bandwidth on your hard drive. You will want to remove "buffer=32k" while testing, otherwise you will not see any log output until at least 32 kilobytes of data are ready to be written to the access_log file. 32K of log data is approximately 150 lines. The info level on the error_log will increase the verbosity of the logs to include the reasons that clients were denied access.
expires 31d :   says we want our pages to be expired from the client's cache in 31 days. Time in the Expires header is obtained as the sum of the current system time added to the time assigned in this directive. In effect, we are saying that pages are to be expired 31 days after they were accessed by the client. You can also specify a time in hours using "h". In the Nginx v0.7.0 release you can use the format "expires modified +1d" to set the expires header based on the modified time of a file. The Expires header tells clients they should keep a copy of the object they already downloaded for the specified amount of time. This saves a significant amount of upload bandwidth for you. Instead of clients going from page to page downloading the same picture banner over and over again, they can keep a copy locally and just get the changes on your site. Imagine a client getting 5 pages from your site. Each page has a banner that is 15KB. With expires headers enabled that client will only download the banner once instead of 5 times (15KB compared to 75KB), saving your upload bandwidth and making your site "feel" more responsive.
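To confirm the header is really being sent, you can pull it out of a HEAD request. The following is a small sketch; the helper name expires_header and the host mydomain.com in the usage line are placeholders for illustration only.

```shell
#!/bin/sh
# expires_header
# Reads HTTP response headers on stdin and prints the value of the
# Expires header, if there is one. (Hypothetical helper for illustration.)
expires_header() {
    # Headers end in CRLF, so strip the carriage returns first
    tr -d '\r' | sed -n 's/^[Ee]xpires: *//p'
}

# Usage (replace the host with your own site):
#   curl -sI http://mydomain.com/ | expires_header
```

With "expires 31d" in place the printed date should be about 31 days ahead of the Date header of the same response.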
limit_conn gulag 5 :   limits remote clients to no more than 5 concurrently "open" connections per remote ip address being processed by Nginx. See the complementary directive limit_zone gulag $binary_remote_addr 1m above for more information about defining the "gulag" table.
listen 127.0.0.1:8080 default rcvbuf=64K backlog=128 :   tells nginx to listen on localhost (127.0.0.1) port 8080. The parameter rcvbuf=64K sets the size of the buffer for incoming data (sysctl net.inet.tcp.recvspace). rcvbuf can be decreased to as little as 1K, possibly decreasing the probability of overflow during a DDoS attack. The parameter backlog (sysctl kern.somaxconn) is the max number of backlogged client requests Nginx will process. If your server is quite busy you will want to increase this value. We listen on 127.0.0.1:8080 in order to use the redirection rules in iptables or in OpenBSD's pf packet filter firewall. The argument "default" says that this server {...} block should handle any client request sent to this port no matter the hostname (not used in the example).
root /var/www/htdocs :   is the location of document root on your server. This is where nginx will look for all files to be served out to clients.
server_name mydomain.com www.mydomain.com :   means this server {...} block will only answer requests that have "mydomain.com" or "www.mydomain.com" host headers. By default the hostname of the machine is used. We are expecting the client to ask for the correct hostname with the Host header. If it does not, the default server block with "server_name _;" returns an error 444 to the client. BTW, the server_name_in_redirect and server_name directives work in conjunction with each other.
SSL Options (only enable if you use an SSL certificate)   If you are interested in setting up an SSL certificate for encrypted traffic on your site then we highly suggest reading our Guide to Webserver SSL Certificates. Once you understand the details of SSL certs then you must build Nginx from source and enable the argument "./configure --with-http_ssl_module".
ssl on; Enables the use of the ngx_http_ssl_module once it has been built into the Nginx binary.
ssl_certificate /ssl_keys/mydomain.com_ssl.crt; This file is the combined certificate which contains both of the "crt" files signed and sent to you by your certificate authority. See "How to setup a GoDaddy Turbo SSL Certificate on Nginx" below for details.
ssl_certificate_key /ssl_keys/mydomain_ssl.key; Specifies the location of the file with the secret key in PEM format for this server. This file is the private key you made using the OpenSSL binary.
ssl_ciphers HIGH:!ADH:!MD5; says that our server will only accept SSL handshakes (pre-master) using AES 128/256 bit or 3DES 168 bit encryption at strong crypto cipher suites without anonymous DH. MD5 is also not accepted due to its known weaknesses. Both AES and 3DES are enabled for browser and BOT compatibility. The command "openssl ciphers -v 'HIGH:!ADH:!MD5:@STRENGTH'" will show you all of the ciphers your version of OpenSSL supports in the HIGH level and not anonymous DH. Our Guide to Webserver SSL Certificates explains all of the details about ciphers and compatibility models.
ssl_prefer_server_ciphers on; just means that our server will use the ciphers specified in the "ssl_ciphers" directive over the ciphers preferred by remote clients. It is not a good security practice to ever trust remote clients.
ssl_protocols SSLv3; tells the server to only allow SSL version 3 protocol (SSLv3). It is highly recommended never to use SSL version 2 (SSLv2) as it has vulnerabilities due to its weak key strength. TLS version 1 (TLSv1) is available, but Firefox just recently announced a weakness in the browser's negotiation with a TLS enabled server. Search Google for "mozilla tls vulnerable" for examples. For security we will only allow the SSLv3 protocol.
ssl_session_cache shared:SSL:1m; allows Nginx to cache the SSL session keys in its own cache structure instead of using OpenSSL's slower, single threaded cache. This means Nginx can now take advantage of multiple worker_processes and separate the SSL jobs between them. The result is an impressive speed boost (2x or more) over the slower OpenSSL cache depending on your OS. The format of the line "shared:SSL:1m" is as follows: "shared" is the internal caching function, "SSL" is just an arbitrary name of this SSL cache (you can name it anything you want) and "1m" is the size of the cache (1 megabyte can hold around 4000 SSL cache sessions).
ssl_session_timeout 5m; is the cache session timeout between the client and the server set at 5 minutes or 300 seconds. When this time runs out the client's ssl session information is removed from the "ssl_session_cache". The reason this number was chosen is that it is also the default amount of time all client browsers will cache an ssl session. If you expect clients to stay on your site longer and go to multiple pages you can always increase this default value. Depending on the client browser, they may or may not respect your ssl_session_timeout value if it is larger than 5 minutes. 5 minutes seems to work perfectly fine for most sites.


How to setup a GoDaddy Turbo SSL Certificate for Nginx
  • Go to your SSL Certificate Authority company of choice (GoDaddy Turbo SSL Certificates sell for as little as $14 a year) and purchase a certificate. You will be asked to upload a Certificate Signing Request (CSR) by GoDaddy. For full instructions on how to create your own highly secure CSR, check the Guide to Webserver SSL Certificates. When you provide GoDaddy with your CSR, you will be asked to download your signed certificates. IMPORTANT NOTE: Make sure to choose the "APACHE" certificate type when asked what format to download in; else you will not receive any files.
  • When you receive the certificate file it will probably be compressed in ZIP format. In the zip file will be two files: one named with your domain like "mydomain.com.crt" and one called "gd_intermediate_bundle.crt". You will need to combine both of the files for Nginx to understand them. Cut/paste or "cat" your certificate file first, then cut/paste or "cat" the gd_intermediate_bundle.crt after it into a single file. In our example above we called the combined file "mydomain.com_ssl.crt" and it is used with Nginx's ssl_certificate directive.
  • Finally, make a directory outside of the web document tree and limit access to it to root and the Nginx web daemon only. In our example we will make a directory called /ssl_keys. Both your private "key" file generated by OpenSSL (mydomain_ssl.key) and the combined certificate "crt" file (mydomain.com_ssl.crt) are copied to /ssl_keys. It is a good idea to make both files read only by the Nginx daemon.
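The last two steps can be sketched as a pair of shell functions. The function names combine_cert and install_keys are made up for this example, and the chmod modes (700 for the directory, 400 for the files) are our assumption of what "read only" should look like; setting the owner with chown is left out because it depends on which user your Nginx runs as.

```shell
#!/bin/sh
# combine_cert SITE_CRT BUNDLE_CRT OUT_CRT
# Writes the site certificate first, followed by the intermediate bundle.
combine_cert() {
    cat "$1" "$2" > "$3"
}

# install_keys DIR FILE...
# Creates DIR readable by its owner only and copies each FILE into it
# as a read-only file. (Assumed modes: 700 for DIR, 400 for the files.)
install_keys() {
    dir=$1; shift
    mkdir -p "$dir" && chmod 700 "$dir" || return 1
    for f in "$@"; do
        cp "$f" "$dir/" && chmod 400 "$dir/$(basename "$f")" || return 1
    done
}

# Usage, with the file names from this guide (run as root):
#   combine_cert mydomain.com.crt gd_intermediate_bundle.crt mydomain.com_ssl.crt
#   install_keys /ssl_keys mydomain.com_ssl.crt mydomain_ssl.key
```

The order of the two files in the combined certificate matters: your own certificate must come before the intermediate bundle.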




IMPORTANT NOTE: Why are we using the error code "return 444" ?

In the following sections you will notice we are using the directive "return" with an error code 444, i.e. "return 444". This is a custom error code understood by the Nginx daemon to mean, "Close the connection with the client without sending any headers."
Nginx understands that if you want to send the client the 444 error code (only 400-417 and 500-505 are allowed by RFC) then Nginx should just close the connection to the client. Just dropping the connection and sending an empty TCP reset packet to the client will deny scanners information about the server and _may_ confuse them.
So what is the difference in using an error 404 compared to a 444? Let's take a look at the header results using the cURL command to a server sending a 404 error. FYI: cURL will be sending the HEAD request to the server. Notice the server sends back a set of headers with the error code and some information about the server.
user@machine: curl -I http://www.somedomain.com/
HTTP/1.1 404 Not Found
Server: Nginx
Date: Mon, 10 Jan 2010 20:10:30 GMT
Content-Type: text/html
Transfer-Encoding: chunked
Connection: keep-alive
Keep-Alive: timeout=5
This is what cURL says about an Nginx server returning an error 444. The server sent nothing back to the client and closed the connection. The client did not get any useful information about the server. If you are paranoid about security or just do not want to provide any data to clients who cause errors, this is a good response for them.
user@machine: curl -I http://www.somedomain.com/
curl: (52) Empty reply from server
If you wish to change the return code to one of those listed in the error_page directive then your error page will be sent out instead. For example, instead of using code 444 you could send a 403 (Forbidden). For a full list of the error codes and their official definitions check out the w3.org Error Status Code Definitions.
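When testing your rules from a script it helps to tell the two cases apart by cURL's exit status rather than by reading its output. Exit status 52 is the "Empty reply from server" error shown above. The function name describe_curl_exit is our own and only exists for this sketch.

```shell
#!/bin/sh
# describe_curl_exit CODE
# Maps a curl exit status to a short description. Exit status 52 is
# curl's "Empty reply from server", which is what "return 444" produces.
describe_curl_exit() {
    case "$1" in
        0)  echo "server answered with headers" ;;
        52) echo "connection closed without a reply" ;;
        *)  echo "other curl error ($1)" ;;
    esac
}

# Usage (www.somedomain.com is the placeholder host from the examples above):
#   curl -sI http://www.somedomain.com/ >/dev/null; describe_curl_exit $?
```

Note that a 404 or 403 still counts as exit status 0 here, because the server did answer with headers.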




Directive explanation and insight

Only requests to our Host are allowed :   This condition is to make sure that only clients who are asking for mydomain.com or www.mydomain.com are allowed access to our server. If a client is scanning web servers then they might ask for the ip address and this is NOT allowed by our rules.
Only allow GET and HEAD request methods :   Request Method restrictions allow you to filter on GET, HEAD, POST, SEARCH, etc. We will be limiting access to our example server to GET and HEAD requests only as we do not allow uploads or any other options due to security concerns. All other request methods will get an error defined by "return 444".
Deny certain User-Agents :   You may want to list out some user-agents you do not want connecting to your server. They can be scanners, bots, spammers or anyone else you find is abusing your server.
Deny certain Referers :   Referer spam is more of an annoyance than a problem. A web site or bot will connect to your server with the referer field referencing their web site. The idea is that if you publish your web logs or statistics then their hostname will show up on your page. When a search bot like Google comes by it will see the link from your site to theirs and give the spammers more PageRank credit. First, never make your weblogs public. Second, block access to referer spammers with these lines.
Redirect from www to non-www :   is if you prefer clients who connect to your site to instead use the non-www domain. For example, if a browser connects to www.mydomain.com they will be redirected to the URL mydomain.com with a code 301. If they then save your site location in a bookmark it will show up as the preferred non-www domain.
Stop Image and Document Hijacking :   Image hijacking is when someone makes a link to your site to one of your pictures or videos, but displays it on their site as their own content. The reason this is done is to send a browser to your server to use your bandwidth and make the content look like part of the hijacker's site. This is most common as people make links to pictures and add them to a public forum or blog listing. They get to use your picture in their content and not have to use their bandwidth or server to host the file. In order to keep your bandwidth usage low you should block access to images from those clients who are not referring the content from a page on your site. Note, this function can be used for any kind of content. Just add the file types to the list. If you would like more ideas on lowering bandwidth usage check out our Saving Webserver Bandwidth (Tips).
Restricted Access directory :   This area is to limit access to a private or content sensitive directory. We will be limiting access to it by ip address (first check) and if that passes then ask for a password (second check). Both must match before access is granted.
access control list :   This is a way you can define a directory and only allow clients coming from the specified ips to have access. Use this function to allow internal LAN clients access to the status pages or employee contact information and deny other clients. In our example we will allow the clients coming from localhost (127.0.0.1/32) and internal LAN ips 10.10.10.0/24 to access the protected "secure" directory. BTW, if you use OpenBSD's pf packet filter firewall we highly suggest enabling "synproxy" in your pf.conf for all connections to your web server. Normally when a client initiates a TCP connection to a server, PF will pass the handshake packets between the two endpoints as they arrive. PF has the ability, however, to proxy the handshake. With the handshake proxied, PF itself will complete the handshake with the client, initiate a handshake with the server, and then pass packets between the two. The benefit of this process is that no packets are sent to the server before the client completes the handshake. This eliminates the threat of spoofed TCP SYN floods affecting the server because a spoofed client connection will be unable to complete the handshake.
password protected area :   If you are coming from an authorized ip address then we will ask for a username and password. If you have an area of the web site you only want authorized personnel to see then you should protect it. This set of directives will password protect the directory "/secure" and all files and directories under it. We will use the basic authentication method, which Nginx supports out of the box. Be aware that basic authentication sends the password base64 encoded, not encrypted, so use https for this directory if you can. Check "man htpasswd" for details.
To make the "access_list" password file use the binary htpasswd in the following form. Supply the "username" and "password" pair for access. Remember that our configuration file will look for this file at /var/www/htdocs/secure/access_list :
htpasswd -b -c access_list username password
Only allow these file types to document root :   We want to restrict access to our server to clients who are actually looking for data we serve out. For example, if a client is asking for a PHP file and we do not serve that type of file then we want to deny them access.
"(^\/|\.html|\.css|\.jpg|favicon\.ico|robots\.txt|\.png)$" matches the incoming request. If a request fails to match then this service will be skipped and the client will get an error. If all services fail to match Nginx returns a generic error (next section). The example URL string specifies the file types we expect a client to want to retrieve. The dollar sign ($) says that all the strings listed must be located at the end of the request URL. This line will allow:
  • ^\/ allows the root request http://mydomain.com/ to be accepted. / is expanded into /index.html by the web server
  • \.html HTML page files
  • \.css Cascading Style Sheets
  • \.jpg JPG pictures
  • favicon\.ico is the only .ico file
  • robots\.txt is the only text file
  • \.png PNG pictures
  • $ says that each of these strings have to be located at the end of the line
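Before putting a pattern like this into production it is worth trying it against a few request paths. The sketch below uses grep -E, which is only an approximation of the PCRE engine Nginx uses, and leaves the "/" unescaped since it needs no escape there. The function name allowed_uri is hypothetical.

```shell
#!/bin/sh
# allowed_uri URI
# Succeeds if URI matches the same pattern as the location rule above.
# (grep -E stands in for Nginx's PCRE matching; close enough for this pattern.)
allowed_uri() {
    printf '%s\n' "$1" |
        grep -Eq '(^/|\.html|\.css|\.jpg|favicon\.ico|robots\.txt|\.png)$'
}

# Usage:
#   for uri in / /index.html /pics/logo.png /index.php /notes.txt; do
#       if allowed_uri "$uri"; then echo "allow $uri"; else echo "deny  $uri"; fi
#   done
```

Because "^\/" is followed by "$", only the bare root request "/" matches that branch; a directory request such as "/pics/" is denied.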
Serve an empty 1x1 gif _OR_ an error 204 (No Content) for favicon.ico :   Using either of these lines simply directs the Nginx daemon to serve out an empty 1x1 pixel (43 byte) favicon.ico file to the client or send back a return code 204, meaning "No content". When either option is in place you do not need a file called favicon.ico in your document root. This is perfect for anyone who sees the favicon.ico as useless and does not want to waste any bandwidth on it, but also does not want to see "file not found" errors in their logs. For more information make sure to read the section titled, "The Cursed favicon.ico" on our Webserver Optimization and Bandwidth Saving Tips page.
System Maintenance :   This function will look for the file /system_maintenance.html in the document root. If the file exists then _ALL_ client requests will be redirected to this file with an error code 503 (Service Unavailable). If the file does not exist then your pages will be served as normal. The purpose is so that you can work on your site and still keep the Nginx daemon up to show helpful information to your users. The system_maintenance.html file can contain something as simple as "Site Temporarily Down. Be back up soon." It is vitally important to send clients an error code 503 in order to notify them that the page they are looking for has NOT changed, but that the site is temporarily down. Google for example understands this error and will return to index your page later. If you were to redirect Google to the /system_maintenance.html page and send them a code 200 for example, Google might replace the indexing information they have with this page. Your site would then have to be completely re-indexed once you got your site back up. Definitely not what you want to happen.
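Since the switch is just the presence of a file, turning maintenance mode on and off is easy to script. A minimal sketch, assuming the document root is passed in as the first argument; the function names maintenance_on and maintenance_off are ours.

```shell
#!/bin/sh
# maintenance_on DOCROOT [MESSAGE]
# Creates system_maintenance.html in DOCROOT so Nginx starts answering 503.
maintenance_on() {
    printf '%s\n' "${2:-Site Temporarily Down. Be back up soon.}" \
        > "$1/system_maintenance.html"
}

# maintenance_off DOCROOT
# Removes the file again so pages are served as normal.
maintenance_off() {
    rm -f "$1/system_maintenance.html"
}

# Usage, with the document root from this guide:
#   maintenance_on /var/www/htdocs
#   maintenance_off /var/www/htdocs
```

No reload of the Nginx daemon is needed, because the file check is made on every request.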
All other errors get the generic error page :   If the client fails the previous tests then they will receive our error_page.html. This error page will be sent out for all error codes listed (400-417, 500-505). Note the use of the internal directive? This means that an external client will not be able to access the /error_page.html file directly, only Nginx can serve this file to a client.




Starting the daemon

Make sure that the user and group Nginx is going to run as exist and can access the files in document root. Our example file will run the child daemons as the user "nginx" and the group "nginx".
Now that you have the config file installed in /etc/nginx.conf and configured you can start the daemon by hand with "/usr/local/sbin/nginx" or add the following into /etc/rc.local to start the Nginx daemon on boot.
#### Nginx start in /etc/rc.local
if [ -x /usr/local/sbin/nginx ]; then
   echo -n ' Nginx'; /usr/local/sbin/nginx
fi




In Conclusion

Nginx has many more options not covered in this how to. We highly suggest taking some time to read the Nginx English Wiki if you need more web functionality. If you are happy with the options we have started out with then at this point all that is left is finding some content to serve and setting up your web tree.




Strip Unnecessary White space like spaces, tabs and new line characters

How much bandwidth can you expect to save by stripping out white space? On average, one could expect to save 2% of the total size of HTML pages written by hand or previously unoptimized. If your average page size is 100 kilobytes you could save around 2 kilobytes every time a page is served. If you served a hundred thousand pages per day you could reduce your bandwidth usage by 200 megabytes per day. Not a lot, but every bit makes a difference.
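To plug in your own numbers, the arithmetic above can be written as a one line shell function. The name daily_savings_mb is made up for this example, and the 2% figure is only the rough average quoted above, not a measurement of your pages.

```shell
#!/bin/sh
# daily_savings_mb PAGE_KB PERCENT_SAVED PAGES_PER_DAY
# Prints the whole megabytes saved per day, using 1 MB = 1000 KB as in
# the text above. Integer math only, so fractions are dropped.
daily_savings_mb() {
    echo $(( $1 * $2 * $3 / 100 / 1000 ))
}

# Usage, with the figures from the paragraph above:
#   daily_savings_mb 100 2 100000
```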
This is a simple perl script called "strip_whitespace.pl". It will read in any html file and output the stripped version. Use this script to make a published copy of your html docs while keeping the human readable versions in a private directory. BTW, we use this code with the pre-compression option in Nginx to serve out pre-stripped, pre-compressed files to save on bandwidth and CPU time.
#!/usr/bin/perl -w
#
## Calomel.org -- strip_whitespace.pl
#
## PURPOSE: This program will strip out all
##    whitespace from a HTML file except what is
##    between the pre and /pre tags.
#
## DEPENDENCIES: p5-HTML-Parser which you can get by CPAN or
##    installing a package from your OS supporters.
#
## USAGE: ./strip_whitespace.pl < input.html > output.html
#

use HTML::Parser;
my $preserve = 0;

# Ignore any text between the pre and /pre tags
sub process_tag
{
    my ($tag, $text) = @_;
    if ($tag eq 'pre') { $preserve = 1; }
    elsif ($tag eq '/pre') { $preserve = 0; }
    print $text;
}

# Replace all white space with a single space except what
# is between the pre tags. This includes all tabs (\t),
# returns (\r) and new line characters (\n).
sub process_default
{
    my ($text) = @_;
    $text =~ s/\s+/ /g unless $preserve;
    print $text;
}

# Read all of standard input in one go (slurp mode)
undef $/;
$_ = <STDIN>;

my $p = HTML::Parser->new(
    start_h => [\&process_tag, 'tag,text'],
    end_h => [\&process_tag, 'tag,text'],
    default_h => [\&process_default, 'text']
);

$p->parse($_);

## EOL
