Web server optimizations: ETags

This is nothing new, but just in case you missed it, Yahoo! published a fairly detailed report on how to speed up your website's response times: Best Practices for Speeding Up Your Web Site.

Many of the tips are common sense, some are somewhat unexpected, and some I didn’t know about, like the HTTP/1.1 ETag header. Let’s see what that’s about.

An ETag is a value that a web server sends to the browser along with a static file, and that uniquely identifies that version of the file. It is comparable to a checksum of the file, but it is computed differently. For example, Apache computes the ETag of a file from its size, its modification date, and its inode number (a checksum would be based on the actual content of the file, and would therefore be more expensive to compute).
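To make the difference concrete, here is a minimal Python sketch of the two approaches. The function names and the output format are hypothetical: `metadata_etag` only imitates the idea of a metadata-based ETag (inode, size, modification time) and does not reproduce Apache's exact format, while `content_etag` stands in for a checksum-based one.

```python
import hashlib
import os


def metadata_etag(path, include_inode=True):
    """Metadata-based ETag in the spirit of Apache's (hypothetical format).

    Only a stat() call is needed; the file content is never read.
    """
    st = os.stat(path)
    parts = [st.st_size, st.st_mtime_ns // 1000]  # size, mtime in microseconds
    if include_inode:
        parts.insert(0, st.st_ino)  # inode number: local to this filesystem
    return '"' + "-".join(format(p, "x") for p in parts) + '"'


def content_etag(path, chunk_size=65536):
    """Checksum-based ETag: every byte of the file has to be read and hashed."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return '"' + digest.hexdigest()[:16] + '"'
```

Two identical copies of a file with the same modification time get the same `content_etag`, but different `metadata_etag` values as soon as the inode is included, which is the root of the server-farm problem discussed below.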

That way, the first time a web client fetches a static file from your web server, it caches the file along with its ETag. The next time it requests the same content, the client sends the cached ETag back in the request (in the If-None-Match header). The web server can then compute the ETag of the file and compare it with the one provided by the client. If they match, the file hasn’t changed and the client has a valid copy in its cache: the web server doesn’t need to send the file again and can answer with an HTTP 304 status code (“Not Modified”).
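The exchange described above can be sketched in Python as an in-memory simulation, with no real networking involved. `make_etag`, `etag_matches`, `serve` and `Client` are hypothetical names; the ETag is derived from the content here only to keep the example short, whereas Apache derives it from file metadata as described earlier.

```python
import hashlib


def make_etag(body: bytes) -> str:
    # Hypothetical: any value that changes whenever the file changes would do.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def etag_matches(if_none_match: str, current: str) -> bool:
    """Weak comparison as used for If-None-Match: a leading W/ is ignored."""
    if if_none_match.strip() == "*":
        return True

    def opaque(tag: str) -> str:
        tag = tag.strip()
        return tag[2:] if tag.startswith("W/") else tag

    # Naive comma split: good enough for ETags that contain no commas.
    return any(opaque(t) == opaque(current) for t in if_none_match.split(","))


def serve(files: dict, path: str, request_headers: dict):
    """Server side: answer 304 when the client's cached ETag is still current."""
    body = files[path]
    etag = make_etag(body)
    cached = request_headers.get("If-None-Match")
    if cached is not None and etag_matches(cached, etag):
        return 304, {"ETag": etag}, b""  # Not Modified: the body is not resent
    return 200, {"ETag": etag}, body


class Client:
    """Client side: keep each file with its ETag and revalidate on reuse."""

    def __init__(self):
        self.cache = {}  # path -> (etag, body)

    def get(self, files: dict, path: str):
        headers = {}
        if path in self.cache:
            headers["If-None-Match"] = self.cache[path][0]
        status, resp_headers, body = serve(files, path, headers)
        if status == 304:
            body = self.cache[path][1]  # reuse the cached copy
        else:
            self.cache[path] = (resp_headers["ETag"], body)
        return status, body
```

Fetching the same path twice gives a 200 and then a 304; after the file content changes, the next request gets a fresh 200 with the new body. The 304 branch is where the bandwidth is saved.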

This is an efficient way to save bandwidth, but it has a drawback if your website receives heavy traffic and is hosted on a web server farm. With Apache’s default setup at the time of writing, the ETag depends on the file’s inode, among other things. The inode is of course different on each server of the farm, so the same file can be downloaded again and again, every time a client connects to a different server, even if the file content is exactly the same.
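A small simulation, under stated assumptions, shows why the inode hurts on a farm. Two servers hold identical copies of a file with the same size and modification time (assumed to be preserved by the deployment process) but different inode numbers, and a load balancer sends a single client to them in turn. `StoredFile`, `etag_for` and `full_downloads` are hypothetical, and the ETag format only imitates the idea of Apache's.

```python
from dataclasses import dataclass


@dataclass
class StoredFile:
    inode: int  # differs per machine, even for identical copies
    size: int
    mtime: int  # seconds since the epoch, assumed identical on every server


def etag_for(f: StoredFile, use_inode: bool) -> str:
    # Only which fields go into the ETag matters for this simulation.
    fields = [f.inode, f.size, f.mtime] if use_inode else [f.size, f.mtime]
    return '"' + "-".join(format(x, "x") for x in fields) + '"'


def full_downloads(servers, requests: int, use_inode: bool) -> int:
    """Round-robin one client across the farm and count 200 responses."""
    cached_etag = None
    downloads = 0
    for i in range(requests):
        f = servers[i % len(servers)]  # the load balancer picks the next server
        current = etag_for(f, use_inode)
        if cached_etag == current:
            continue  # 304 Not Modified: nothing resent
        downloads += 1  # 200 OK: the full file is sent again
        cached_etag = current
    return downloads


farm = [StoredFile(inode=1312, size=40_000, mtime=1_190_000_000),
        StoredFile(inode=9871, size=40_000, mtime=1_190_000_000)]
print(full_downloads(farm, 10, use_inode=True))   # 10: every request downloads
print(full_downloads(farm, 10, use_inode=False))  # 1: only the first one does
```

Leaving the inode out of the ETag is what Apache's `FileETag MTime Size` setting does, which is another option besides disabling ETags altogether, provided the modification times really are identical on every server.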

In this situation, the simplest solution is to disable the ETag feature on the web servers. This forces web clients to rely on other cache-control mechanisms such as the “Last-Modified” header: the server reports the date the file was last modified, the client sends that date back in an If-Modified-Since request header, and the cached copy is still valid if the file hasn’t been modified since then.
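The Last-Modified fallback can be sketched the same way. This minimal Python version, with hypothetical function names, formats a file's modification time as an HTTP date and decides whether an If-Modified-Since value allows a 304 answer. HTTP dates only have one-second resolution, so the modification time is truncated to whole seconds before comparing.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def last_modified_header(mtime: float) -> str:
    """Format a file's mtime as an HTTP date (GMT, whole seconds)."""
    dt = datetime.fromtimestamp(int(mtime), tz=timezone.utc)
    return format_datetime(dt, usegmt=True)


def not_modified_since(if_modified_since, mtime: float) -> bool:
    """True if the server may answer 304 for this If-Modified-Since value."""
    if if_modified_since is None:
        return False
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return False  # unparseable date: ignore the header and send the file
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    modified = datetime.fromtimestamp(int(mtime), tz=timezone.utc)
    return modified <= since
```

A client that echoes back the header it received gets a 304 until the file's modification time moves forward.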

Anyway, you might have to do some testing to find out which setup suits you best, but it is good to know what ETags are.

References:

  1. ETags setup in Apache
  2. Yahoo!’s guide to improving the performance of websites and busy web servers
