svasey.org http documentation

This document describes the software I use to maintain my web page and all my other HTTP-served domains. It also explains the design decisions I have made, along with other details.

Web server setup

The web server I use is lighttpd. This has been favored over Apache because I wanted something simpler to use and learn.

Everything is set up in /home/www. The pages are served by the user http, in the group http. However, everything in /home/www is owned by http:http-admin, and sebastien is also a member of http-admin. Both http and the members of the http-admin group have read/write access. Other users have no rights.

/etc/lighttpd contains lighttpd configuration files:

  • lighttpd.conf contains most of the configuration options
  • mime.types.conf contains the mapping from filename extension to mime type
  • variables.conf contains some user-defined variables that would not be visible enough in lighttpd.conf
  • trusted-conf.conf contains the path to several domain-specific configuration files. Their content will only be included if a given domain is asked for by the client.

The /home/www hierarchy

/home/www contains three subdirectories:

  • build is where the final content is copied to be served by the web server. You shouldn't care about it too much. Consider it a directory used only by utility programs.
  • conf contains only global (valid over multiple domains) configuration instructions for lighttpd.
  • sites contains the hierarchy of all the websites to be served, as well as the custom error pages to be served when a page does not exist.

conf contains the redirect.conf file, which specifies cross-domain redirections, e.g. the redirection from svasey.com to svasey.org, or from www.svasey.org to svasey.org.

Organization of sites

The basic model is as follows. When the user requests a file in a domain dom1.dom2.dom3.tld (e.g. if the requested domain is foo.bar.svasey.org, dom1 is foo, dom2 is bar, and so on), it is served the content in /home/www/sites/dom3.tld/sub/dom2/sub/dom1/pages. Most of the time, pages is a symbolic link pointing to the site's content somewhere in /home/www/build/.

If the path above does not exist, this most likely means the domain does not exist or no http content is served for that domain. In that case, the content served is in /home/www/sites/default/pages, and will probably just be a simple html page indicating an error.

Of course, if the domain is dom2.dom3.tld, the content served is at /home/www/sites/dom3.tld/sub/dom2/pages. For completeness, if the user asks for an IP address, the same system is used, and the numbers of the IP address are treated as domain names, i.e. if the IP is 1.2.4.8, dom1 is 1, dom2 is 2, and so on.
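The mapping from host name to directory can be sketched as follows. This is only an illustration of the rule described above, not the code the server actually runs: the names domain_path, pages_path and SITES_ROOT are made up for the example, and hosts with fewer than two labels are rejected because this document does not say how they are handled.

```python
import posixpath

SITES_ROOT = "/home/www/sites"  # root of the hierarchy described above


def domain_path(host, sites_root=SITES_ROOT):
    """Return $domainpath for a host name (hypothetical helper)."""
    labels = host.lower().rstrip(".").split(".")
    if len(labels) < 2 or "" in labels:
        raise ValueError("expected at least two non-empty labels: %r" % host)
    # The last two labels (dom3.tld) name the top-level site directory.
    path = posixpath.join(sites_root, ".".join(labels[-2:]))
    # The remaining labels are nested under sub/, from right to left.
    for label in reversed(labels[:-2]):
        path = posixpath.join(path, "sub", label)
    return path


def pages_path(host, sites_root=SITES_ROOT):
    """Return the directory whose content is served for the given host."""
    return posixpath.join(domain_path(host, sites_root), "pages")
```

With these definitions, pages_path("foo.bar.svasey.org") gives /home/www/sites/svasey.org/sub/bar/sub/foo/pages, and pages_path("1.2.4.8") gives /home/www/sites/4.8/sub/2/sub/1/pages.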

In short, the format is always the same: $domainpath/pages. There are other directories in $domainpath:

  • sub/ contains the subdomains
  • conf/ contains lighttpd configuration files used only for the given domain. If they are specified in /etc/lighttpd/trusted-conf.conf, they will be used when the specific domain is queried. They are usually used for redirection or authentication. Any conf/ directory must contain at least one empty file named FAKE.

Authentication

By default, all web pages are public. It is possible to restrict a website to specific users using HTTP basic authentication. Basic authentication sends the credentials essentially in the clear, so it is not secure over plain HTTP. For that reason, I try to use it only with my SSL websites.

To use authentication for a given website, it is enough to create a file named htpasswd in the site's conf/. This file should contain one username/password pair per line, with the passwords hashed using MD5. The htpasswd utility can help create those files.

If an htpasswd file exists, then the entire website will be password-protected. This is done using the lighttpd-htpasswd helper program.

To test this feature, I use https://test.secure.svasey.org, with the test username and the same password.

Static or dynamic content?

Most of the public content I serve is static HTML (the page is not generated by the server every time a client asks for it). My pages are simple enough to allow this, and even though it is not as simple to manage as the dynamic approach, it has several advantages:

  • No additional software (PHP, MySQL or PostgreSQL...) to install on the server
  • Faster and more predictable serving time
  • Easy caching or compression (mod_compress) of the pages
  • Easy offline browsing without installation of any software
  • No application code running on the server, so the only place a security flaw can occur is the web server itself

Helper programs

I use four small helper programs to manage the hierarchy.

lighttpd-htpasswd

This is used in the lighttpd configuration file to print all the authentication directives: it finds all the files named htpasswd in the hierarchy and prints configuration directives so that lighttpd uses them. This is a very simple program that takes no command line arguments.
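To give an idea of what such a program can look like, here is a minimal sketch. It is not the actual lighttpd-htpasswd source: the function names are invented, the exact directives the real program prints are not documented here, and the sketch assumes that each htpasswd file sits in a conf/ directory directly under its $domainpath. The auth.* options are the standard lighttpd mod_auth ones.

```python
import os


def domain_from_conf_dir(conf_dir, sites_root):
    """Turn <sites_root>/svasey.org/sub/secure/conf into secure.svasey.org."""
    rel = os.path.relpath(os.path.dirname(conf_dir), sites_root)
    parts = rel.split(os.sep)
    # parts looks like [base, "sub", dom2, "sub", dom1, ...]
    labels = [parts[0]] + parts[2::2]
    return ".".join(reversed(labels))


def auth_directives(sites_root):
    """Return one host-conditional auth block per htpasswd file found."""
    blocks = []
    # os.walk does not follow symbolic links, so pages links are skipped.
    for dirpath, dirnames, filenames in os.walk(sites_root):
        dirnames.sort()  # deterministic output order
        if os.path.basename(dirpath) != "conf" or "htpasswd" not in filenames:
            continue
        host = domain_from_conf_dir(dirpath, sites_root)
        userfile = os.path.join(dirpath, "htpasswd")
        blocks.append(
            '$HTTP["host"] == "%s" {\n'
            '    auth.backend = "htpasswd"\n'
            '    auth.backend.htpasswd.userfile = "%s"\n'
            '    auth.require = ( "/" => ( "method" => "basic",\n'
            '                              "realm" => "%s",\n'
            '                              "require" => "valid-user" ) )\n'
            '}\n' % (host, userfile, host)
        )
    return "".join(blocks)
```

A real program would simply print auth_directives("/home/www/sites") so that lighttpd can read the result.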

lighttpd-addconf

This is used in the lighttpd configuration file to include other configuration files via trusted-conf.conf (using the include_shell command).

The usage is as follows:

lighttpd-addconf [file1 file2 ...]

where the arguments are configuration files to include, in the order they are to be included. The program will display those files to standard output if they are valid lighttpd configuration files.

To check the syntax of your configuration file, the program will need to see the main http configuration file. By default, its path is assumed to be /etc/lighttpd/lighttpd.conf, but you can change this with the --mainfile option.

You can also use the --config=CONFIGFILE option to read the configuration files from CONFIGFILE, which must contain one file path per line (the path must be relative to CONFIGFILE); lines beginning with # are ignored. The files given will be included after those given on lighttpd-addconf's command line, in the order they are given in CONFIGFILE.
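The CONFIGFILE format is simple enough to be read in a few lines. The sketch below is an assumption about how it could be parsed, not the real lighttpd-addconf code: the function names are invented, "relative to CONFIGFILE" is interpreted as relative to the directory containing it, and blank lines are skipped even though the format description above does not mention them.

```python
import os


def read_config_list(config_file):
    """Return the absolute paths listed in CONFIGFILE, in file order."""
    base = os.path.dirname(os.path.abspath(config_file))
    paths = []
    with open(config_file) as handle:
        for line in handle:
            entry = line.strip()
            # Lines beginning with '#' are comments; blank lines are
            # skipped as well (an assumption of this sketch).
            if not entry or entry.startswith("#"):
                continue
            paths.append(os.path.normpath(os.path.join(base, entry)))
    return paths


def files_to_include(command_line_files, config_file=None):
    """Command line files come first, then those listed in CONFIGFILE."""
    files = list(command_line_files)
    if config_file is not None:
        files.extend(read_config_list(config_file))
    return files
```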

Assuming your files are organized in the same way as mine, you can use the --condhost option to make sure the content of a configuration file in /home/www/sites is included only if its domain is asked for by the client.

For example, if your configuration file is in /home/www/sites/svasey.org/conf/redirect.conf, lighttpd-addconf will output something like:

$HTTP["host"] =~ "^(svasey\.org)[.]*$" {
    $contentoffile
}

where $contentoffile is the content of your configuration file.
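The wrapping step itself can be sketched like this. The function name wrap_for_host is invented, and the sketch takes the domain as an argument rather than deriving it from the file's location in /home/www/sites, which the real program has to do. Only dots are escaped in the pattern, which is enough for ordinary host names.

```python
def wrap_for_host(domain, content):
    """Wrap configuration text in a lighttpd host conditional."""
    # Escape the dots so that they only match a literal dot.
    escaped = domain.replace(".", "\\.")
    pattern = "^(%s)[.]*$" % escaped
    # Indent every line of the included file by four spaces.
    body = "".join("    " + line + "\n" for line in content.splitlines())
    return '$HTTP["host"] =~ "%s" {\n%s}\n' % (pattern, body)
```

Calling wrap_for_host("svasey.org", text) produces a block of the same shape as the example above, with text indented inside the braces.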

website-install and website-remove

I use these to install, remove and update website content on my server. They assume everything is configured exactly as described in this documentation.

website-install takes two arguments: the path to the website's content (containing conf/ and pages/ directories), and the domain name on which the website should be installed. Note that this will not install any subdomain.

website-install takes two command line options:

  • --to-copy: used to specify additional items to be copied into the final website. This is especially useful when you cannot integrate some part of your website into the simplewp hierarchy. Trac is an example: I initialize the environment after installing everything else, directly into the existing website. This means I have to use --to-copy whenever I modify the website; otherwise the trac environment will simply be ignored and removed. --to-copy can be given multiple times in order to copy multiple directories.
  • --wpcompile-opts: used to give additional, space-separated options to wpcompile.

website-remove takes one argument: the domain name of the website to be removed. Note that this will not remove any subdomain.

Adding a new website

Assume you want to add a new website at example.com.

First of all, add example.com to your DNS entries.

You then have to edit the hierarchy in the http-config repository so that a directory entry appears at $domainpath. This should only contain a conf directory with an empty file named FAKE in it.

Optionally, add an entry in etc/lighttpd/trusted-conf.conf for the configuration files of that website.

Install the new http-config and do something like:

website-install /path/to/content example.com

That's it, your website is installed. To remove it, use:

website-remove example.com

Large file handling

To avoid long upload times, I use a special site, download.svasey.org, to host "large" files (e.g. software tarballs). The files are uploaded using rsync, and hence do not need to be re-uploaded if another file changes.

Concretely, this is implemented by using a www-download user on the server, with a home at /home/www-download. At /home/www/sites/svasey.org/sub/download, pages is just a symbolic link to /home/www-download/pages, which contains all the files.

The SSH configuration is similar to the configuration for publishpkg.

To upload or download files, I use a script called svasey-sync. It takes one argument: the source tree to synchronize, and several options:

  • --download: by default, svasey-sync will upload the files in the local tree to download.svasey.org. This option must be given if you want to do the inverse operation.
  • --host: Specify a different host than download.svasey.org
  • --root: Give the remote path to synchronize with. The default is /pages
  • --user: Give the remote user on the remote host. The default is www-download
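The way these options combine can be illustrated by building the rsync command line. This is a sketch under assumptions, not the svasey-sync source: the function name is invented, and the rsync flags shown (--archive, --compress) are a plausible choice rather than the ones the script is known to use. The defaults are the ones listed above.

```python
def build_rsync_command(tree, download=False, host="download.svasey.org",
                        root="/pages", user="www-download"):
    """Return the rsync argument list for one synchronization run."""
    # A trailing slash makes rsync copy the content of the directory
    # rather than the directory itself.
    local = tree.rstrip("/") + "/"
    remote = "%s@%s:%s/" % (user, host, root.rstrip("/"))
    # Upload by default; --download swaps source and destination.
    source, dest = (remote, local) if download else (local, remote)
    return ["rsync", "--archive", "--compress", source, dest]
```

The resulting list can be passed to subprocess.run. Because rsync only transfers what changed, re-running the command after modifying one file does not re-upload the others.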

The package www-download implements the web-server part, whereas www-download-config implements the client and server-side SSH configuration.

Secure Sockets Layer (SSL)

SSL can be used anywhere on svasey.org, but it is only enforced on subdomains of secure.svasey.org. See my SSL documentation for more.

Trac

Trac is installed on my server, and is used on some (private) sites. See my Trac documentation for more.

Statistics

I use awstats to analyze my log files. Those files are deleted after one month, but the awstats data will stay longer (up to two years). These retention times are only a policy I am trying to respect, not a guarantee: you shouldn't trust me. Try not to leave any personally identifying information when you visit my site.

Internals

I run a script daily to analyze my log files from last month, update the awstats database, and generate static html files summing up all the information.

Here are the main directories:

  • /var/lib/awstats/ contains the awstats database
  • /home/awstats/ contains all the static html files. It is owned by http:http.
    • pages/ contains the actual html files. The pages.1/ and pages.2/ directories are used internally to provide atomicity.
  • /etc/awstats/ contains the awstats configuration files:
    • awstats-www-common.conf contains settings global to all sites
    • awstats.*.svasey.conf is the generic format for a site-specific configuration file. It should begin with Include "awstats-www-common.conf"
  • /home/www/sites/svasey.org/sub/secure/sub/stats/pages/ is the web interface to browse my statistics.
    • stats/ is a symbolic link to /home/awstats/pages.
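One common way to get this kind of atomicity with two directories is to generate the new files in the inactive one and then switch a symbolic link. The sketch below assumes that pages is such a link pointing at pages.1 or pages.2; this document does not say so explicitly, so treat it as an illustration of the technique, with invented names, rather than a description of my script. It relies on POSIX rename semantics.

```python
import os
import shutil


def publish(base, generate):
    """Build into the inactive directory, then repoint base/pages at it."""
    link = os.path.join(base, "pages")
    active = os.readlink(link) if os.path.islink(link) else None
    target = "pages.2" if active == "pages.1" else "pages.1"
    target_dir = os.path.join(base, target)
    # Start from an empty directory so stale files never get published.
    shutil.rmtree(target_dir, ignore_errors=True)
    os.makedirs(target_dir)
    generate(target_dir)  # caller-supplied function that writes the HTML
    # Create the new link under a temporary name, then rename it over
    # the old one: readers see either the old tree or the new one.
    tmp_link = link + ".new"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(target, tmp_link)
    os.replace(tmp_link, link)
    return target
```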

Web interface

A password protected interface can be accessed at stats.secure.svasey.org. It is implemented in the www-stats package.

Updating the stats

The script awstats-update, from the awstats-config package, is run to update the awstats database. The command takes two options:

  • --reset can be used to reread all the logs from scratch and remove all the existing data.
  • --awstats-args=AWSTATSARGS can be used to pass additional space-separated arguments to awstats.

awstats-update is run daily, and also 30 minutes after log rotation.

Testing

Automated tests are run at regular intervals, e.g. to check that my sites have no dead links. I use svmonitor to run them. They are detailed below.

MIME type test

This is run by the checkmime tool from http-helpers. It checks that a MIME type is defined for all my files. This is done by looking up a list of known extensions in /etc/lighttpd/mimecheck.conf, and making sure that no file with any other extension is present.

The /etc/lighttpd/mimecheck.conf file is a simple INI-style configuration file. Two variables can be given:

  • known_extensions = .html .pdf .tar.gz README ...: give a list of space-separated extensions to consider as known. If it does not begin with a dot, the extension is taken to mean "the whole filename".
  • exclude = /first/path /second/path2 ...: give a list of domain paths to exclude from the search. Each should be given as an absolute path and be a domain directory, i.e. a directory containing the sub and pages subdirectories.

The mimecheck command itself takes no argument and returns 0 if the test was successful, 1 if it failed, outputting the problematic files to stderr.
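The matching rule for known_extensions can be sketched as follows. This is not the real tool: the function names are invented, the known_extensions value is passed in as a string instead of being read from the configuration file, and the exclude variable is left out.

```python
import os
import sys


def is_known(filename, known_extensions):
    """Apply the rule above: '.ext' is a suffix, anything else a full name."""
    for ext in known_extensions:
        if ext.startswith("."):
            if filename.endswith(ext):
                return True
        elif filename == ext:
            return True
    return False


def find_unknown(pages_dir, known_extensions):
    """Return the files under pages_dir that match no known extension."""
    unknown = []
    for dirpath, _dirnames, filenames in os.walk(pages_dir):
        for name in sorted(filenames):
            if not is_known(name, known_extensions):
                unknown.append(os.path.join(dirpath, name))
    return unknown


def check(pages_dir, known_value):
    """Print offending files to stderr; return 0 on success, 1 otherwise."""
    unknown = find_unknown(pages_dir, known_value.split())
    for path in unknown:
        print(path, file=sys.stderr)
    return 1 if unknown else 0
```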

List of all my websites

Because of the SSL certificate I use, my HTTP-served domain names should either have

  • only three or two levels (svasey.org and test.svasey.org are okay, but not test.test2.svasey.org)
  • four levels and be a subdomain of secure.svasey.org, i.e test.secure.svasey.org is okay, but not test.test2.secure.svasey.org
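These two rules are easy to check mechanically. The helper below is a sketch written for this page, with an invented name; it only counts levels and looks at the secure.svasey.org suffix, exactly as the rules above are stated.

```python
def fits_certificate(host):
    """Return True if the host name respects the level rules above."""
    labels = host.lower().rstrip(".").split(".")
    # Two or three levels: svasey.org, test.svasey.org
    if len(labels) in (2, 3):
        return True
    # Four levels: only direct subdomains of secure.svasey.org
    return len(labels) == 4 and labels[1:] == ["secure", "svasey", "org"]
```

For instance, fits_certificate("test.secure.svasey.org") is true, while fits_certificate("test.test2.svasey.org") is false.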

Plain-HTTP sites

SSL (secured) sites

Packaging

Everything, including website content, is put in the following Arch packages:

  • http-config contains lighttpd's configuration files and the directories of the /home/www hierarchy, including all valid http domains. It does not include domain-specific configuration files (although it includes /home/www/conf/redirect.conf) or the pages symbolic links. It does include the complete content of the test.svasey.org and test.secure.svasey.org websites, which are just used for testing.
  • http-helpers contains website-install and website-remove
  • lighttpd-utils contains lighttpd-addconf
  • www-svasey contains the content and configuration files for svasey.org
  • http-certs contains the content and configuration files for certs.svasey.org
  • www-documentation contains the content and configuration files for documentation.svasey.org and sv-documentation.secure.svasey.org
  • trac-config contains global trac configuration information
  • www-trac contains the content and configuration files for trac.secure.svasey.org