Apache Basics and simple CGI scripts
Motivation
The web is ubiquitous today. Many devices come with built-in web servers, and many free/open-source web servers are available. For a long time Apache was the most widely used web server, and it is still prominent today, although it has recently been surpassed by nginx[1] (pronounced "engine-x"). Still, Apache's versatility and time-tested security make it a good choice for many applications.
Here you will learn the basics you need to get started with running an Apache web server; many of the concepts apply to other web servers as well.
Apache Configuration
Global Configuration
Apache has one main configuration file. Which file that is depends on how Apache was started, and the default depends on your Linux distribution. E.g. on Debian it is /etc/apache2/apache2.conf and on Red Hat it is /etc/httpd/conf/httpd.conf.
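If you are unsure which of the two layouts your system uses, a small shell check can tell you. This is only a sketch: the function name find_apache_conf is made up for this example, it checks just the two default paths named above, and it takes an optional prefix so it can also be tried on a copy of a configuration tree.

```shell
#!/bin/sh
# Hypothetical helper: print the first of the two distribution default
# main configuration files (Debian first, then Red Hat) that exists.
# $1 is an optional prefix prepended to both paths (default: none).
find_apache_conf() {
    prefix="${1:-}"
    for candidate in /etc/apache2/apache2.conf /etc/httpd/conf/httpd.conf; do
        if [ -f "$prefix$candidate" ]; then
            echo "$prefix$candidate"
            return 0
        fi
    done
    echo "find_apache_conf: no default Apache configuration found" >&2
    return 1
}
```

Running find_apache_conf without arguments checks the real paths. If Apache is installed, apache2ctl -V (Debian) or httpd -V (Red Hat) is the authoritative source: among the compile settings it prints HTTPD_ROOT and SERVER_CONFIG_FILE, which together give the default configuration file.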
Within the main configuration there are usually Include statements that include other files. E.g.
IncludeOptional sites-enabled/*.conf
This includes every .conf file in the subdirectory sites-enabled (relative paths are resolved against the server root, e.g. /etc/apache2 on Debian). Unlike Include, IncludeOptional does not fail if the wildcard matches no file. Usually the configuration is split into many files, e.g. one for each enabled module and one for each hosted virtual web server.
Whether they appear in the main configuration file or in an included file, some directives are global: they change parameters of the server itself. E.g.
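The effect of such an IncludeOptional line can be imitated in the shell. The sketch below (list_included is a hypothetical name) prints the files a pattern would pull in; like IncludeOptional, it treats a pattern that matches nothing as empty rather than as an error. Apache processes wildcard matches in alphabetical order, and the shell glob also returns them sorted.

```shell
#!/bin/sh
# Hypothetical helper: list the files that a line such as
#   IncludeOptional sites-enabled/*.conf
# would include, relative to a server root such as /etc/apache2.
# An unmatched glob stays unexpanded in sh, so each candidate is checked
# with -e; no match simply prints nothing (like IncludeOptional).
list_included() {
    server_root="$1"
    pattern="$2"
    for f in "$server_root"/$pattern; do
        if [ -e "$f" ]; then
            echo "$f"
        fi
    done
}
```

E.g. list_included /etc/apache2 'sites-enabled/*.conf'; the quotes keep the shell from expanding the pattern before the function sees it.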
Listen 80
Listen 443
Listen 127.0.0.1:9980
The above tells Apache to open listening sockets on ports 80 and 443 on all addresses, plus one additional socket on port 9980 that is reachable only from the local machine (127.0.0.1).
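To see which Listen directives a configuration contains, you can filter the files with awk. This is a rough sketch (listen_directives is a made-up name): it only looks at the first two words of each line and does not follow Include statements.

```shell
#!/bin/sh
# Rough sketch: print the argument of every Listen directive in the given
# files. Directive names are case-insensitive; commented-out lines start
# with "#" and therefore never match. Include statements are not followed.
listen_directives() {
    awk 'tolower($1) == "listen" { print $2 }' "$@"
}
```

For example, listen_directives /etc/apache2/ports.conf on Debian, where the Listen lines live by default. To check which sockets are really open, ss -tlnp (run as root to see process names) lists all listening TCP sockets.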
Modules
Much of Apache's functionality lives in modules, and a function is only available if its module is loaded. On Debian-based systems you can enable a module with a2enmod. It places symbolic links from the directory that holds the module files (mods-available) into the directory that is actually included in the configuration (mods-enabled). Apache must then be restarted (e.g. systemctl restart apache2) for the change to take effect.
# a2enmod proxy
# cd /etc/apache2/
# ls -l mods-enabled/proxy.conf
lrwxrwxrwx 1 root root 28 Apr 1 12:37 mods-enabled/proxy.conf -> ../mods-available/proxy.conf
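The mechanism behind a2enmod is just symbolic links. The following simplified sketch (enable_module is a hypothetical name) reproduces the links for one module in a given configuration directory. The real a2enmod does more, for example it also enables modules that the requested one depends on, so on a real system use a2enmod itself.

```shell
#!/bin/sh
# Simplified sketch of what a2enmod does: link the module's .load and
# .conf files from mods-available into mods-enabled with relative links.
# Unlike the real tool, it ignores module dependencies and conflicts.
enable_module() {
    conf_dir="$1"   # e.g. /etc/apache2
    mod="$2"        # e.g. proxy
    found=0
    for ext in load conf; do
        if [ -e "$conf_dir/mods-available/$mod.$ext" ]; then
            found=1
            link="$conf_dir/mods-enabled/$mod.$ext"
            # Skip links that already exist, so running it twice is harmless.
            if [ ! -L "$link" ]; then
                ln -s "../mods-available/$mod.$ext" "$link"
            fi
        fi
    done
    # Fail if the module has neither a .load nor a .conf file.
    [ "$found" -eq 1 ]
}
```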
Virtual Web Servers
A web server without encryption answers on port 80 by default; with HTTPS encryption it answers on port 443. If the machine has more than one IP address, you can choose which address the socket binds to.
When a web client connects, it requests a URI (the part after the host name). In HTTP/1.1 the client also sends the host name it wants, in the Host header of the request. Thus the server can present different web pages depending on the host name.
So the server can tell what the client wants either by the IP address the request arrived on, by the host name the client sent, or by both. Accordingly we speak of IP-based and name-based virtual hosts.
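Name-based virtual hosting works because the host name travels inside the request. The sketch below (http_request is a made-up name) builds a minimal HTTP/1.1 request by hand so you can see the Host header; header lines end in CRLF and an empty line ends the header block.

```shell
#!/bin/sh
# Hypothetical helper: print a minimal HTTP/1.1 GET request for host $1
# and path $2 (default "/"). Each header line ends in CRLF, and an empty
# line ends the header block. The Host header selects the virtual host.
http_request() {
    host="$1"
    path="${2:-/}"
    printf 'GET %s HTTP/1.1\r\nHost: %s\r\nConnection: close\r\n\r\n' "$path" "$host"
}
```

To send it, pipe it into a tool such as nc: http_request www.test.example.org / | nc 10.11.12.13 80. With curl the equivalent is curl -H 'Host: www.test.example.org' http://10.11.12.13/, which lets you test a virtual host before DNS points to the server.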
With HTTPS-protected services there is a chicken-and-egg problem: when the SSL/TLS connection is established, the server must present the certificate for the requested site. If it hosts several virtual servers on the same IP address, it does not know which certificate to present, since the host name is only sent inside the already established session. To solve this, and to allow more than one HTTPS virtual server on the same IP address, SNI (Server Name Indication) was introduced. With SNI the client sends the server name already during the TLS handshake. SNI is supported by all modern browsers.
<VirtualHost 10.11.12.13:80>
    ServerName www.test.example.org
    ServerAlias test.example.org
    ServerAdmin admin@example.org
    DocumentRoot /var/www/testsever/
    ErrorLog /var/log/apache/test.error_log
    TransferLog /var/log/apache/test.access_log
    RedirectPermanent /wuwien http://www.wu.ac.at
    Alias /projectdata/ /home/anna/projectx/data/
</VirtualHost>
The above example defines a virtual host (you might want to place it in its own config file, but it works in the main file as well). The virtual host is bound to the private IP address 10.11.12.13 and accepts requests on port 80. Apache uses this block when the host name sent by the client matches ServerName or ServerAlias (if no virtual host for that address and port matches, Apache falls back to the first one defined for it). Documents are served from the DocumentRoot, and ErrorLog and TransferLog specify the locations of the log files.
If someone browses to http://www.test.example.org/projectdata/, they will actually see what is in the /home/anna/projectx/data/ directory, but only if the user the web server runs as has read permission on that directory and the configuration grants access to it. (On Debian, directories outside /var/www and /usr/share are denied by default and need their own Directory block with Require all granted.)
You can specify many other directives within that block. One example here is RedirectPermanent: if a user goes to http://www.test.example.org/wuwien, they are redirected to another server (http://www.wu.ac.at).
Directory and Location Configuration
Often we want special settings that only apply to one directory (where the files are on the server) or one location (the part specified in the URL).
For this you can place settings inside Directory or Location blocks so that they apply only there. Such blocks can also be nested within VirtualHost blocks. E.g.
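<Location /server-status>
    SetHandler server-status
    Require local
</Location>
This tells Apache to serve the server status page (if the status module, mod_status, is enabled) under the URI /server-status. The Require local line restricts the page to requests from the local machine, since it reveals details about the running server.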
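mod_status also offers a machine-readable variant at /server-status?auto, which prints one "Key: value" pair per line (for example Total Accesses, BusyWorkers, IdleWorkers). Assuming that format, a single value can be picked out with awk; status_value is a hypothetical helper, not part of Apache.

```shell
#!/bin/sh
# Sketch: read "Key: value" lines (as printed by /server-status?auto)
# from standard input and print the value for key $1.
status_value() {
    awk -F': ' -v k="$1" '$1 == k { print $2; exit }'
}
```

For example: curl -s 'http://localhost/server-status?auto' | status_value BusyWorkers. The quotes keep the shell from treating ? as a wildcard.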
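In most cases it is better to use Directory, because the same file can often be reached through more than one URL, and restrictions placed on a Location can then be bypassed. E.g.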
<Directory "/opt/some/data/">
    Options -Indexes
    AllowOverride AuthConfig
</Directory>
The above example turns off the indexing of directories. (That is: if you browse to a directory instead of a file and there is no index file such as index.html, Apache can create a listing of the directory's content. This is turned off here.)
It also allows directives of the AuthConfig class, which control authentication (for example whether a password is needed to access the page), to be set in a different place: in so-called .htaccess files.
.htaccess files
When you place a file named .htaccess in a directory, you can change some configuration settings just for that directory and its subdirectories. This only works if the class of settings you want to change is allowed there by AllowOverride, as in the example above. Most often this is used to password-protect access:
AuthName Streng-Vertraulich
AuthType Basic
AuthUserFile /opt/myapp/webusers
Require valid-user
The above Apache directives tell the server to ask for a password before granting access. The password dialog shows the realm "Streng-Vertraulich" (German for "strictly confidential") to the user. User names and passwords are checked against the given file, and any user listed in that file has access.
$ touch webusers
$ htpasswd -B webusers anna
New password:
Re-type new password:
Updating password for user anna
$ cat webusers
anna:$2y$05$amEPdHfhgbggHblFGUx2ZeuVGNFKbSZoc1kamltBZJrj.YoX1YEwW
The above creates the file (if it does not exist yet) with the touch command. The htpasswd tool is then used to add a user named anna to that file; the password is read interactively. The -B option tells the tool to use the secure bcrypt algorithm for password hashing. The file contains one line per user, in the format user:hashed-password.
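Two things are worth seeing in code here. The password file is plain text with one user:hash line per user, so standard tools can inspect it. And with AuthType Basic the browser sends the user name and password only base64-encoded, not encrypted, in an Authorization header, which is why Basic authentication should be used over HTTPS. The function names below are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for the htpasswd file format "user:hashed-password".

# Print all user names in password file $1.
list_web_users() {
    cut -d: -f1 "$1"
}

# Succeed if user $2 has an entry in password file $1 (exact match).
has_web_user() {
    awk -F: -v u="$2" '$1 == u { found = 1 } END { exit !found }' "$1"
}

# Print the header a client sends for AuthType Basic: base64 of "user:password".
basic_auth_header() {
    printf 'Authorization: Basic %s\n' "$(printf '%s:%s' "$1" "$2" | base64 | tr -d '\n')"
}
```

For example, basic_auth_header anna secret prints Authorization: Basic YW5uYTpzZWNyZXQ=, which anyone can decode again with base64 -d. To manage the file itself, htpasswd -D webusers anna deletes a user and htpasswd -v webusers anna checks a password.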
Useful Apache Features
Here are a few of the many features of Apache that might be useful.
Reverse Proxy
Apache can act as a reverse proxy: incoming requests are passed on to a completely different server, and the responses are presented as if the content were located on your server. E.g.
ProxyPass "/bilder/" "http://www.example.com/img/"
ProxyPassReverse "/bilder/" "http://www.example.com/img/"
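The above lines present the files under /img/ on the www.example.com server as if they were located under /bilder/ on the local server. ProxyPassReverse rewrites the Location header (and similar headers) of redirect responses coming from the backend, so that redirects point to /bilder/ on the local server instead of to www.example.com. This requires the proxy and proxy_http modules (a2enmod proxy proxy_http).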
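The path mapping behind these two lines can be sketched as plain string manipulation. This only illustrates the idea (proxy_map and proxy_unmap are hypothetical names) and is not how mod_proxy is implemented: ProxyPass replaces the local prefix with the backend URL, and ProxyPassReverse maps backend URLs in redirect headers back to the local prefix.

```shell
#!/bin/sh
# Forward direction (ProxyPass): local path -> backend URL.
# Fails if the path does not start with the local prefix.
proxy_map() {
    path="$1" prefix="$2" backend="$3"
    case "$path" in
        "$prefix"*) echo "$backend${path#"$prefix"}" ;;
        *) return 1 ;;
    esac
}

# Reverse direction (ProxyPassReverse): backend URL in a redirect -> local
# path. URLs pointing elsewhere are left unchanged.
proxy_unmap() {
    url="$1" prefix="$2" backend="$3"
    case "$url" in
        "$backend"*) echo "$prefix${url#"$backend"}" ;;
        *) echo "$url" ;;
    esac
}
```

With the values above, proxy_map /bilder/cat.jpg /bilder/ http://www.example.com/img/ prints http://www.example.com/img/cat.jpg.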
Connecting to Scripts
To create dynamic content, Apache can embed scripting languages that are executed directly in the context of the server (see below), call CGI scripts (where one script is executed for each request), or connect to separate application servers, e.g. FastCGI or WSGI servers. In many cases the language or framework includes its own web server, so a ProxyPass as shown above is all that is needed to connect Apache to the application.
Including Script Languages
PHP, Python, Perl and other languages can be included in Apache so that pages can directly execute code in that language.
CGI Scripts
One of the oldest and easiest ways to create dynamic content on the server is CGI (Common Gateway Interface). CGI scripts are scripts (or compiled programs) that the server starts anew for every request; details of the request are passed to the script via environment variables, and a request body (e.g. from a POST form) arrives on standard input.
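A quick way to see these environment variables is a script that prints some of them. The sketch below uses standard CGI variable names (REQUEST_METHOD, QUERY_STRING, REMOTE_ADDR, SERVER_NAME); the helper query_param is hypothetical and does not URL-decode values. Because everything arrives via the environment, you can also run such a script from a shell by setting the variables yourself.

```shell
#!/bin/sh
# Sketch of a CGI script that shows some of the request details Apache
# passes in (standard CGI meta-variables). Values are printed as-is.
cgi_env_report() {
    echo "Content-Type: text/plain"
    echo
    printf 'method:       %s\n' "${REQUEST_METHOD:-unknown}"
    printf 'query string: %s\n' "${QUERY_STRING:-}"
    printf 'client:       %s\n' "${REMOTE_ADDR:-unknown}"
    printf 'server:       %s\n' "${SERVER_NAME:-unknown}"
}

# Hypothetical helper: print the raw value of parameter $1 from
# QUERY_STRING, e.g. "de" for lang in "name=anna&lang=de".
# No URL decoding is done; parameter names are assumed to be plain words.
query_param() {
    printf '%s\n' "${QUERY_STRING:-}" | tr '&' '\n' | sed -n "s/^$1=//p" | head -n 1
}

cgi_env_report
```

If it is saved as env.cgi (the name is arbitrary), QUERY_STRING='name=anna' REQUEST_METHOD=GET sh ./env.cgi shows locally what the browser would receive.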
In the Apache configuration you need an entry like this:
ScriptAlias /mycgi/ /opt/mycgi/cgi-bin/
<Directory /opt/mycgi/cgi-bin/>
    AllowOverride AuthConfig
    Options +ExecCGI -Indexes
    Require all granted
</Directory>
With this you can use /opt/mycgi/cgi-bin/ as a directory for CGI scripts that are executed by the server. (The web server user, or everyone, needs execute permission on the scripts.) On a typical Debian installation, /usr/lib/cgi-bin is already configured for CGI scripts. You may need to enable the cgi module: a2enmod cgi
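Copying a script into the CGI directory with suitable permissions can be sketched as follows. install_cgi is a hypothetical name; it uses the standard install tool to set mode 755 (rwxr-xr-x, so everyone, including the web server user, may execute the file) and refuses scripts without a #! line, since Apache starts the file directly.

```shell
#!/bin/sh
# Hypothetical helper: install script $1 into CGI directory $2 with
# mode 755 (rwxr-xr-x), so the web server user can execute it.
install_cgi() {
    src="$1"
    dir="$2"   # e.g. /opt/mycgi/cgi-bin or /usr/lib/cgi-bin
    # Apache executes the file directly, so it needs an interpreter line.
    if ! head -n 1 "$src" | grep -q '^#!'; then
        echo "install_cgi: $src has no #! line" >&2
        return 1
    fi
    install -m 755 "$src" "$dir/$(basename "$src")"
}
```

Because sudo cannot run shell functions, on a real system you would typically run the underlying command as root, e.g. sudo install -m 755 test.cgi /opt/mycgi/cgi-bin/test.cgi.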
Your script could look like this:
#!/bin/bash
echo "Content-Type: text/plain"
echo
echo "hello, today is: $(date)"
echo "we are running under user $(id)"
If this is saved as test.cgi in /opt/mycgi/cgi-bin/ and made executable, you can browse to http://example.com/mycgi/test.cgi
The first two echo lines are dictated by the CGI standard: the script must first tell the server the type of the document (the Content-Type header) and then output an empty line, which ends the headers. In our case the document is plain text.
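The same header-then-empty-line rule applies when a script wants to send HTML or a different HTTP status. CGI defines a Status header for the latter; without it the server normally answers 200 OK. This is a sketch with a hypothetical respond helper, and it does not escape the body.

```shell
#!/bin/sh
# Sketch: a CGI response consists of header lines, one empty line, and
# then the body. "Status" is the CGI header for the HTTP status code.
respond() {
    status="$1"   # e.g. "404 Not Found"
    type="$2"     # e.g. "text/html"
    body="$3"     # sent as-is, not escaped
    printf 'Status: %s\n' "$status"
    printf 'Content-Type: %s\n' "$type"
    printf '\n'
    printf '%s\n' "$body"
}

# Example use: complain if the request carried no query string.
if [ -z "${QUERY_STRING:-}" ]; then
    respond "400 Bad Request" text/plain "please add a query string"
else
    respond "200 OK" text/html "<p>query string received</p>"
fi
```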
Exercises
- Install Apache using the package manager and look at the existing configuration.
- Find out where the CGI directory is, or enable the cgi module.
- Write a short CGI script and check that it works in your web browser.