CGI@CLAM
week #11

How did you do last week's assignment?

I sat around and goofed off, since we didn't have an assignment last week! I spent lots of time in the garden too.

Does every page with dynamic content have to be a CGI program? Sometimes the dynamic part of the page is just the last little piece.

Remember the assignment from week #9? (If you haven't done it yet, you can check out the solution here.) It can be pretty useful to be able to see who visited your home page last, but it seems a bit of overkill to have your entire home page, which is probably 99% static content, be a CGI program. If you have a huge lame home page like mine, you will waste quite a large chunk of CPU time printing out each individual line of HTML with perl. Plus, your page will be one giant print statement, which is neither elegant nor easy to edit.

What you really want to do is paste into your page a bit of output from a separately existing CGI program, or even a shell command, that generates the small bit of dynamic content. Occasionally you might also like to include some boilerplate static text from a separate file, such as a nicely formatted list of contacts or a single copyright string for your pages. Or perhaps you want to include the value of some common system variable, such as the time. One way to do all of these things is to use server side includes, or SSI.

What is a server side include?

Ordinarily, when you ask the HTTP daemon for a static file (a normal HTML document, image, or the like) the daemon simply retrieves the file, sends the appropriate HTTP headers to your browser, and then dumps the content of the file to your browser. The daemon doesn't bother to look inside the file or parse its contents.

You can, however, set up the server (using the httpd.conf file, in the case of Apache) so that the daemon will parse certain files, follow specific directives contained inside, and include the output of those directives in place of the directives themselves in the final output page. These directives are known as server side includes - because the server goes off and does something, and then includes the results. The end result is much like having only part of the page be a CGI program.

Note that server side include directives printed as part of a CGI program's output won't work. The daemon sends CGI output on to the browser without parsing it a second time!

How does the daemon know which files it has to parse and which ones it can simply return blindly?

In order for the server daemon to know which files to parse, you need to give them a special suffix. This suffix may be anything you like, but you must set it up in the server configuration files. On the clam server, the suffix has been set to .shtml (this is actually a popular choice - another popular one is .phtml). To set this up in Apache, you'll need to edit the httpd.conf file.

The relevant lines from the httpd.conf file on clam are:
	# To use server-parsed HTML files
	AddType text/html .shtml
	AddHandler server-parsed .shtml
You might also want to allow a .shtml file to serve as the default index file (the file that comes up when a URL ends in a directory). To do so, simply add index.shtml (or any other favorite file name) to the list of allowable DirectoryIndex files, as follows:
	DirectoryIndex index.html index.shtml index.cgi

Of course, if you like, you could set the special suffix to .html. This would result in all files ending in .html being parsed. While there is nothing technically wrong with this, it will lead to a lot of unnecessary server load, as the vast majority of files ending in .html probably do not actually contain any server side includes.

Do the files containing server side includes need to be in any special location?

The files containing server side includes, like any other content files that will be served over the web, must reside underneath the document root. In addition, the directory that they reside in must be explicitly set up to allow server side includes, in much the same manner as the directories containing CGI scripts must be set up to allow CGI script execution. This makes sense when you consider that both server side includes and CGI scripts result in code being run on the server in most cases.

The relevant lines from CLAM's configuration are:

	UserDir public_html

	#
	# Control access to UserDir directories.  The following is an example
	# for a site where these directories are restricted to read-only.
	#
	<Directory "/home/*/public_html">
		AllowOverride All
		Options MultiViews ExecCGI Indexes FollowSymLinks Includes
		CheckSpelling on
		Order allow,deny
		Allow from all
	</Directory>
Reading this, you'll notice that your personal home page area on CLAM should be in a subdirectory public_html off of your main home directory. Within it, the Options line says that the server will generate a directory listing when no default index page is present (Indexes; the allowable index page names are specified elsewhere with the DirectoryIndex line), run CGI scripts (ExecCGI), follow symbolic links to other files (FollowSymLinks), and process server side includes (Includes).

Why only "in most cases"? Because you can restrict which directives are allowed in your server side include statements, so that you allow only directives that include static text and disable any directives that result in code execution. If we were to do this, the relevant lines from CLAM above would become:

        UserDir public_html

        #
        # Control access to UserDir directories.  The following is an example
        # for a site where these directories are restricted to read-only.
        #
        <Directory "/home/*/public_html">
            AllowOverride All
            Options MultiViews ExecCGI Indexes FollowSymLinks IncludesNOEXEC
            CheckSpelling on
            Order allow,deny
            Allow from all
        </Directory>

One of the most popular uses of server side includes is to have them execute a CGI program and insert its output. If you do this, the actual CGI program must reside in a directory that is configured to allow CGI execution. In our case, we can put the file with the server side include and the CGI script the include directives call right in our public_html directories and it will all work fine.
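
As a rough sketch of that setup, a home page saved as index.shtml in public_html might look like the following. The script name lastvisit.cgi is made up for illustration; it stands in for whatever CGI program produces your dynamic bit, sitting in the same directory as the page. The exec element with a cgi attribute is Apache's directive for running a CGI program and pasting in its output, and it is only honored when the directory has the Includes option rather than IncludesNOEXEC.

	<html>
	<head><title>My home page</title></head>
	<body>
	<h1>Welcome!</h1>
	<p>Lots and lots of static content goes here.</p>
	<!-- lastvisit.cgi is a hypothetical script name -->
	<p>Last visitor: <!--#exec cgi="lastvisit.cgi" --></p>
	</body>
	</html>

Everything except the one exec line is plain HTML that the daemon copies through as is.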

Are there security risks with server side includes?

Presuming that you are allowing the server side includes to execute code (using the Includes option above, rather than the IncludesNOEXEC option), then server side includes present the same risks as CGI files do. Server side include directives, like CGI programs (and remember, one of the most popular server side include directives is one which executes a perl script and prints its output) execute commands on your server as the HTTP daemon user. You should be careful about who you let use server side includes on your server!
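
To make the risk concrete, here is a sketch of a single line that anyone able to save a .shtml file in an Includes-enabled directory could write (the format of the line is covered in the next section). The cmd attribute of Apache's exec element hands its value to /bin/sh; the particular command shown is just an illustrative choice.

	<!-- Runs a shell command as the HTTP daemon user and inserts its output -->
	<pre><!--#exec cmd="ls -l /tmp" --></pre>

With IncludesNOEXEC in effect, the daemon refuses to run this directive.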

Okay, so what exactly do these server side includes look like?

Server side includes look like SGML comments (or HTML comments, which are a special case of SGML comments). This is so that if you move a page intended for server parsing (foo.shtml, for instance) to a server which doesn't support server side includes, the server side include (which is now not being parsed and replaced) will not mess up the appearance of the page. Between the comment begin and end markers (<!-- and -->) come a # followed by the type of server side include, or element, and one or more attribute=value pairs. The format of a server side include is thus

	<!--#element attribute=value attribute=value -->
The element is the command that the server side include will carry out. The allowable attributes depend on the element.
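
For instance, here is a minimal sketch using Apache's echo element, whose var attribute names the variable to print. Apache's documentation recommends quoting the value and leaving a space before the closing -->, so that the terminator is not read as part of the value; the opening <!--# must be written with no spaces inside it.

	<p>This page was served at <!--#echo var="DATE_LOCAL" -->.</p>

On a server that parses the file, the reader sees the current local time in place of the directive. On one that does not, the directive is just an invisible comment.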

What server side include commands are available?

For a complete list, you should go to the references listed below. However, here are some of the most common:
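
As a rough sketch, the following directives from Apache's mod_include cover the uses described earlier (boilerplate text, system variables, and program output). The file and script names are made up for illustration.

	<!-- Insert the text of another document; footer.html is a hypothetical name -->
	<!--#include virtual="footer.html" -->

	<!-- Set the date format used by the directives that follow -->
	<!--#config timefmt="%A, %B %d, %Y" -->

	<!-- Print a variable the server sets for every parsed page -->
	<!--#echo var="DATE_LOCAL" -->
	<!--#echo var="LAST_MODIFIED" -->

	<!-- Print the modification date or the size of a file -->
	<!--#flastmod file="index.shtml" -->
	<!--#fsize file="index.shtml" -->

	<!-- Run a CGI program and insert its output (needs Includes, not IncludesNOEXEC) -->
	<!--#exec cgi="lastvisit.cgi" -->

Paths that do not begin with a slash are taken relative to the page that contains the directive.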

Where can I read more?

What should I do for next week?

Please try to do the following tasks:


Comments? Questions? General harassment? Mail it to mcovingt@staff.uiuc.edu