I sat around and goofed off, since we didn't have an
assignment last week! I spent lots of time in the garden too.
Remember the assignment from week #9?
(If you haven't done it yet, you can
check out the solution here.)
It can be pretty useful to be able to see who's visited your home page last,
but it seems like a bit of overkill to have your entire home page,
which is probably
99% static content, be a CGI program. If you have a huge
lame home page like mine, you will waste quite a large chunk of CPU time
printing out each individual statement of HTML with perl - plus, your page
will be a giant print statement, which is really neither elegant nor
easy to edit.
What you really want to do is simply paste a bit of output from a
separately existing CGI program or even a shell command
that generates the small bit of
dynamic content into your page. Occasionally you might also like to
simply include some boilerplate static text from a separate file
into your page, perhaps including a nicely formatted list of contacts or
a single copyright string for your pages. Perhaps you want to simply
include the value of some common system variables, such as the time,
into your page. One way to do all of these things is to use
server side includes, or SSI.
Ordinarily, when you ask the HTTP daemon for a static file (a normal
HTML
document, image, or the like) the daemon simply retrieves the file,
sends the appropriate HTTP headers to your browser, and then
dumps the content of the file to your browser. The daemon doesn't
bother to look inside the file or parse its contents.
You can, however, set up the server (using the httpd.conf file,
in the case of apache) so that the daemon will parse the files,
follow specific directives contained inside, and include the
output of those directives in place of the directives themselves
in the final output page. These directives are known as server side
includes - because the server
goes off and does something, and then includes the results. The
end result is much like having only part of the page be a CGI program.
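To make the idea of "parsing" concrete, here is a minimal sketch in Python of what the daemon does for one kind of directive. It is only an illustration, not how apache is actually implemented: the function name expand_echo and the sample date are made up, and it handles nothing but the echo directive.

```python
import re

# Matches directives of the form <!--#echo var="NAME"-->, allowing optional
# whitespace before the closing "-->".
ECHO_DIRECTIVE = re.compile(r'<!--#echo\s+var="([^"]*)"\s*-->')


def expand_echo(page, variables):
    """Return page with every echo directive replaced by its variable's value."""
    def substitute(match):
        # A variable with no value is shown as "(none)".
        return variables.get(match.group(1), "(none)")
    return ECHO_DIRECTIVE.sub(substitute, page)


# The date below is an invented sample value, not real server output.
page = 'This file was last modified on <!--#echo var="LAST_MODIFIED"-->.'
print(expand_echo(page, {"LAST_MODIFIED": "Monday, 01-Jan-1996 12:00:00 GMT"}))
```

Everything outside the directive passes through untouched, which is why the static 99% of your page costs almost nothing extra.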
Note that you can't print these server side include directives in
CGI output (or they won't work, anyway). The daemon won't parse
your file TWICE!
In order for the server daemon to know which files to parse, you need to
give them a special suffix. This suffix may be anything you
like, but you must set it up in the server configuration files.
On the clam server, the suffix has been set to .shtml (this
is actually a popular choice - another popular one is .phtml).
To set this up in apache, you'll need to edit the httpd.conf
file to:
Of course, if you like, you could set the special suffix to .html.
This would result in all files ending in .html being parsed.
While there is nothing technically wrong with this, it will lead
to a lot of unnecessary server load, as the vast majority of files ending
in .html probably do not actually contain any server side includes.
The files containing server side includes, like any other content files
that will be served over the web, must reside underneath the document
root. In addition, the directory that they reside in must
be explicitly set up to allow server side includes, in much the
same manner as the directories containing CGI scripts must be set up to
allow CGI script execution. This makes sense when you consider that both
server side includes and CGI scripts result in code being run on the server
in most cases.
The relevant lines from CLAM's configuration are:
In most cases? Yes. You can restrict what directives are allowed in
your server side include statements, so that you will only allow
directives that include static text, and disable any directives that
result in code execution. If we were to do this, the relevant lines
from CLAM above would become:
One of the most popular uses of server side includes is to have
them execute a CGI program and insert its output. If you do this,
the actual CGI program must reside in a directory that is configured
to allow CGI execution. In our case, we can put the file with the
server side include and the CGI script the include directives call right
in our public_html directories and it will all work fine.
Presuming that you are allowing the server side includes to execute
code (using the Includes option above, rather than the
IncludesNOEXEC option), then server side includes present
the same risks as CGI files do. Server side include directives, like
CGI programs (and remember, one of the most popular server side include
directives is one which executes a perl script and prints its output),
execute commands on your server as the HTTP daemon user. You should
be careful about who you let use server side includes on your server!
Server side includes look like SGML comments (or HTML comments,
which are a special case of SGML comments). This is so that if you
move a page intended for server parsing (foo.shtml, for instance)
to a server which doesn't support server side includes, the
server side include (which is now not being parsed and replaced)
will not mess up the appearance of the page. Inside the
"comment begin and end" (<!-- and -->) there is the
type of server side include, or element, and one or more
attribute=value pairs. The format of a server side include
is thus
For a complete list, you should go to the references listed below.
However, here are some of the most common:
Attributes are:
Example: This file was last modified on
<!--#echo var="LAST_MODIFIED"-->
This can of course include strings like "rm -f *" too, so you
should be careful! If Options NoExec is set, or the directory
is configured with IncludesNOEXEC (as opposed to plain Includes),
this command is disabled.
Attributes are:
Please try to do the following tasks:
How did you do last week's
assignment?
Does every page with dynamic
content have to be a CGI program? Sometimes the dynamic part of the page
is just the last little piece.
What is a server side
include?
How does the daemon know
which files it has to parse and which ones it can simply return
blindly?
The relevant lines from the httpd.conf file on clam are:
This tells the server what Content-type: header to use with
the .shtml files.
This tells the server how it should
handle these special .shtml files -
namely by parsing them.
# To use server-parsed HTML files
AddType text/html .shtml
AddHandler server-parsed .shtml
You might also want to allow a .shtml file to serve as the default
index file (the file that comes up when a URL ends in a directory).
To do so, simply add index.shtml (or any other favorite file name)
to the list of allowable DirectoryIndex files, as follows:
DirectoryIndex index.html index.shtml index.cgi
Do the files containing
server side includes need to be in any special location?
UserDir public_html
#
# Control access to UserDir directories. The following is an example
# for a site where these directories are restricted to read-only.
#
<Directory "/home/*/public_html">
AllowOverride All
Options MultiViews ExecCGI Indexes FollowSymLinks Includes
CheckSpelling on
Order allow,deny
Allow from all
</Directory>
Reading this, you'll notice that your personal home page area on CLAM
should be in a subdirectory public_html off of your main home
directory, and that you are allowed to have the server generate a
directory listing when no default index page is present
(allowable names of which are specified elsewhere with the
DirectoryIndex line), run CGI scripts, have symbolic links
to other files be considered to be under your document root, and
use server side includes.
UserDir public_html
#
# Control access to UserDir directories. The following is an example
# for a site where these directories are restricted to read-only.
#
<Directory "/home/*/public_html">
AllowOverride All
Options MultiViews ExecCGI Indexes FollowSymLinks IncludesNOEXEC
CheckSpelling on
Order allow,deny
Allow from all
</Directory>
Are there security risks
with server side includes?
Okay, so what exactly do these
server side includes look like?
<!--#element attribute=value attribute=value -->
The allowable attributes depend on the element. The element is the
command that the server side include will carry out.
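If it helps to see the pieces pulled apart, here is a small Python sketch that splits a directive into its element and its attribute=value pairs. The function name parse_directive is made up for illustration, and the sketch assumes every value is wrapped in double quotes.

```python
import re

# Overall shape: <!--#element attribute=value attribute=value -->
DIRECTIVE = re.compile(r'<!--#(\w+)\s*(.*?)\s*-->', re.DOTALL)
# attribute="value" pairs; this sketch only handles double-quoted values.
ATTRIBUTE = re.compile(r'(\w+)="([^"]*)"')


def parse_directive(text):
    """Split one directive into (element, {attribute: value}); None if absent."""
    match = DIRECTIVE.search(text)
    if match is None:
        return None
    element, rest = match.groups()
    return element, dict(ATTRIBUTE.findall(rest))


print(parse_directive('<!--#config errmsg="Something barfed!" -->'))
# prints ('config', {'errmsg': 'Something barfed!'})
```

A plain HTML comment without the leading # does not match, so it is left alone as an ordinary comment.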
What server side include
commands are available?
Used to print one of the include variables available to server
side includes. If the variable has no value, it will print as
"(none)."
The name of the variable you want to print. These variables may
be printed directly with echo or they may be accessed
by a CGI program called with exec. In the latter case,
the CGI program will find the variables in the %ENV hash.
Available variables are:
Used to include a static file (perhaps containing a boilerplate
copyright notice or menu) inside the current document.
Attributes are:
The name of a file to include, relative to the current document.
It cannot contain ../, nor can it be an absolute path. Generally
you'll want to use the virtual attribute instead.
Example: <!--#include file="legal_notice.html"-->
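The two restrictions on the file attribute can be written down as a simple check. This is a Python sketch of the rule as stated above, not the server's actual code, and the function name is_allowed_file_path is made up.

```python
def is_allowed_file_path(path):
    """Check a file= value: it must be relative and must not contain "../"."""
    if path.startswith("/"):
        return False  # absolute paths are rejected
    if "../" in path:
        return False  # so is anything that tries to climb out of the directory
    return True


print(is_allowed_file_path("legal_notice.html"))   # prints True
print(is_allowed_file_path("../secret.html"))      # prints False
print(is_allowed_file_path("/etc/passwd"))         # prints False
```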
The encoded URL path to a file to include, on your site, relative
to the document root (represented by a slash "/"). If the path
does not begin with a slash, it is assumed to be relative to the
current document.
Note: If the directory containing the parsed file
(the .shtml file) has IncludesNOEXEC set on it (rather than
simply Includes) then you can't link to a CGI script. Otherwise,
you can link to a CGI script, and it will execute as usual, complete
with any attached query string.
Example: <!--#include virtual="legal_notice.html"-->
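The rule for how a virtual path is resolved can be sketched in a few lines of Python. The function name resolve_virtual and the sample URLs are made up for illustration, and a real server does more than this (decoding the URL and checking access, for example).

```python
import posixpath


def resolve_virtual(current_uri, virtual):
    """Resolve a virtual= path to a URL path on the same site."""
    if virtual.startswith("/"):
        # A leading slash means "relative to the document root".
        return virtual
    # Otherwise the path is relative to the document holding the directive.
    return posixpath.join(posixpath.dirname(current_uri), virtual)


# Hypothetical page URL and include targets.
print(resolve_virtual("/~user/index.shtml", "legal_notice.html"))
# prints /~user/legal_notice.html
print(resolve_virtual("/~user/index.shtml", "/legal_notice.html"))
# prints /legal_notice.html
```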
Used to execute a given shell command (with /bin/sh) or CGI script,
and insert the results into the current document. The IncludesNOEXEC
option disables this completely.
Attributes are:
The encoded URL relative path to the CGI script. If the path
does not begin with a slash (indicating the document root) it
is assumed to be relative to the current file. The directory
containing the script must be set up to allow CGI execution.
You cannot append any query string or path info to the end of the
URL for the CGI script - instead, the CGI script will be passed
any query string or path info data appended to the URL for the
current page containing the server side include.
Note: If the script returns a Location: header rather than
content, the server side include will be replaced with a link
to the URL given in the returned Location: header.
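The Location: behavior is easier to picture as code. In this Python sketch the function name render_exec_cgi is made up, the headers are assumed to arrive as a plain dictionary, and the exact markup of the link is a guess rather than what any particular server prints.

```python
import html


def render_exec_cgi(headers, body):
    """Return the text that would stand in for the include directive."""
    location = headers.get("Location")
    if location is not None:
        # The script asked for a redirect, so show a link instead of content.
        safe = html.escape(location, quote=True)
        return f'<a href="{safe}">{safe}</a>'
    return body


print(render_exec_cgi({"Location": "/new.html"}, ""))
# prints <a href="/new.html">/new.html</a>
print(render_exec_cgi({}, "You are visitor 42"))
# prints You are visitor 42
```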
A string to be sent to /bin/sh and executed. This can of course
be a call to execute a garden variety perl script complete with
command line arguments. The various include variables (explained
above) will be available to such scripts.
Example: <!--#exec cmd="finger"-->
This command controls some behaviors of the SSI, such as the
formatting
of times and dates and the error message to be printed when a file
fails to parse.
The error message to print to the screen in place of the SSI when
an error occurs during parsing.
Example: <!--#config errmsg="Something barfed!" -->
The format to use when displaying the size of a file. Setting it
to bytes displays sizes in bytes, whereas setting it to
abbrev will use the KB or MB abbreviations as appropriate.
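Here is a rough Python sketch of the difference between the two settings. The function name format_size is made up, and the rounding and the exact labels are assumptions; a real server may print sizes a little differently.

```python
def format_size(size_in_bytes, sizefmt):
    """Format a file size in the style of the two settings described above."""
    if sizefmt == "bytes":
        return str(size_in_bytes)
    if sizefmt == "abbrev":
        if size_in_bytes < 1024:
            # Assumption: sizes under 1 KB are too small to abbreviate.
            return f"{size_in_bytes} bytes"
        if size_in_bytes < 1024 * 1024:
            return f"{size_in_bytes / 1024:.0f}KB"
        return f"{size_in_bytes / (1024 * 1024):.1f}MB"
    raise ValueError(f"unknown sizefmt: {sizefmt!r}")


print(format_size(2048, "bytes"))    # prints 2048
print(format_size(2048, "abbrev"))   # prints 2KB
```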
A format string used by the strftime() library routine
when printing dates (such as the last modified date).
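If you haven't met strftime() format strings before, you can try them out in Python, which passes the same kind of string to the same kind of routine. The function name format_time and the format string are just examples, not anything the server requires.

```python
import time


def format_time(timefmt, seconds_since_epoch):
    """Format a timestamp (in UTC) with a strftime()-style format string."""
    return time.strftime(timefmt, time.gmtime(seconds_since_epoch))


# Timestamp 0 is the Unix epoch, which keeps the output reproducible.
print(format_time("%Y-%m-%d %H:%M:%S", 0))  # prints 1970-01-01 00:00:00
```

Once you have a format string you like, it goes into the config directive as the value of the timefmt attribute.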
Where can I read
more?
This chapter will explain the various settings that you need to
add to the server's httpd.conf file, as well as some information on
the format of the server side include directives themselves. Even
if you aren't the person setting up the server at your site, it can
be helpful to have a passing idea of what is involved with the
server setup.
A pretty good list.
This one has some of the newer, fancier commands, including those specific
to the apache server.
Another one from 1995.
What should I do for next
week?
(Hint: You'll probably just want to have the server call the same
perl script you wrote to do this as a CGI in week 9.)
Comments? Questions? General harassment? Mail it to
mcovingt@staff.uiuc.edu