CGI@CLAM
week #1

Before we get started...

...you should probably read some administrivia dealing with the class.

What are CGI programs, anyway? Where "are they" on the web?

The web browser and web server together make up a client-server application. Each is a separate, standalone program (or application) which knows how to communicate with the other. The web browser is the client - like clients anywhere else, the web browser asks for something, in this case for a file. The web server is exactly that - a server, which like servers anywhere else serves something, in this case files, to those asking for them. Note that web browsers and web servers are not linked in pairs. A given web server will serve documents to any number of web browsers, and similarly a given web browser will request documents from any number of web servers.

The files being served and requested may be static, preexisting documents, or they may be created on the fly by other programs which live on the same machine as the web server and which the server calls when it receives the request. These "on the fly" documents may depend on specific information sent by the requesting browser (often information which the browser user has typed into a form), on other conditions at the server, or even on the time of the request. These files or documents, whether static or created on the fly, may take many forms - HTML text files (the usual "web pages"), image files, sound files, etc. It is up to the web browser to interpret these files and display them to the browser user. Most browsers, by agreement, know how to interpret and display documents written in HyperText Markup Language (HTML).

The "other programs" which create a document on the fly to be sent back to the requesting browser are called CGI programs. We will learn to write these programs in this class. You will be able to write HTML pages with forms and also the programs which process the form information and do something with that information.

What does CGI stand for?

CGI stands for Common Gateway Interface. Note that CGI is not a programming language. Rather, CGI is a specification for how information (such as things people type into forms) is passed from the browser, by way of the server, to the programs the server calls. Specifically, it describes the variables in which the server and programs called by the server can find the information passed from the browser.
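
To make "the variables" a little more concrete: under CGI the server hands request information to a program through environment variables, which perl exposes in the %ENV hash. The sketch below just reports a few of the standard ones. The describe_request name is made up for illustration, and which variables are actually filled in depends on the server and on the request.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return a short report of a few standard CGI environment variables.
# Takes a hash reference so it can be tried without a web server.
# (describe_request is a hypothetical helper name.)
sub describe_request {
    my ($env) = @_;
    my @names = qw(REQUEST_METHOD QUERY_STRING HTTP_USER_AGENT REMOTE_ADDR);
    return join '', map {
        my $value = defined $env->{$_} ? $env->{$_} : '(not set)';
        "$_ = $value\n";
    } @names;
}

# Under a web server, %ENV holds what the server filled in for this request.
print "Content-type: text/plain\n\n";
print describe_request(\%ENV);
```

Run from the command line rather than through a server, most of these will simply show up as "(not set)".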

CGI programs may be written in any language. We will be writing our CGI programs in perl.

What actually happens when I request a page on the web?

When you "go to" a page on the web, either by clicking on a hyperlink, typing in a URL in the "Netsite" or "Location" input line in your browser, or submitting a form, your browser calls up the server and asks it for the page.

Let's use this here CGI@CLAM page as an example in this discussion. Notice that the URL for this page is

	http://www.tcp.com/MINK_TCP_COM/CLAM/
Note that I could be more specific with the URL, and write it as
	http://www.tcp.com/MINK_TCP_COM/CLAM/index.html

It is this URL which tells your browser "whom" to call, and "what" to ask for. We'll talk about this below.

The server is actually an HTTP daemon, or httpd (a "daemon" is a fancy Unix term for a program that runs in the background and responds to calls made to it) running on some computer at the site hosting the page. In this case the daemon is an Apache server running on tcp.com, which is a Sun workstation sitting on a friend's desk in Santa Clara, California.

You can think of the daemon as a little devil wearing a green eyeshade and spectacles, sitting at a desk with a thick ledger on it in front of a huge file cabinet with many drawers. He's waiting for the phone to ring. There are other daemons in the room too, each with a different phone, each hired to do a different task. The httpd daemon is waiting for someone to call him on the phone and ask for a file. He speaks a special language called HTTP (Hyper Text Transfer Protocol) which the caller will need to speak in order to communicate with him. There he sits, inside tcp.com, kinda like the little orchestras that sit inside your radio.

When you click on the link above, your browser gives him a call. The browser decodes the URL as follows:
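
One way to picture that decoding is as a bit of perl. This is only a sketch: the decode_url name and its pattern are made up for illustration, it handles plain http URLs only, and it assumes port 80 when the URL names no port.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split an http URL into the pieces the browser needs.
# Returns (protocol, host, port, path), or nothing if the URL doesn't fit.
# (decode_url is a hypothetical helper, not how any real browser is written.)
sub decode_url {
    my ($url) = @_;
    my ($protocol, $host, $port, $path) =
        $url =~ m{\A(\w+)://([^/:]+)(?::(\d+))?(/.*)?\z}
        or return;
    $port = 80  unless defined $port;   # assumed default port for http
    $path = '/' unless defined $path;   # no path means the top of the site
    return ($protocol, $host, $port, $path);
}

my ($protocol, $host, $port, $path) =
    decode_url('http://www.tcp.com/MINK_TCP_COM/CLAM/');
print "protocol: $protocol\n";   # which language to speak
print "host:     $host\n";       # "whom" to call
print "port:     $port\n";       # which phone to ring
print "path:     $path\n";       # "what" to ask for
```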

Now that the browser knows whom to contact and what file to ask for, it goes ahead and makes the call. The browser opens a connection to wocket.csl.uiuc.edu port 80, waits for the daemon to pick up, and has the following conversation:

Browser: GET /MINK_TCP_COM/CLAM/ HTTP/1.1
(Can I please have the file /MINK_TCP_COM/CLAM/ relative to your document root?)
Connection: Keep-Alive
(Don't hang up the phone until I do.)
User-Agent: Mozilla/4.0 (WinNT; I)
(I'm Netscape 4.0 running in Windows NT.)
Host: www.tcp.com
(I'm calling for the site named www.tcp.com, in case you answer the phone for more than one site.)
Accept: image/gif, image/jpeg, */*
(I can support all these nifty data formats.)
*blank line*
Server daemon: HTTP/1.1 200 OK
(Yep! I found that just fine, code 200, sir!)
Date: Wed, 22 Sep 1999 00:30:04 GMT
(It's currently 12:30 or so AM on Wednesday, 9/22/99 over in England.)
Server: Netscape Enterprise/3.5.1
(I'm a Netscape Enterprise server version 3.5.1.)
Content-type: text/html
(This is an HTML file, get ready to display HTML.)
Content-length: 12246
(This file is 12246 bytes long.)
Last-modified: Tue, 21 Sep 1999 13:59:24 GMT
(See how old this file is?)
*blank line*
*source of this HTML page*
For more information on HTTP, you should take a look at Chapter 3 of "Web Client Programming with Perl" by Clinton Wong, published by O'Reilly and Associates.

Notice that both parties indicate "I'm done talking, now you go ahead" with a blank line. This will become important, because the CGI programs that we will write will need to supply some of the server daemon's end of the conversation in the form of HTTP headers at the beginning of their output. These headers will need to be separated from the rest of the program's output by a blank line.
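
Here is roughly what that looks like from the program's side. It's a minimal sketch, not a finished CGI program: cgi_output is a made-up helper name, and the only header supplied is Content-type.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the complete output of a CGI program: headers, blank line, body.
# (cgi_output is a hypothetical helper name.)
sub cgi_output {
    my ($body) = @_;
    my $headers = "Content-type: text/html\n";
    return $headers . "\n" . $body;   # the lone "\n" is the blank line
}

print cgi_output("<HTML>\n<BODY>\nHello from a CGI program!\n</BODY>\n</HTML>\n");
```

Leave out that lone "\n" and the daemon has no way to tell where the headers stop and the page begins.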

Impersonate a browser yourself on the command line!

The http daemon doesn't much care who he's talking to, as long as the person on the other end of the line speaks valid HTTP. In fact, you can dial up the daemon yourself (why does that sound like the lyrics of a bad rock song?) using telnet, which is a program that allows you to type things directly to a program listening on a port at a given machine.

As it turns out, wocket.csl.uiuc.edu has telnet disabled for security reasons (scam artists can't call you on the phone selling bogus insurance if the phone is unplugged, after all) but we can use telnet to ask the server on che.onthejob.net to return my home page on che. The URL for my home page on che is

	http://che.onthejob.net/MINK_TCP_COM/
To retrieve this page, type the following:

You should receive something like the following:

HTTP/1.1 200 OK
Date: Tue, 21 Sep 1999 01:29:11 GMT
Server: Apache/1.3.6 (Unix)
Last-Modified: Fri, 03 Sep 1999 20:18:37 GMT
ETag: "a0a0b-9da-37d02d1d"
Accept-Ranges: bytes
Content-Length: 2522
Content-Type: text/html

<HTML>
<HEAD>
<TITLE>cmi.picnic</TITLE>
<BODY BGCOLOR=#c0c0c0 TEXT=MidnightBlue LINK=#328080 VLINK=MidnightBlue>
<center>
<IMG src="http://che.onthejob.net/MINK_TCP_COM/Images/minks_page_tran.gif 
     border=0 alt="mink's page logo">
</center>
<P>
<font size=+2>Maybe you want to <b>see some stuff?</b></font>
<UL>
<LI><A HREF="http://wocket.csl.uiuc.edu/~maiko/">
    <b>Mink's Lame Home Page</b></A>
    <blockquote>
    This is my <b>main home page</b> which resides <b>at work</b>.
    It's <b>rather lame</b> as the name says but it does have a
    lot of (unnecessary!) <b>information</b> and I <b>update it
    every day</b>.
    </blockquote>
<LI><A HREF="ideal_community.html"><b>Thoughts on an Ideal Community</b></A>
    <blockquote>
    This is <b>what I want to turn the world into</b>.
    The actual text was written up while in a <b>design group</b> at the
    <A HREF="http://www.globalideasbank.org/wbi/WBI-190.HTML">
    <b>School for Designing a Society</b></A>.
    </blockquote>
<LI><A HREF="learn_work.html"><b>Thoughts on Learning and Working</b></A>
    <blockquote>
    This is mainly an expanded excerpt from the above general thoughts.
    Things in here are why I like being able to 
    <b>teach at <A HREF="http://www.onthejob.net/training.html">CLAM</A></b>.    </blockquote>
<LI><A HREF="rsvp.cgi"><b>cmi.picnic page</b></A>
    <blockquote>
    This is a local <b>copy</b> of the real cmi.picnic page (which
    resides on shout.net) so you can feel free to <b>sign bogus people
    up for the picnic</b> and otherwise <b>play around with it</b>.
    I intend to morph this into a <b>general purpose potluck page</b>.
    <P>
    <b>CLAMComrades</b> in particular might be interested in <b>viewing
    the code</b> for the two programs that make up the picnic page,
    <A HREF="rsvp.txt"><b>rsvp.cgi</b></A> and 
    <A HREF="edit_rsvp.txt"><b>edit_rsvp.cgi</b></A>. 
    </blockquote>
<LI><A HREF="leaves.html"><b>happy leaf page!</b></A> 
    <blockquote>
    The <b>yellow and orange on blue</b> of sun-backlit <b>autumn leaves</b>
    against the crisp blue <b>akibare sky</b> is my absolute favorite thing 
    in the world. Sometimes I try to <b>make some art</b> to get even just
    a little bit of that feel.
    <P>
    Now you might <b>understand a bit more</b> about my <b>kitchen</b>.
    </blockquote>
</UL>
<hr>
<address>
Comments? Questions? General harassment? Mail it to
<A HREF="mailto:mcovingt@staff.uiuc.edu"><b>mcovingt@staff.uiuc.edu</b></A>
</address>
</BODY>
</HTML>
	
Sure enough, that looks like the source of my home page on che plus some extra HTTP headers at the top. Pretty neat, huh?
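
Since this is a perl class, here is a sketch of placing the same call from a perl program instead of from telnet. It uses the standard IO::Socket::INET module. The build_request name, the choice of an HTTP/1.0 request, and the script name in the usage comment are my assumptions for illustration. Nothing is sent unless you give it a host and a path on the command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Socket::INET;

# Build the text a browser would send for a simple page request.
# (build_request is a hypothetical helper name.)
sub build_request {
    my ($host, $path) = @_;
    return "GET $path HTTP/1.0\r\n"
         . "Host: $host\r\n"
         . "\r\n";                      # blank line: "I'm done talking"
}

# Usage (hypothetical script name):
#   perl fetch.pl che.onthejob.net /MINK_TCP_COM/
if (@ARGV == 2) {
    my ($host, $path) = @ARGV;
    my $socket = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => 80,                 # assumed: the usual http port
        Proto    => 'tcp',
    ) or die "Can't connect to $host: $@\n";
    print {$socket} build_request($host, $path);
    print while <$socket>;              # headers, blank line, then the page
    close $socket;
}
```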

Notice that when your browser gets this output, it knows how to interpret the HTML and draw the page, complete with bold fonts, list bullets, and that sort of thing. Different browsers will do this slightly differently, which is why the same web page looks a bit different on a Linux box than it does on a Mac - both machines know how to draw a checkbox, but they do it slightly differently. Some browsers also will accept tags other than the standard ones, which is how people are able to write pages that display just fine on one browser and absolutely croak on another. Note too that when the browser sees the <IMG> tag, it knows it needs to immediately call the daemon back to request the appropriate image file. It will do this without you needing to reclick (unless, of course, you have images turned off).

So where do CGI scripts come in?

You can place executable programs either underneath the document root or in a separate programs directory. The http daemon knows where and how to find these, either by location (if they are in a special programs directory) or by name (generally if programs are located under the document root, they will have the ".cgi" suffix).

When a client requests one of these program pages, the server doesn't return the file itself, as it would with a regular HTML page. Rather, it executes the program, and returns the output of that program. Specifically, the daemon will run the program, look at its output, add some extra HTTP to the top (such as the HTTP/1.1 200 OK line indicating successful execution) in addition to those HTTP headers the program itself must provide, and send the whole thing back to the client.
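
To picture the daemon's part of that job, here is a much simplified sketch. A real daemon does a good deal more, so treat this as a cartoon: server_response is a made-up name, the Content-length header is just my example of "extra HTTP" the daemon might add, and the line endings are plain "\n" for readability where real HTTP uses CR LF.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Take the raw output of a CGI program (headers, blank line, body) and
# wrap it the way a server might before sending it to the browser.
# (server_response is a hypothetical helper, not part of any real daemon.)
sub server_response {
    my ($program_output) = @_;
    my ($headers, $body) = split /\n\n/, $program_output, 2;
    $body = '' unless defined $body;
    return "HTTP/1.1 200 OK\n"                              # added by the server
         . "Content-length: " . length($body) . "\n"        # added by the server
         . "$headers\n"                                     # supplied by the program
         . "\n"                                             # blank line
         . $body;
}

print server_response("Content-type: text/html\n\n<P>hi</P>\n");
```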

Generally CGI programs will output HTML source. However, they may output binary image information, sound information, or any other format that the browser can handle.

Do the files need special permissions?

The server daemon is a process (an instance of a program - if two people were to run the same program simultaneously, each executing copy of it would be a unique process) and as such is run under a user id, like all other programs. This user id is generally known as the web user or the httpd user. On most systems, the http daemon runs as the user nobody. User nobody is a real user, listed in the /etc/passwd file with all the other users, but without login capability and with reduced privileges. The reduced privileges are for security reasons - as the httpd user might be running CGI programs written by anyone with an account on the server computer, we don't want him to be able to delete or alter important system files.

To find out what user id the http daemon is running as, type

	ps -ef | grep httpd
You should see something like this (on che):
root       458     1  0 Sep16 ?        00:00:02 /www/bin/httpd
nobody   23549   458  0 14:06 ?        00:00:00 /www/bin/httpd
nobody   23690   458  0 14:43 ?        00:00:00 /www/bin/httpd
nobody   23858   458  0 15:29 ?        00:00:00 /www/bin/httpd
nobody   23859   458  0 15:29 ?        00:00:00 /www/bin/httpd
nobody   23860   458  0 15:29 ?        00:00:00 /www/bin/httpd
nobody   23907   458  0 15:44 ?        00:00:00 /www/bin/httpd
nobody   25005   458  0 21:02 ?        00:00:00 /www/bin/httpd
nobody   25006   458  0 21:02 ?        00:00:00 /www/bin/httpd
nobody   25007   458  0 21:02 ?        00:00:00 /www/bin/httpd
You can see that the http daemon runs as user nobody on che. Notice that there is one instance that runs as root - that one only exists to spawn the others. There are several httpd daemons sitting at that little desk, as it can get busy in there!

You and the user id running the server daemon are generally not in the same group. This means that all pages to be served on the web by the http daemon must be readable by all, all directories leading down to those pages must be executable by all, and finally all CGI programs to be run by the daemon must be both readable and executable by all.

In short, you need the following permissions (in octal):

If the permissions are wrong on a page or program you will get either a "File not Found" page or a server error, depending.
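
As a worked example, here is one set of octal modes that satisfies the rules above, applied with perl's built-in chmod to throwaway files in a scratch directory. The "everyone else" bits come straight from the rules. The owner bits (full access for you) are my assumption, and mode_of is a made-up helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Octal modes matching the rules above. The owner bits are an assumption;
# the rules only say what everyone else needs.
my %mode_for = (
    page      => 0644,   # rw-r--r--  readable by all
    directory => 0755,   # rwxr-xr-x  executable (searchable) by all
    program   => 0755,   # rwxr-xr-x  readable and executable by all
);

# Return a file's permission bits as an octal string such as "0755".
# (mode_of is a hypothetical helper name.)
sub mode_of {
    my ($path) = @_;
    return sprintf '%04o', (stat $path)[2] & 07777;
}

# Try it on empty stand-in files so nothing real gets changed.
my $dir = tempdir(CLEANUP => 1);
for my $name ('index.html', 'rsvp.cgi') {
    open my $fh, '>', "$dir/$name" or die "Can't create $dir/$name: $!\n";
    close $fh;
}
chmod $mode_for{directory}, $dir              or die "chmod failed: $!\n";
chmod $mode_for{page},      "$dir/index.html" or die "chmod failed: $!\n";
chmod $mode_for{program},   "$dir/rsvp.cgi"   or die "chmod failed: $!\n";

print mode_of($_), "  $_\n" for $dir, "$dir/index.html", "$dir/rsvp.cgi";
```

From the command line the same thing is done with the chmod command, for example chmod 644 index.html and chmod 755 rsvp.cgi.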

We'll write these in perl?

Indeedly doodly! That means that you should start reviewing perl now. You should have no problem creating, reading from, and assigning to all of the basic data structures (scalars, lists, and hashes), and be familiar with looping structures (including foreach), reading from and writing to files, split and join, and regular expressions. I suggest you spend some quality time with the Perl book this week.
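
If you want a quick self-test, the little sketch below touches most of that list in a dozen lines: a scalar, a list, a hash, foreach, split, join, and a regular expression. The parse_fields name and the sample string are made up for illustration. If every line of it makes sense to you, you're in good shape.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Turn a string like "name=mink&dish=salad" into a hash of field => value.
# (parse_fields is a hypothetical helper name.)
sub parse_fields {
    my ($string) = @_;
    my %field;
    foreach my $pair (split /&/, $string) {
        next unless $pair =~ /^(\w+)=(.*)$/;   # skip anything that isn't name=value
        $field{$1} = $2;
    }
    return %field;
}

my %field = parse_fields('name=mink&dish=salad');
print join(', ', map { "$_ is $field{$_}" } sort keys %field), "\n";
```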

You should also have a command of basic HTML.


Comments? Questions? General harassment? Mail it to mcovingt@staff.uiuc.edu