...you should probably read some administrivia dealing with the class.
The web browser and web server together make up a
client-server application.
Each is a separate, standalone program (or application) which knows how
to communicate with the other.
The web browser is the client - like clients anywhere else,
the web browser asks for something, in this case for a file.
The web server is exactly that - a
server, which like servers anywhere else serves something, in this case
files, to those asking for them. Note that web browsers and web servers
are not linked in pairs. A given web server will serve documents to any
number of web browsers, and similarly a given web browser will request
documents from any number of web servers.
The files being served and requested may be either static,
preexisting documents or they may be created on the fly by
other programs on the same machine as the web server which are
called by the web server at the time the server receives
the request. These "on the fly" documents may depend on specific information
sent by the requesting browser (often something
the browser user has typed into a form), on other conditions
at the server,
or even on the time of the request. These files or documents, whether static
or created on the fly, may take many forms - HTML text files (the usual
"web pages"), image files, sound files, etc. It is up to the web browser
to interpret these files and display them to the browser user. Most
browsers, by agreement, know how to interpret and display documents
written in Hyper Text Markup Language (HTML).
The "other programs" which create a document on the fly to be sent back
to the requesting browser are called CGI programs. We will
learn to write these programs in this class. You will be able to write
HTML pages with forms and also the programs which process the form
information and do something with that information.
CGI stands for Common Gateway Interface. Note that CGI
is not a programming language. Rather, CGI is a specification for
how information (such as things people type into forms) is passed
from the web server to the programs it calls.
Specifically, it describes the variables in which the server and
programs called by the server can find the information passed from the
browser.
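As a rough illustration of "the variables in which ... programs called by the server can find the information", here is a minimal sketch in Python (the class itself uses Perl). QUERY_STRING and REQUEST_METHOD are variable names from the CGI specification; the function name form_fields and the sample form data are made up for this example.

```python
import os
from urllib.parse import parse_qs


def form_fields(environ=os.environ):
    """Return the form data of a GET request as a dict of lists."""
    # QUERY_STRING is set by the server before it runs the program.
    # It holds whatever followed the "?" in the requested URL.
    query = environ.get("QUERY_STRING", "")
    return parse_qs(query)


if __name__ == "__main__":
    # Simulate the environment a server might set up (hypothetical values).
    fake_env = {"REQUEST_METHOD": "GET", "QUERY_STRING": "name=Mink&dish=pie"}
    print(form_fields(fake_env))
```

Passing the environment in as a parameter is only a convenience so the function can be tried without a web server.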
CGI programs may be written in any language. We will be writing our
CGI programs in Perl.
When you "go to" a page on the web, either by clicking on a hyperlink,
typing in a URL in the "Netsite" or "Location" input line in your browser,
or submitting a form, your browser calls up the server and
asks it for the page.
Let's use
this here CGI@CLAM page as an example in this discussion.
Notice that the URL for this page is
It is this URL which tells your browser "whom" to call, and "what" to
ask for. We'll talk about this below.
The server is actually an HTTP daemon, or httpd
(a "daemon" is a fancy
Unix term for a program that runs in the background and responds to
calls made to it) running on some computer at the site hosting the page.
In this case the daemon is an Apache server
running on tcp.com, which is a Sun workstation sitting on a friend's desk
in Santa Clara, California.
You can think of the daemon as a little devil wearing a green eyeshade
and spectacles, sitting at a desk with a thick ledger on it in front of a
huge file cabinet with many drawers. He's waiting for the phone to ring.
There are other daemons in the room too, each with a different phone,
each hired to do a different task.
The httpd daemon is waiting for someone to call him on the phone
and ask
for a file. He speaks a special language called HTTP (Hyper Text
Transfer Protocol) which the caller will need to speak in order to
communicate with him. There he sits, inside tcp.com, kinda like the little
orchestras that sit inside your radio.
When you click on the link above, your browser gives him a call. The
browser decodes the URL as follows:
To find out the IP address of wocket.csl.uiuc.edu yourself, you can
type in
Note that the IP address and the domain name are
interchangeable.
I could have written the URL for this page as
You've probably not seen many URLs with a port specification.
That's because there is a default port that http daemons like to use.
Non-secure daemons, like this one, default to using port 80.
Secure daemons, such as the ones listening for your credit
card number and talking in code (still speaking HTTP, only encrypting
it) default to using port 443.
Notice that I originally left the port specification off of the URL.
I can do that, and most people usually do, because the browser will
use the default port if none is specified. The browser can tell that you
are trying to call up a secure http daemon because the
URL begins with https://. That is in fact how you tell your
browser to speak encrypted HTTP.
The person who sets up the server daemon tells it what port it will
listen to.
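The way a browser takes a URL apart, including falling back to the default port, can be sketched in a few lines of Python. This is only an illustration using the standard urllib.parse module, not how any particular browser is written; the name decode_url is hypothetical.

```python
from urllib.parse import urlsplit

# Default ports as described above: 80 for plain HTTP, 443 for encrypted HTTP.
DEFAULT_PORTS = {"http": 80, "https": 443}


def decode_url(url):
    """Split a URL into (protocol, host, port, file to ask for)."""
    parts = urlsplit(url)
    # parts.port is None when the URL has no explicit port specification.
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    # An empty path means "ask for the top of the document root".
    path = parts.path or "/"
    return parts.scheme, parts.hostname, port, path


if __name__ == "__main__":
    print(decode_url("http://www.tcp.com/MINK_TCP_COM/CLAM/"))
    # ('http', 'www.tcp.com', 80, '/MINK_TCP_COM/CLAM/')
```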
Now that the browser knows whom to contact and what file to ask for,
it goes ahead and makes the call. The browser opens a connection to
wocket.csl.uiuc.edu port 80, waits for the daemon to pick up, and
has the following conversation:
Notice that both parties indicate "I'm done talking, now you go ahead"
with a blank line. This will become important, because the
CGI programs that we will write will need to supply some of the
server daemon's end of the conversation in the form of
HTTP headers at the beginning of their output. These headers will
need to be separated from the rest of the program's output by a blank
line.
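To make the blank-line rule concrete, here is a minimal sketch (in Python rather than the Perl we will use in class) of the output a CGI program produces: one header, a blank line, then the document. The function name cgi_output and the sample HTML are invented for the example.

```python
def cgi_output(body, content_type="text/html"):
    """Return what a CGI program would print: headers, blank line, body."""
    headers = f"Content-type: {content_type}\n"
    # The empty line marks the end of the headers. Leave it out and the
    # server cannot tell where the headers stop and the document starts.
    return headers + "\n" + body


if __name__ == "__main__":
    print(cgi_output("<HTML><BODY><P>Hello from a CGI program</P></BODY></HTML>"))
```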
The http daemon doesn't much care who he's talking to, as long as the
person on the other end of the line speaks valid HTTP. In fact, you can
dial up the daemon yourself (why does that sound like the lyrics
of a bad rock song?) using telnet, which is a program that allows
you to type things directly to a program listening on a port at a given
machine.
As it turns out, wocket.csl.uiuc.edu has telnet disabled
for security reasons (scam artists can't call you on the phone selling
bogus insurance if the phone is unplugged, after all) but we can use telnet
to ask the server on che.onthejob.net to return
my home page on che.
The URL for my home page on che is
You should receive something like the following:
Notice that when your browser gets this output, it knows how to
interpret the HTML and draw the page, complete with
bold fonts, list bullets, and that sort of thing. Different browsers
will do this slightly differently, which is why the same web page looks
a bit different on a Linux box than it does on a Mac - both machines
know how to draw a checkbox, but they do it slightly differently. Some
browsers also will accept tags other than the standard ones, which is
how people are able to write pages that display just fine on one browser
and absolutely croak on another. Note too that when the browser sees
the <IMG> tag, it knows it needs to immediately call the
daemon back to request the appropriate image file. It will do
this without you needing to reclick (unless, of course, you have images
turned off).
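Here is a small sketch of how a program might spot the image files it needs to call back for. It uses Python's standard html.parser module; real browsers are far more elaborate, and the names ImageFinder and image_urls are made up for the example.

```python
from html.parser import HTMLParser


class ImageFinder(HTMLParser):
    """Collect the src attribute of every IMG tag in a page."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser hands us tag and attribute names in lower case,
        # so <IMG SRC=...> and <img src=...> are treated alike.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def image_urls(html):
    """Return the image URLs a browser would have to request next."""
    finder = ImageFinder()
    finder.feed(html)
    finder.close()
    return finder.sources


if __name__ == "__main__":
    print(image_urls('<P><IMG SRC="logo.gif" alt="logo"> text <img src="leaf.jpg">'))
```

Each URL in the returned list would mean one more call to the daemon.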
You can place executable programs either underneath the document
root or in a separate programs directory. The http daemon
knows where and how to find these, either by location (if they are in
a special programs directory) or by name (generally if programs are located
under the document root, they will have the ".cgi" suffix).
When a client requests one of these program pages, the server doesn't
return the file itself, as it would with a regular HTML page. Rather,
it executes the program, and returns the output of that
program. Specifically, the daemon will run the program, look at
its output, add some extra HTTP to the top (such as the
Generally CGI programs will output HTML source. However, they
may output binary image information, sound information, or any other
format that the browser can handle.
The server daemon is a process (an instance of a program - if two
people were to run the same program simultaneously, each executing copy of it
would be a unique process) and as such is run under a user id, like
all other programs. This user id is generally known as the web user
or the httpd user. On most systems, the http daemon runs as the
user nobody. User nobody is a real user, listed in the /etc/passwd
file with all the other users, but without login capability and with reduced
privileges. The reduced privileges are for security reasons - as the
httpd user might be running CGI programs written by anyone with an account on
the server computer, we don't want him to be able to delete or alter important
system files.
To find out what user id the http daemon is running as, type
You and the user id running the server daemon are generally
not in the same group. This means that all pages to be served
on the web by the http daemon must be readable by all, all
directories leading down to those pages must be executable
by all, and finally all CGI programs to be run by the daemon
must be both readable and executable by all.
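In short, you need permissions (in octal) whose last digit includes
read (4) for pages, execute (1) for directories, and both (5) for CGI
programs; for example 644, 755, and 755 respectively.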
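The three rules can be checked mechanically. The following Python sketch tests the "by all" (world) bits of a permission mode using the standard stat module; the function names are invented for the example, and the sample modes are only common choices, not the only ones that work.

```python
import os
import stat


def page_ok(mode):
    """A page must be readable by all."""
    return bool(mode & stat.S_IROTH)


def directory_ok(mode):
    """A directory leading down to a page must be executable by all."""
    return bool(mode & stat.S_IXOTH)


def program_ok(mode):
    """A CGI program must be both readable and executable by all."""
    return bool(mode & stat.S_IROTH) and bool(mode & stat.S_IXOTH)


def mode_of(path):
    """Return just the permission bits of a file, e.g. 0o644."""
    return stat.S_IMODE(os.stat(path).st_mode)


if __name__ == "__main__":
    for mode in (0o644, 0o600, 0o755, 0o711):
        print(oct(mode), page_ok(mode), directory_ok(mode), program_ok(mode))
```

To check a real file, pass mode_of("some/path") to one of the three functions.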
You should also have a command of basic HTML.
What are CGI programs, anyway?
Where "are they" on the web?
What does CGI stand for?
What actually happens when I
request a page on the web?
http://www.tcp.com/MINK_TCP_COM/CLAM/
Note that I could be more specific with the URL, and write it as
http://www.tcp.com/MINK_TCP_COM/CLAM/index.html
This tells the browser that it should use HTTP to talk to the
daemon at the other end. This is the correct language to speak to the
http daemon. We'll talk about what is said a bit later on.
(Incidentally, you may have seen URLs which begin with
ftp://, particularly when downloading software. Those URLs
instruct the browser to speak FTP or File Transfer Protocol.
The browser would need to speak with the ftp daemon in that case.)
This is the domain name of the server (note that "server" is
also used to mean the computer that the server daemon is running on -
yes, this is imprecise). Your browser needs to
know the IP address of the server in order to make contact.
Neither browser nor server cares that the physical location of the
machine is in Urbana. Rather, the IP address is a unique sequence of
numbers specifying where the machine is on the internet.
nslookup wocket.csl.uiuc.edu
at the Unix prompt. You should receive a reply something like
Non-authoritative answer:
Name: wocket.csl.uiuc.edu
Address: 130.126.136.244
The 130.126 indicates that the machine is at UIUC. The rest of it
just gets more specific.
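A program can ask the same question that nslookup does. This is a minimal Python sketch using the standard socket module; the function name lookup is hypothetical, and whether a given name still resolves depends on the network you are on.

```python
import socket


def lookup(name):
    """Return the IPv4 address for a domain name, as a dotted string."""
    # gethostbyname hands the question to the system's resolver.
    # A string that is already an IPv4 address comes back unchanged.
    return socket.gethostbyname(name)


# Example use (needs a working resolver, so it is left as a comment):
#     print(lookup("wocket.csl.uiuc.edu"))
```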
http://130.126.136.244/~maiko/CLAM/
Occasionally you might need to do this if the DNS
(Domain Name System) server that your computer uses is unresponsive or
down.
This specifies the port that the daemon is running on. You
might think of it as the particular extension that the
daemon's phone is on.
This is the file we are asking for. The browser will ask for
it as is, but the daemon will then analyze it in several parts:
Filenames all start with /, and are relative to the
document root, or the root of the tree of files which the
daemon is allowed to serve over the web.
~maiko/ is an alias for the directory
/home/maiko/public_html/ on wocket. The person who sets up the
server daemon can specify an alias of this sort for users' own
personal document roots. What this means is, each user is allowed to
own a directory tree of files which the daemon may serve over the
web. The person setting up the server specifies what the name of
that directory should be (it is the same for each user, and the most
common choice is "public_html" off of the user's home directory)
and what the alias will be (the default is "~username"). Note
that any files that the daemon will serve from this area must be
made readable by him.
This is simply a subdirectory I have inside /home/maiko/public_html.
The contents of /home/maiko/public_html, including subdirectories,
are completely up to me.
This is the name of the HTML file I want the daemon to serve from
/home/maiko/public_html/CLAM/. That's this file you're reading right
now. Feel free to view the source if you like. Notice that I didn't
have to specify the name of the file; I was able to end the URL
with simply CLAM/. The reason is that the daemon is set
up to serve a file named "index.html" in the directory he's given,
if he's only given a directory instead of a complete filename.
This too is up to the person who sets up the http daemon. Many
places don't have a default filename at all. Other places set it
to "home.html" or "main.html." Most places allow a default filename
and set it to "index.html."
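Putting the pieces together, the daemon's translation from the requested filename to a file on disk might look roughly like this. It is a simplified Python sketch, not Apache's actual logic: the document root /www/htdocs is a hypothetical path, and a real server checks whether the target is a directory (and refuses ".." tricks) instead of just looking for a trailing slash.

```python
import posixpath

DOCUMENT_ROOT = "/www/htdocs"   # hypothetical; chosen by whoever sets up the daemon
USER_DIR = "public_html"        # per-user document root under each home directory
DEFAULT_FILE = "index.html"     # served when only a directory is requested


def resolve(url_path):
    """Map the path part of a URL to a filename on the server."""
    if url_path.startswith("/~"):
        # "/~maiko/CLAM/" -> user "maiko", remainder "CLAM/"
        user, _, rest = url_path[2:].partition("/")
        base = posixpath.join("/home", user, USER_DIR)
    else:
        base, rest = DOCUMENT_ROOT, url_path.lstrip("/")
    if rest == "" or rest.endswith("/"):
        # Only a directory was given, so fall back to the default filename.
        rest += DEFAULT_FILE
    return posixpath.join(base, rest)


if __name__ == "__main__":
    print(resolve("/~maiko/CLAM/"))
    # /home/maiko/public_html/CLAM/index.html
```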
For more information on HTTP, you should take a look at
Chapter 3 of "Web Client Programming with Perl" by
Clinton Wong, published by
O'Reilly and Associates.
Browser:
GET /MINK_TCP_COM/CLAM/ HTTP/1.1
(Can I please have the file /MINK_TCP_COM/CLAM/ relative to your document root?)
Connection: Keep-Alive
(Don't hang up the phone until I do.)
User-Agent: Mozilla/4.0 (WinNT; I)
(I'm Netscape 4.0 running in Windows NT.)
Host: www.tcp.com
(I'm asking for the site named www.tcp.com.)
Accept: image/gif, image/jpeg, */*
(I can support all these nifty data formats.)
*blank line*
Server daemon:
HTTP/1.1 200 OK
(Yep! I found that just fine, code 200, sir!)
Date: Wed, 22 Sep 1999 00:30:04 GMT
(It's currently 12:30 or so AM on Wednesday, 9/22/99 over in England.)
Server: Netscape Enterprise/3.5.1
(I'm a Netscape Enterprise server version 3.5.1.)
Content-type: text/html
(This is an HTML file, get ready to display HTML.)
Content-length: 12246
(This file is 12246 bytes long.)
Last-modified: Tue, 21 Sep 1999 13:59:24 GMT
(See how old this file is?)
*blank line*
*source of this HTML page*
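The receiving side of this conversation can be sketched as well. The Python function below splits a raw reply at the blank line into status code, headers, and document; it assumes the reply is already a complete text string with the standard CRLF line endings, and the name parse_response is made up for the example.

```python
def parse_response(raw):
    """Split a raw HTTP reply into (status code, headers dict, body)."""
    # The first blank line separates the daemon's headers from the document.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    # Status line looks like: HTTP/1.1 200 OK
    version, code, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return int(code), headers, body


if __name__ == "__main__":
    reply = (
        "HTTP/1.1 200 OK\r\n"
        "Content-type: text/html\r\n"
        "Content-length: 12\r\n"
        "\r\n"
        "<P>Hello</P>"
    )
    print(parse_response(reply))
```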
Impersonate a browser yourself
on the command line!
http://che.onthejob.net/MINK_TCP_COM/
To retrieve this page, type the following:
telnet che.onthejob.net 80
Connect to che, at port 80 (the port the httpd is listening to). You
won't see a response, as the http daemon waits for you to talk first.
GET /MINK_TCP_COM/ HTTP/1.1
Request the page. You can send other information via HTTP just as
the browser would, but you don't absolutely have to. This will do
for now.
Host: whitehouse.gov
Okay, give it a host. HTTP/1.1 requires this header, which names the
site you are asking for (not the machine you are calling from). The daemon
on che evidently answers even when the name isn't its own!
Remember, you need to send a blank line for the daemon to know you're
finished talking!
HTTP/1.1 200 OK
Date: Tue, 21 Sep 1999 01:29:11 GMT
Server: Apache/1.3.6 (Unix)
Last-Modified: Fri, 03 Sep 1999 20:18:37 GMT
ETag: "a0a0b-9da-37d02d1d"
Accept-Ranges: bytes
Content-Length: 2522
Content-Type: text/html
<HTML>
<HEAD>
<TITLE>cmi.picnic</TITLE>
<BODY BGCOLOR=#c0c0c0 TEXT=MidnightBlue LINK=#328080 VLINK=MidnightBlue>
<center>
<IMG src="http://che.onthejob.net/MINK_TCP_COM/Images/minks_page_tran.gif
border=0 alt="mink's page logo">
</center>
<P>
<font size=+2>Maybe you want to <b>see some stuff?</b></font>
<UL>
<LI><A HREF="http://wocket.csl.uiuc.edu/~maiko/">
<b>Mink's Lame Home Page</b></A>
<blockquote>
This is my <b>main home page</b> which resides <b>at work</b>.
It's <b>rather lame</b> as the name says but it does have a
lot of (unnecessary!) <b>information</b> and I <b>update it
every day</b>.
</blockquote>
<LI><A HREF="ideal_community.html"><b>Thoughts on an Ideal Community</b></A>
<blockquote>
This is <b>what I want to turn the world into</b>.
The actual text was written up while in a <b>design group</b> at the
<A HREF="http://www.globalideasbank.org/wbi/WBI-190.HTML">
<b>School for Designing a Society</b></A>.
</blockquote>
<LI><A HREF="learn_work.html"><b>Thoughts on Learning and Working</b></A>
<blockquote>
This is mainly an expanded excerpt from the above general thoughts.
Things in here are why I like being able to
<b>teach at <A HREF="http://www.onthejob.net/training.html">CLAM</A></b>. </blockquote>
<LI><A HREF="rsvp.cgi"><b>cmi.picnic page</b></A>
<blockquote>
This is a local <b>copy</b> of the real cmi.picnic page (which
resides on shout.net) so you can feel free to <b>sign bogus people
up for the picnic</b> and otherwise <b>play around with it</b>.
I intend to morph this into a <b>general purpose potluck page</b>.
<P>
<b>CLAMComrades</b> in particular might be interested in <b>viewing
the code</b> for the two programs that make up the picnic page,
<A HREF="rsvp.txt"><b>rsvp.cgi</b></A> and
<A HREF="edit_rsvp.txt"><b>edit_rsvp.cgi</b></A>.
</blockquote>
<LI><A HREF="leaves.html"><b>happy leaf page!</b></A>
<blockquote>
The <b>yellow and orange on blue</b> of sun-backlit <b>autumn leaves</b>
against the crisp blue <b>akibare sky</b> is my absolute favorite thing
in the world. Sometimes I try to <b>make some art</b> to get even just
a little bit of that feel.
<P>
Now you might <b>understand a bit more</b> about my <b>kitchen</b>.
</blockquote>
</UL>
<hr>
<address>
Comments? Questions? General harassment? Mail it to
<A HREF="mailto:mcovingt@staff.uiuc.edu"><b>mcovingt@staff.uiuc.edu</b></A>
</address>
</BODY>
</HTML>
Sure enough, that looks like the source of my home page on che plus
some extra HTTP headers at the top. Pretty neat, huh?
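The same telnet session can be scripted. This is a minimal Python sketch using the standard socket module; the function names are hypothetical, and the "Connection: close" header is an addition of mine (not part of the telnet session above) so that the daemon hangs up once it has sent the page.

```python
import socket


def build_request(path, host):
    """Return the bytes we would otherwise type into telnet."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",   # assumption: ask the daemon to hang up afterwards
        "",                    # the blank line: "I'm done talking"
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch(host, path, port=80):
    """Call the daemon, send the request, and return everything it says."""
    with socket.create_connection((host, port), timeout=10) as conn:
        conn.sendall(build_request(path, host))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:       # the daemon hung up
                break
            chunks.append(data)
    return b"".join(chunks)


# Example use (needs a reachable server, so it is left as a comment):
#     print(fetch("che.onthejob.net", "/MINK_TCP_COM/").decode("latin-1"))
```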
So where do CGI scripts come
in?
HTTP/1.1 200 OK line indicating successful execution) in
addition to those HTTP headers the program itself must provide, and
send the whole thing back to the client.
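In very rough terms, the daemon's handling of a program page could be sketched like this in Python. It is a toy under stated assumptions, not how Apache does it: a real daemon also sets up the CGI variables, checks the program's headers, and handles errors more carefully. The function name serve_program and the demo program are invented.

```python
import subprocess
import sys


def serve_program(argv):
    """Run a program and return what would be sent back to the client."""
    result = subprocess.run(argv, capture_output=True, text=True)
    if result.returncode != 0:
        # The program failed, so report a server error instead of its output.
        return "HTTP/1.1 500 Internal Server Error\n\n"
    # The program supplies its own headers, a blank line, and the document.
    # The daemon only adds the status line on top.
    return "HTTP/1.1 200 OK\n" + result.stdout


# A stand-in "CGI program": a one-line Python script that prints
# a header, a blank line, and a tiny document.
DEMO = [sys.executable, "-c", r"print('Content-type: text/html\n\n<P>Hello</P>')"]

if __name__ == "__main__":
    print(serve_program(DEMO))
```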
Do the files
need special permissions?
ps -ef | grep httpd
You should see something like this (on che):
root 458 1 0 Sep16 ? 00:00:02 /www/bin/httpd
nobody 23549 458 0 14:06 ? 00:00:00 /www/bin/httpd
nobody 23690 458 0 14:43 ? 00:00:00 /www/bin/httpd
nobody 23858 458 0 15:29 ? 00:00:00 /www/bin/httpd
nobody 23859 458 0 15:29 ? 00:00:00 /www/bin/httpd
nobody 23860 458 0 15:29 ? 00:00:00 /www/bin/httpd
nobody 23907 458 0 15:44 ? 00:00:00 /www/bin/httpd
nobody 25005 458 0 21:02 ? 00:00:00 /www/bin/httpd
nobody 25006 458 0 21:02 ? 00:00:00 /www/bin/httpd
nobody 25007 458 0 21:02 ? 00:00:00 /www/bin/httpd
You can see that the http daemon runs as user nobody on che. Notice
that there is one instance that runs as root - that one only exists
to spawn the others. There are several httpd daemons sitting at that
little desk, as it can get busy in there!
If the permissions are wrong on a page or program you will get either
a "File not Found" page or a server error, depending.
We'll write these in Perl?
Indeedly doodly! That means that you should start reviewing Perl now.
You should have no problem creating, reading from, and assigning to
all of the basic data structures
(scalars, lists, and hashes), and be familiar
with looping structures (including foreach), reading from and writing
to files, split and join, and regular expressions.
I suggest you spend some quality time with the Perl book this week.
Comments? Questions? General harassment? Mail it to
mcovingt@staff.uiuc.edu