Note. In writing these notes I have, probably unconsciously, assumed a familiarity with the Unix environment and the C programming language. Please note that C is not the same as C++; it is a much simpler, non-object-oriented programming language.
The normal operation of a WWW server when processing a URL is simply to deliver the indicated file to the remote browser. If the server is suitably configured, however, it will, under certain circumstances, execute the file as a program and deliver to the remote browser whatever the program writes to its standard output. The program may do whatever its author wishes, such as interrogating a database and constructing HTML based on the information found in the database.
On the clun.scit.wlv.ac.uk WWW server, files will be executed under the following circumstances:
The file is in the directory /usr/local/ftp/httpd/cgi-bin
The file name ends in .cgi
You are encouraged to examine the files in the directories
/usr/local/ftp/httpd/cgi-bin
and
/usr/local/ftp/httpd/cgi-src
on clun.scit.wlv.ac.uk
for examples of how to write such programs in C.
Any program that is going to write output to be sent to a remote HTML interpreter or browser must ensure that the first line of output is
Content-type: text/html
or
Content-type: text/plain
with obvious meanings. It is important that the first line is exactly as shown above, including spaces and case. It is equally important that this first line is followed by a blank line.
Failure to observe these requirements will result in various obscure error messages.
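Here is a minimal C sketch of one way to get the header right every time. The function names format_cgi_header and write_cgi_header are made up for this illustration and are not part of any library.

```c
#include <stdio.h>

/* Hypothetical helper: formats the CGI header for the given media type
 * into buf. Returns the number of characters written (not counting the
 * terminating NUL), or -1 if buf is too small. */
int format_cgi_header(char *buf, size_t size, const char *media_type)
{
    /* The header line must be followed by a blank line, hence "\n\n". */
    int n = snprintf(buf, size, "Content-type: %s\n\n", media_type);

    if (n < 0 || (size_t)n >= size)
        return -1;
    return n;
}

/* Hypothetical helper: writes the header to a stream (normally stdout)
 * and flushes it so that it is sent before any other output.
 * Returns 0 on success and -1 on failure. */
int write_cgi_header(FILE *out, const char *media_type)
{
    char header[128];

    if (format_cgi_header(header, sizeof header, media_type) < 0)
        return -1;
    if (fputs(header, out) == EOF)
        return -1;
    return fflush(out) == EOF ? -1 : 0;
}
```

A program would call write_cgi_header(stdout, "text/html") as the very first thing it does in main(), before writing any HTML.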
Note. What is actually happening here is that your program is communicating with the remote browser using the Hypertext Transfer Protocol (HTTP). [See RFC 2068 for details.] The Content-type: line indicates to the browser the mechanism it should use to process the rest of the information; there are many other possibilities.
If you are using the clun.scit.wlv.ac.uk WWW server, your executed program is a Unix process and there are a number of environment variables available to the process. These are set by the WWW server before it starts your process.
Here is an example program.
#!/bin/sh
echo Content-type: text/html
echo
echo "<html><head><title>Hello World</title>"
echo "</head><body><h1>Hello World</h1></body></html>"

You can get a copy of the program and save it in your
public_html
directory. Make sure you have set the access rights correctly using
chmod go+rx hw.cgi
You can make the WWW server execute the program by pointing your WWW browser to
http://www.scit.wlv.ac.uk/~UID/hw.cgi
where UID
is, of course, replaced by your login code.
Here is another example that you can get a copy of. This is another shell script that shows the values of various shell environment variables set by the WWW server before starting the program.
#!/bin/sh
echo Content-type: text/plain
echo
echo CGI/1.0 test script report:
echo
echo argc is $#. argv is "$*".
echo
echo SERVER_SOFTWARE = $SERVER_SOFTWARE
echo SERVER_NAME = $SERVER_NAME
echo GATEWAY_INTERFACE = $GATEWAY_INTERFACE
echo SERVER_PROTOCOL = $SERVER_PROTOCOL
echo SERVER_PORT = $SERVER_PORT
echo REQUEST_METHOD = $REQUEST_METHOD
echo HTTP_ACCEPT = "$HTTP_ACCEPT"
echo PATH_INFO = "$PATH_INFO"
echo PATH_TRANSLATED = "$PATH_TRANSLATED"
echo SCRIPT_NAME = "$SCRIPT_NAME"
echo QUERY_STRING = "$QUERY_STRING"
echo REMOTE_HOST = $REMOTE_HOST
echo REMOTE_ADDR = $REMOTE_ADDR
echo REMOTE_USER = $REMOTE_USER
echo AUTH_TYPE = $AUTH_TYPE
echo CONTENT_TYPE = $CONTENT_TYPE
echo CONTENT_LENGTH = $CONTENT_LENGTH
If you write a C program and then compile it into an executable file, you can, of course, access these environment variables using the getenv() library function.
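As a rough illustration, here is how a C program might read some of the variables reported by the shell script above. The helper names env_or and report_cgi_env are invented for this sketch; only getenv() itself is a standard library function. Note that getenv() returns NULL for a variable the server did not set, so the sketch substitutes a fallback string.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns the value of an environment variable,
 * or fallback when the variable is not set (getenv() returns NULL). */
const char *env_or(const char *name, const char *fallback)
{
    const char *value = getenv(name);

    return value != NULL ? value : fallback;
}

/* Hypothetical helper: writes a few of the CGI variables as plain text
 * to the given stream (normally stdout, after the Content-type header). */
void report_cgi_env(FILE *out)
{
    static const char *const names[] = {
        "SERVER_NAME", "REQUEST_METHOD", "SCRIPT_NAME",
        "QUERY_STRING", "REMOTE_ADDR", "CONTENT_LENGTH"
    };
    size_t i;

    for (i = 0; i < sizeof names / sizeof names[0]; i++)
        fprintf(out, "%s = %s\n", names[i], env_or(names[i], "(not set)"));
}
```

A main() function would print the Content-type: text/plain header and its blank line, and then call report_cgi_env(stdout).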
Finally here's a C program that you can copy.
#include <stdio.h>

int main(void)
{
    /* The header line and the blank line that must follow it */
    printf("Content-type: text/html\n\n");
    printf("<html><head><title>");
    printf("Hello World Again");
    printf("</title></head><body>");
    printf("<h1>I'm a C Program</h1>\n");
    printf("</body></html>\n");
    return 0;
}

Compile the program using the command
gcc -o hwcgi.cgi hwcgi.c
and make it world executable using the command
chmod go+rx hwcgi.cgi
The program can then be executed in the way shown above.
You can write a CGI back end program in any language that is capable of determining the values of the Unix environment variables and, possibly, accessing the Unix command line arguments. The back end program will, of course, have to be developed to function on the server machine.
The commonest use of the CGI mechanism is to provide a back end for HTML "forms". Before studying this further you need to be familiar with the relevant parts of HTML.
Basically a form is a series of elements enclosed within the tags <form> and </form>. Once the various boxes within the form have been filled in and the "submit" button has been hit, the user-entered values are transmitted to the server, where a CGI back end can process the information and generate whatever reply is required.
A <form> tag has two significant attributes, METHOD and ACTION.
For simple use the METHOD attribute can be set to either GET or POST. This attribute controls the method by which the user entered data is communicated to the back end program.
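Under the CGI conventions, GET data reaches the back end in the QUERY_STRING environment variable, whereas POST data arrives on standard input with its length given by CONTENT_LENGTH. The following C sketch collects the raw data either way. The function name read_form_data and the size limit MAX_FORM_BYTES are assumptions made for this example only.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Arbitrary upper limit on POST data, chosen for this sketch only. */
#define MAX_FORM_BYTES 65536L

/* Hypothetical helper: returns a freshly allocated copy of the raw,
 * still encoded form data, or NULL on error. The caller frees it.
 * method, query and content_length would normally come from
 * getenv("REQUEST_METHOD"), getenv("QUERY_STRING") and
 * getenv("CONTENT_LENGTH"); in would normally be stdin. */
char *read_form_data(const char *method, const char *query,
                     const char *content_length, FILE *in)
{
    char *data;
    size_t len;

    if (method != NULL && strcmp(method, "POST") == 0) {
        /* POST: read exactly CONTENT_LENGTH bytes from the input. */
        long n = content_length != NULL
                     ? strtol(content_length, NULL, 10) : 0;

        if (n < 0 || n > MAX_FORM_BYTES)
            return NULL;
        len = (size_t)n;
        data = malloc(len + 1);
        if (data == NULL)
            return NULL;
        if (fread(data, 1, len, in) != len) {
            free(data);
            return NULL;
        }
        data[len] = '\0';
        return data;
    }

    /* GET (or anything else): the data is the query string. */
    if (query == NULL)
        query = "";
    len = strlen(query);
    data = malloc(len + 1);
    if (data != NULL)
        memcpy(data, query, len + 1);
    return data;
}
```

The returned string is still encoded, so it needs to be split and decoded before the values are used.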
Many authors suggest that the POST method is preferable but the only reasons for selecting it in preference to GET are
The ACTION attribute specifies what is to be done when the form-filling is complete. Its value is simply a URL. If this is the name of an executable file then the file will be executed; it may instead simply refer to a WWW page, although this is unusual.
Here is an example
<form METHOD="GET" ACTION="http://www.scit.wlv.ac.uk/cgi-bin/test">
The information from a WWW browser that has processed a form is
partially encoded using characters such as + to represent spaces.
There are some specimen programs called get-query.c
and post-query.c
in the directory
/usr/local/ftp/httpd/cgi-src
on clun.scit.wlv.ac.uk
that demonstrate the decoding of forms information.
The utility routines associated with these programs will be
found in util.c
in the same directory.
Take a look at these and use them as a model for your
own forms handling back-ends.
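To give a flavour of what the decoding involves, here is a small independent sketch. It is not the code from util.c, and the names hex_value, url_decode and print_form_pairs are invented here. It assumes the usual form encoding, in which + stands for a space, %XX stands for the character with hexadecimal code XX, and name=value pairs are separated by &.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Value of one hexadecimal digit; the caller has checked isxdigit(). */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    return tolower(c) - 'a' + 10;
}

/* Decodes s in place: '+' becomes a space and %XX becomes the byte
 * with hexadecimal value XX. Malformed % sequences are left alone. */
void url_decode(char *s)
{
    char *out = s;

    while (*s != '\0') {
        if (*s == '+') {
            *out++ = ' ';
            s++;
        } else if (*s == '%' && isxdigit((unsigned char)s[1])
                             && isxdigit((unsigned char)s[2])) {
            *out++ = (char)(hex_value((unsigned char)s[1]) * 16
                            + hex_value((unsigned char)s[2]));
            s += 3;
        } else {
            *out++ = *s++;
        }
    }
    *out = '\0';
}

/* Splits "name=value&name=value" data in place and prints each decoded
 * pair. Splitting happens before decoding so that an encoded '&' or
 * '=' inside a value is not mistaken for a separator. */
void print_form_pairs(char *data, FILE *out)
{
    char *pair = data;

    while (pair != NULL && *pair != '\0') {
        char *next = strchr(pair, '&');
        char *value;

        if (next != NULL)
            *next++ = '\0';
        value = strchr(pair, '=');
        if (value != NULL)
            *value++ = '\0';
        else
            value = pair + strlen(pair);    /* no '=': empty value */
        url_decode(pair);
        url_decode(value);
        fprintf(out, "%s = %s\n", pair, value);
        pair = next;
    }
}
```

Both functions modify the string they are given, so they should be passed a writable copy of the form data rather than the pointer returned by getenv().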
Here are some examples of form input using the select, input and textarea tags. The various attributes and their effects are discussed.
Imagemaps are a widely used and useful feature of the WWW. They operate by the user clicking on a particular position on the map; the co-ordinates of this position are determined by the browser. There are three possible mechanisms that can then be used.
To use a map the HTML should look like this
<a href="map.cgi" ><img src="map.gif" ismap> </a>
It is, of course, the responsibility of the executed program to process the data. The commonest usage is to use the HTTP Location: response to tell the browser to go somewhere else.
Here's a complete example using the CGI generalisation. Before studying the code, follow the link to see what it does. First the HTML.
<html><head><title>Map Test</title>
</head>
<body>
<h1>Map Test</h1>
<p>Please click on the image</p>
<a href="http://www.scit.wlv.ac.uk/~jphb/scarf/map1.cgi">
<img src="http://www.scit.wlv.ac.uk/~jphb/scarf/mapx.gif" ismap>
</a>
</body>
</html>
and here's the C code for the program map1.c that was compiled to give the program map1.cgi
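#include <stdio.h>

int main(int argc, char *argv[])
{
    int x, y;

    /* The click position arrives as the argument "x,y". If it is
       missing or malformed, treat the click as outside the map. */
    if (argc < 2 || sscanf(argv[1], "%d,%d", &x, &y) != 2)
        x = y = -1;

    printf("Location: ");
    if (x >= 130 && x <= 381 && y >= 130 && y <= 260) {
        if (x >= 161 && x <= 221 && y >= 158 && y <= 228) {
            printf("Red.html\n");
        } else if (x >= 290 && x <= 350 && y >= 158 && y <= 228) {
            printf("Blue.html\n");
        } else
            printf("Green.html\n");
    } else
        printf("White.html\n");
    printf("\n\n");
    return 0;
}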
You can get your own copy of the program map1.c. The co-ordinates are, of course, related to the image, and the programmer will have to use a utility such as xv to examine the image and determine the co-ordinates.
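When a map has more than a handful of regions, the nested tests in map1.c become awkward to maintain. One possible alternative, sketched here, is to hold the rectangles in a table and return the first one that contains the click. The rectangles are the same ones used in map1.c; the names struct region, target_for and parse_click are made up for this illustration.

```c
#include <stdio.h>

/* One clickable rectangle and the page it leads to. */
struct region {
    int x1, y1, x2, y2;         /* inclusive corner co-ordinates */
    const char *target;
};

/* The rectangles from map1.c, with the inner ones listed first so
 * that the first match wins. */
static const struct region regions[] = {
    { 161, 158, 221, 228, "Red.html"   },
    { 290, 158, 350, 228, "Blue.html"  },
    { 130, 130, 381, 260, "Green.html" }
};

/* Returns the page for a click at (x, y); anything outside all the
 * rectangles goes to White.html, as in map1.c. */
const char *target_for(int x, int y)
{
    size_t i;

    for (i = 0; i < sizeof regions / sizeof regions[0]; i++) {
        const struct region *r = &regions[i];

        if (x >= r->x1 && x <= r->x2 && y >= r->y1 && y <= r->y2)
            return r->target;
    }
    return "White.html";
}

/* Parses the "x,y" argument. Returns 1 on success and 0 if the
 * argument is missing or malformed. */
int parse_click(const char *arg, int *x, int *y)
{
    return arg != NULL && sscanf(arg, "%d,%d", x, y) == 2;
}
```

A main() function would call parse_click() on argv[1] (when argc is at least 2) and then print "Location: " followed by the result of target_for() and a blank line.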
When your program is run via the CGI mechanism it has, of course, been started up by the WWW server. This, normally, means that the process in question is owned by a special user called nobody. You cannot login as nobody and there is no home directory for this user.
The implication is that all files that you want your program to access should be accessible to the user nobody. This means, in practice, that they should be world accessible. For data files that are read by your program this is unlikely to be a problem as the data is, presumably, publicly available anyway. However, if you intend to log all queries or otherwise record information gathered via the WWW, it means that all the files in question should be world-writable, which is definitely undesirable. Fortunately the Unix operating system offers a simple solution: make your executable setuid. This means that when it is launched by the WWW server it will be owned by you, not nobody.
A binary executable can be made setuid using the chmod command with the parameters u+s,go+rx
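If you want to check what the setuid bit has actually done, a program can report its own user ids. This is only a sketch, and the names format_ids and ids_differ are invented for it; getuid() and geteuid() are the standard Unix calls. Setting the setuid bit changes the effective user id of the process to that of the file's owner, while the real user id remains that of the user who started it.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: formats the real and effective user ids of the
 * current process into buf. Returns the number of characters written,
 * or -1 if buf is too small. */
int format_ids(char *buf, size_t size)
{
    int n = snprintf(buf, size, "real uid = %ld, effective uid = %ld\n",
                     (long)getuid(), (long)geteuid());

    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Hypothetical helper: returns 1 when the real and effective user ids
 * differ, as they would in a setuid program started by another user,
 * and 0 otherwise. */
int ids_differ(void)
{
    return getuid() != geteuid();
}
```

Writing the string produced by format_ids() as text/plain output from a CGI program shows which ids the process has when it is started by the WWW server.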
There are two ways round this problem. The first is to statically link the application; the second is to launch the application via a wrapper that sets both the real and effective user-id. The wrapper itself must, of course, be setuid.