
CGI Programming Unleashed

Chapter 8 -- Forms and How to Handle Them


Even if you have never created a World Wide Web form, as a Web user you are probably already familiar with forms. The most popular Web search indexes provide forms to allow the user to customize a search query. Many Web sites request registration information with them. They have even been used to implement multiuser chat lines. Web forms, or HTML forms, are the simplest way to transform a Web page from an on-line brochure into an interactive tool.

What Is an HTML Form?

An HTML form is a section of a Web document into which the user can enter information. This information is passed back to a Web server where it might be recorded in a database for future use or perhaps used to control what information is returned to the user.

Note
An HTML form is a section of a Web page into which a Web user can enter information.

What Can Forms Do?

HTML forms can do the following:

What Can't Forms Do?

HTML forms cannot provide a fully interactive user interface; they can only construct a query or submission to be fetched like any other Web page. There is no way of controlling what is typed into text fields. Forms only prompt the user for information. Handling the information the user enters into the form usually requires the provider to write a CGI-based program designed specifically to process submissions from that form.

Creating HTML Forms

Creating Web forms is no more difficult than authoring other Web documents. Web forms are constructed from HTML mark-up commands or tags. If you use an HTML authoring tool, check its documentation to find out how to use it to add HTML form tags, or simply edit in the tags described in this chapter, in the section "HTML Form Tags," using a simple text editor.

Note
A tag is an HTML mark-up command in angle brackets, <THUS>.

A Sample HTML Form

In the HTML source, a form must start with a <FORM> tag and end with a </FORM> tag. When you have written a form handler (often a CGI program) to which the data in the form will be sent, you will be able to add an ACTION="url" attribute to specify the location of the handler and a METHOD=reqtype attribute for the submission method to be used. Don't worry about these just yet; their precise meaning will be discussed later in this chapter, in the section "Handling Form Submissions." Listing 8.1 is an example of a simple form.

Note
HTML tags often have attributes; <DIV ALIGN=CENTER> is an HTML DIV tag with an ALIGN attribute of CENTER.


Listing 8.1. A simple Web form.
<FORM ACTION="register.cgi" METHOD=POST>
<DL>
<DT>Your full name:
<DD><INPUT TYPE=TEXT NAME="fullname" SIZE=60 MAXLENGTH=180>
<DT>Your e-mail address: <EM> RFC822 (Internet) format </EM>
<DD><INPUT TYPE=TEXT NAME="email" SIZE=60 MAXLENGTH=180>
</DL>
<P> <INPUT TYPE=SUBMIT VALUE="Register"> </P>
</FORM>

Don't be daunted by the number of tags in this example. Compare the HTML listing with Figure 8.1, which shows how the form might look under a particular browser. The <FORM> and </FORM> tags group the input fields together and define how and where they will be submitted. The <DL> and </DL> tags wrap the input fields in an HTML definition list that is used to mark up names or labels (beginning with <DT> for definition term) and their meanings or contents (beginning with <DD> for definition description). The <P> and </P> tags ensure that the submit button is treated like a separate paragraph.

Figure 8.1: A simple Web form.

Most of the usual HTML mark-up tags, such as <P>, <PRE>, <BR>, and <DL>, are permitted between the <FORM> and </FORM> tags, and can be used to control the layout of the form to some extent. Although the INPUT tags have NAME attributes, these are internal labels not normally seen by the user, so a visible label for each form component should be included in the Web page. In the previous example, the input field with NAME attribute "fullname" also has some text associated with it in HTML definition list tags.

Tip
Use <DT> and <DD> tags within <DL> and </DL> tags to clearly associate textual labels with form tags. Use <P> and </P> to group form tags, or <BR> to separate them.

HTML Form Tags

Within the <FORM> and </FORM> tags, the following HTML form tags or form components are also available:

INPUT TYPE=TEXT

<INPUT [TYPE=TEXT] NAME="text-id" [SIZE=nn] [MAXLENGTH=nn] [VALUE="default text"]>

INPUT TYPE=SUBMIT

<INPUT TYPE=SUBMIT [NAME="button-id"] [VALUE="Button label text"]>

INPUT TYPE=RESET

<INPUT TYPE=RESET [VALUE="Button label text"]>

INPUT TYPE=RADIO

<INPUT TYPE=RADIO NAME="radio-set-id" VALUE="choice-id" [CHECKED]>

INPUT TYPE=CHECKBOX

<INPUT TYPE=CHECKBOX NAME="box-set-id" VALUE="choice-id" [CHECKED]>

INPUT TYPE=IMAGE

<INPUT TYPE=IMAGE NAME="image-id" SRC="image-url" [ALIGN=alignment]>

INPUT TYPE=HIDDEN

<INPUT TYPE=HIDDEN NAME="data-id" VALUE="hidden form data" >

TEXTAREA

<TEXTAREA NAME="text-id" [COLS=nn] [ROWS=nn]>default text</TEXTAREA>

SELECT

<SELECT NAME="select-id" [SIZE=nn] [MULTIPLE]>
<OPTION [VALUE="choice-id"] [SELECTED]>1st choice
<OPTION>2nd choice
<OPTION>...
</SELECT>

Note
Attributes are mandatory unless they are shown here in square brackets ([]). Mandatory attributes must be included for the form to be meaningful. Almost all form tags must have a NAME attribute. The NAME attribute is used as an identifier for the contents of the form component when the form is submitted. The attributes shown previously in square brackets are optional.

INPUT TYPE=TEXT

<INPUT [TYPE=TEXT] NAME="text-id" [SIZE=nn] [MAXLENGTH=nn] [VALUE="default text"]>

An INPUT tag with a TYPE=TEXT attribute presents the user with a prompt for a single line of text. The tag must have a NAME attribute by which it can be identified later. A SIZE attribute can be used to specify how many characters wide the text prompt window should be. A MAXLENGTH attribute can be used to limit the input to a maximum number of characters. If the MAXLENGTH attribute is larger than the SIZE attribute, the browser will usually scroll the entered text appropriately. A VALUE attribute can be used to fill the prompt with some initial text as soon as the form is displayed, which is often referred to as the default text. Listing 8.1, earlier in the chapter, illustrates the use of INPUT TYPE=TEXT.

Caution
Some Web browsers do not honor the MAXLENGTH attribute. Don't rely on the MAXLENGTH value when interpreting form data. It is an advisory limit that most Web browsers implement, but some do not.

INPUT TYPE=SUBMIT

<INPUT TYPE=SUBMIT [NAME="button-id"] [VALUE="Button label text"]>

An INPUT tag with TYPE=SUBMIT provides a button that submits the information in the completed form to the URL given as the ACTION attribute to the <FORM> tag. The information is submitted using the HTTP request type specified by the FORM's METHOD attribute. This is described in more detail in the section "Handling Form Submissions." A form can have more than one SUBMIT button, in which case the buttons can be distinguished by giving a value to the optional NAME attribute. The NAME attribute will be passed in the form data when the form is submitted to allow the form-handling mechanism to determine which submit button the user used. Listing 8.1, earlier in the chapter, is an example of the use of a SUBMIT button.

Tip
If a form consists of a single INPUT TYPE=TEXT component, pressing the Enter key in the text window will often achieve the same result as pressing the SUBMIT button. Not all Web browsers support this added feature, however, so for maximum coverage and to avoid annoying the user, a form design should always include a SUBMIT button or INPUT TYPE=IMAGE tag.

INPUT TYPE=RESET

<INPUT TYPE=RESET [VALUE="Button label text"]>

An INPUT tag with TYPE=RESET provides a button that clears the form and sets the contents back to their initial values where specified. Not all HTML forms will use this feature, but it can help users start fresh if they want to reconsider the default options. Listing 8.2 in the next section illustrates the use of the reset button.

Note
Unlike most other form tags where the NAME attribute is mandatory, the NAME attribute is optional for the INPUT TYPE=SUBMIT tag and is not used in an INPUT TYPE=RESET tag.

INPUT TYPE=RADIO

<INPUT TYPE=RADIO NAME="radio-set-id" VALUE="choice-id" [CHECKED]>

A form can prompt the user to choose from a set of alternatives with INPUT TYPE=RADIO tags. Each tag will be presented to the user as something like a radio button that can be selected. Each radio button in the set of alternatives in a FORM is given the same NAME value. Only one of the radio buttons may be selected at any one time. The INPUT TYPE=RADIO tag has a VALUE attribute that specifies the data sent when the form is submitted if that radio button was selected. Listing 8.2 shows the use of a set of radio buttons as a one-of-many selection. Figure 8.2 shows how the example might appear in a Web browser.

Figure 8.2: A form with radio buttons.


Listing 8.2. A form with radio buttons.
<FORM ACTION="choose.cgi" METHOD=POST>
<P> E-mail address: <INPUT TYPE=TEXT NAME="email" SIZE=60 MAXLENGTH=180> </P>
<P> Please add me to the mailing list. </P>
<P> I am: </P>
<OL>
<LI><INPUT TYPE=RADIO NAME="employer" VALUE="private" CHECKED>
Employed in the private sector
<LI><INPUT TYPE=RADIO NAME="employer" VALUE="public">
Employed in the public sector
<LI><INPUT TYPE=RADIO NAME="employer" VALUE="self">
Self-employed
<LI><INPUT TYPE=RADIO NAME="employer" VALUE="unemployed">
Unemployed
</OL>
<P> <INPUT TYPE=SUBMIT VALUE="Continue">
<INPUT TYPE=RESET VALUE="Clear form"> </P>
</FORM>

Notice that related radio buttons share the same NAME but different VALUEs. The data sent by the group of radio buttons when the form is submitted is the VALUE attribute of the selected radio button. The CHECKED attribute marks a radio button as the default choice, switched "on" when the form is first displayed.

Caution
For radio buttons to behave meaningfully under most Web browsers, there must be at least two with the same NAME attribute. The effect of a single radio button varies between browsers. If you want to display a single switch, use an INPUT TYPE=CHECKBOX.

INPUT TYPE=CHECKBOX

<INPUT TYPE=CHECKBOX NAME="box-set-id" VALUE="choice-id" [CHECKED]>

An INPUT tag with attribute TYPE=CHECKBOX offers the user an "on" or "off" switch. It is similar to a radio button, but any number of checkboxes may be switched on. If a checkbox is switched "on" when the form is submitted, its VALUE attribute is submitted as the form data for the NAMEd form component. Several checkboxes can be grouped (as with radio buttons) by giving them the same NAME attribute. If several checkboxes with the same NAME are switched on when the form is submitted, the browser sends one NAME=VALUE pair for each switched-on checkbox, all with the same NAME; some form-handling libraries then present these VALUEs to the program as a single list separated by commas (","). Listing 8.3 gives an example of checkboxes in use. Figure 8.3 shows how this example might look to the Web user.

Figure 8.3: Checkboxes and a text area.


Listing 8.3. Checkboxes, hidden fields, and a text area.
<FORM ACTION="feedback.cgi" METHOD=POST>
<P> Please tell us what you thought of this Web site. Select the checkboxes which you agree with: </P>
<UL>
<LI><INPUT TYPE=CHECKBOX NAME="opinion" VALUE="understandable">
The text was understandable.
<LI><INPUT TYPE=CHECKBOX NAME="opinion" VALUE="navigable">
I found it easy to find my way through the Web site.
<LI><INPUT TYPE=CHECKBOX NAME="opinion" VALUE="stylish">
I was impressed by the style and presentation.
</UL>
<P> <INPUT TYPE="HIDDEN" NAME="pages" VALUE="brochure">
Please add any other comments:
<TEXTAREA NAME="feedback" ROWS=5 COLS=40>
I think your brochure is:
</TEXTAREA>
</P>
<P> <INPUT TYPE=SUBMIT VALUE="Send comments">
<INPUT TYPE=RESET VALUE="Clear form"> </P>
</FORM>

As with most other form tags, the INPUT TYPE=CHECKBOX must have a NAME attribute, but the designer of the form should still include some label text in the HTML source to accompany each checkbox, and it may be appropriate to precede a list of checkboxes with instructions on their use.

Tip
If you include instructions in a form, steer clear of phrases like "click here" and "mark crosses in the boxes." These are platform-specific or browser-specific instructions. Users with a different Web browser than you may see checkboxes represented as unfilled/filled circles or bracketed asterisks, or whatever is the standard representation of a switch under the graphical user interface they are using. Try to use phrases like "choose," "select," or at worst, "fill in the checkboxes." This guideline applies to instructions for other form components and even for hyperlinks.

INPUT TYPE=IMAGE

<INPUT TYPE=IMAGE NAME="image-id" SRC="image-url" [ALIGN=alignment]>

The INPUT TYPE=IMAGE form tag is similar to the IMG HTML tag. It displays the in-line image from the SRC location, positioned according to the optional ALIGN attribute. It has two features that make it useful in a form. First, it behaves like the submit button. When the user "clicks" on the image, the completed form data is sent to the form handler as described for the INPUT TYPE=SUBMIT component. Second, it allows the user to choose a part of an image. The pixel coordinates of the point at which the user clicked on the image are sent with the form data with the names image-id.x and image-id.y. That is, the horizontal coordinate is sent as contents of the NAME attribute with ".x" added to the end, and the vertical coordinate is sent as the NAME attribute with ".y" added.

Listing 8.4 illustrates the use of INPUT TYPE=IMAGE as a custom submit button, but many sites use the pixel coordinate feature to present graphical menus, tables of buttons or icons, or navigation maps. The behavior is similar to that of the imagemap tag <IMG ISMAP>.

INPUT TYPE=HIDDEN

<INPUT TYPE=HIDDEN NAME="data-id" VALUE="hidden form data" >

The INPUT TYPE=HIDDEN form tag is unusual in the respect that it does not appear in the displayed form. It is a convenience tool for the form designer. It can be used to hold contextual information, such as the name of the form (useful when the same form handler is used for several different forms) or data the user entered into a previous form (if the form has been generated "on-the-fly" in response to the user input). Its VALUE attribute is passed as the form data for the NAME attribute, but the user cannot see any representation of this form component on the screen and is not prompted to change the form component contents.

In Listing 8.3, an INPUT TYPE=HIDDEN tag is used to identify the form to the form handler, perhaps because the form handler is used in the ACTION attribute of more than one FORM. Some examples of the use of the INPUT TYPE=HIDDEN tag are illustrated in Listings 8.5 and 8.8.

TEXTAREA

<TEXTAREA NAME="text-id" [COLS=nn] [ROWS=nn]>default text</TEXTAREA>

Note
An HTML container is a pair of tags, opened <THUS> and closed </THUS>, whose meaning applies to the text between the tags.

The TEXTAREA tag is not a variant on the INPUT tag; it is an entirely separate HTML tag. It is similar to the INPUT TYPE=TEXT form tag. The TEXTAREA tag presents a multiline text window, with the size specified by the COLS and ROWS attributes. It is an HTML container, like the A HREF="url" tag or the STRONG tag, so a closing </TEXTAREA> tag should always be included after the contained text. The text contained within the <TEXTAREA> and </TEXTAREA> tags appears in the text input window as the default contents. Compare Listing 8.3 with Figure 8.3, which shows the TEXTAREA default contents have not been changed by the user.

Caution
Notice that in most browsers, there is no limit imposed on the amount of text that can be entered into a TEXTAREA. Be prepared to handle a large amount of input to the form handler.

SELECT

<SELECT NAME="select-id" [SIZE=nn] [MULTIPLE]>
<OPTION [VALUE="choice-id-1"] [SELECTED]>1st choice
<OPTION [VALUE="choice-id-2"] [SELECTED]>2nd choice
<OPTION [VALUE="choice-id-etc"] [SELECTED]>...
</SELECT>

The SELECT tag is an alternative to radio buttons or checkboxes that presents a list of choices in a scrolling window. When given the MULTIPLE attribute, it is comparable to checkboxes in the respect that any number of choices can be selected. Without the MULTIPLE attribute, it behaves like radio buttons, and only one choice can be selected at a time. The SIZE attribute can be used to specify the number of choices the form designer would like to be visible and, in effect, controls the size of the scrolling window. The VALUE attribute of each OPTION selected is passed with the form data to the form handler. If the VALUE attribute is omitted, the contents of the option are used instead. If more than one option is selected, the browser sends one NAME=VALUE pair for each selected option, all with the same NAME; some form-handling libraries join these into a comma-separated list. Listing 8.4 gives an example of the use of SELECT tags, and a possible browser representation of the example is shown in Figure 8.4.

Figure 8.4: Selection boxes and clickable images.


Listing 8.4. Selection boxes and clickable images.
<FORM ACTION="select.cgi" METHOD=GET>
<H2>
Choose which software to download:
</H2>
<P>
<SELECT NAME="package" SIZE=3>
<OPTION VALUE="text" SELECTED>Text viewer
<OPTION VALUE="image">Image viewer
<OPTION VALUE="movie">Movie player
<OPTION VALUE="audio">Sound player
<OPTION VALUE="editor">Media editor
</SELECT>
<SELECT NAME="platform">
<OPTION>IBM PC compatible
<OPTION>Macintosh (68000)
<OPTION>Macintosh (Power PC)
</SELECT>
<SELECT NAME="options" MULTIPLE>
<OPTION>License
<OPTION>Media
<OPTION>Documentation
</SELECT>
</P>
<P>
<INPUT TYPE="IMAGE" NAME="coords" SRC="download.gif">
</P>
</FORM>

Caution
Certain browsers implement a SELECT form component with no MULTIPLE attribute or no SIZE attribute as a drop-down menu. However, drop-down menus can cause problems. On some platforms, a long drop-down menu can exceed the size of the screen, which renders some of the choices unusable. If you wish to constrain the user to one of more than, say, 12 choices, use radio buttons, or at least specify a SIZE attribute.

Future FORM HTML Tags Proposed in the "Draft HTML 3.0 Spec"

Before the announcement of HTML 3.2 (the new standard for Web hypertext, also known as Wilbur), some ideas for a future version of HTML were outlined in a consultation document informally referred to as the "draft HTML 3.0 specification," including the introduction of three new form tags. <INPUT TYPE="audio"> would allow for the submission of voice or sounds recorded by the form user. <INPUT TYPE="scribble"> would allow the user to submit a free-hand sketch with the form. <INPUT TYPE="file"> would prompt for a filename to be uploaded with the form data. To use these components in a form would require the form designer to add an attribute to the FORM tag, probably ENCTYPE="multipart/form-data". The ENCTYPE attribute is not documented here, but the default encoding for form submissions, "application/x-www-form-urlencoded", is described in the section "Decoding + and %hh (URL-Encoding)." If you are interested in new developments in the HTML standard, you can find out more from the W3 Consortium Web site at

http://www.w3.org/

Form Style

With a little care, you can design forms that are simple, clear, and easy to use. While it can be tempting to use every different FORM tag and feature, the user will be more impressed by a form that is easy to use than one that is rich in features. The first priority must be to make the form usable. This will encourage the user to take the time to fill in the form and open a channel of communication between the provider and the user.

Use the following list as a guide to form style:

Instructions should be appropriate to the intended audience and should not insult the user's intelligence. They are primarily there to invite the user to take the trouble to fill in the form. The instruction "Please select the product that interests you" is likely to get a better response than "Click in the right boxes and press the continue button."

A textual label is not part of a form tag but is included in the HTML source of the form alongside the tag. Use HTML mark-up tags to group the label with the form component. Don't rely on the form looking the same in other browsers and on other screens as it does on yours. Use tags such as <P>, the paragraph tag

<P>Your full name: <INPUT NAME="fullname"></P>

to ensure that the label and the entry box appear together on the screen but are distinct from other form components.

Wherever there is an appropriate default option, offer it as default text in the VALUE of an INPUT TYPE=TEXT tag, between <TEXTAREA> and </TEXTAREA> tags, or as the SELECTED or CHECKED option in a list.

To make life easier for both the provider and the user, use checkboxes, radio buttons, and SELECT lists whenever there are a limited number of possible options, especially when you plan to process the form submission contents automatically.

Avoid making your form functionally dependent on features that are not available in the majority of Web browsers. Some newer browsers support "mailto:" URLs as FORM ACTIONs, but many of the current generation of Web browsers cope badly or fail completely to send form submissions as mail messages. This may not be a problem where the form is for internal use within an organization that has standardized on a full-featured Web browser, but forms made available to the world should ideally be handled by an HTTP server using, for example, a CGI program.

In some cases, a Web form may not be the appropriate solution to a problem. If the form consists solely of a set of radio buttons, perhaps a list of <A HREF="url"> hyperlinks would have been more appropriate. If the form is nothing more than one INPUT TYPE=IMAGE component, would an <IMG ISMAP> have been simpler? Even the poorly regarded <ISINDEX> tag might be a simpler option than a search form employing a single INPUT TYPE=TEXT prompt.

A Sample Form

The definitions of the various FORM tags shown previously are accompanied by simple examples of HTML forms. Listing 8.5 is a more realistic example of an HTML form.


Listing 8.5. Comments form.
<FORM ACTION="comments.cgi" METHOD=POST>
<P> In order that we may continue to provide a high quality World Wide Web service,
please take the time to fill in this form.</P>
<DL>
<DT>Your surname (family name):
<DD><INPUT NAME="surname" SIZE=20 MAXLENGTH=60>
<DT>Your first name (given name):
<DD><INPUT NAME="forename" SIZE=20 MAXLENGTH=60>
</DL>
<P>Your title:
[<INPUT TYPE=RADIO NAME="title" VALUE="Mr"> Mr.]
[<INPUT TYPE=RADIO NAME="title" VALUE="Ms"> Ms.]
[<INPUT TYPE=RADIO NAME="title" VALUE="Mrs"> Mrs.]
[<INPUT TYPE=RADIO NAME="title" VALUE="Miss"> Miss.]
[<INPUT TYPE=RADIO NAME="title" VALUE="Dr"> Dr.]
[<INPUT TYPE=RADIO NAME="title" VALUE=""> Other.] </P>
<P> Please tell us what you thought of this Web site. Select the checkboxes which you agree with: </P>
<UL>
<LI><INPUT TYPE=CHECKBOX NAME="opinion" VALUE="understandable">
The text was understandable.
<LI><INPUT TYPE=CHECKBOX NAME="opinion" VALUE="navigable">
I found it easy to find my way through the Web site.
<LI><INPUT TYPE=CHECKBOX NAME="opinion" VALUE="stylish">
I was impressed by the style and presentation.
</UL>
<P> <INPUT TYPE="HIDDEN" NAME="pages" VALUE="brochure">
Please add any other comments:
<TEXTAREA NAME="feedback" ROWS=5 COLS=40>
I think your brochure is:
</TEXTAREA>
</P>
<P> <INPUT TYPE=SUBMIT VALUE="Send comments">
<INPUT TYPE=RESET VALUE="Clear form"> </P>
</FORM>

Other sample forms can be found at Web sites all over the world. Simply use the "View Source" facility of your browser to see the HTML tags that make up a form you find.

Handling Form Submissions

Handling Web form data submissions using CGI can be conveniently divided into two procedures. First, the CGI program must accept the submitted form data from the Web server (the HTTP server). The algorithm for this procedure depends on the choice of submission method. In CGI, the submission method is given by the environment variable REQUEST_METHOD and will be either GET or POST. Second, the CGI program must decode the submitted data before it can be used by other procedures.

REQUEST_METHODs GET versus POST

The names of the two REQUEST_METHODs GET and POST correspond to the HTTP mechanism used in each case. They have different characteristics and so need different treatment.

Differences between GET and POST

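A GET form submission is performed by fetching a URL made up of the location of the form handler (the ACTION attribute of the FORM), a question mark (?), and the encoded form data.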

For example, the default values in the form in Listing 8.4 might be submitted as a GET of the URL

http://www.site.com/cgi-bin/select.cgi?package=text&coords.x=19&coords.y=11

Because the form data is included as part of the URL in a "GET" request, the reply to the form submission is likely to be remembered by caching browsers and caching HTTP proxy servers. For this reason, it is advisable to use GET as the METHOD attribute of forms that always yield the same result for the same input and that have no side effects, such as index search forms.
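As a rough illustration of how such a URL is put together, the following sketch joins a handler location, a question mark, and the encoded form data. It is written in Python rather than the Perl used elsewhere in this chapter, the helper name build_get_url is hypothetical, and the coordinate names simply follow the image-id.x and image-id.y convention described for INPUT TYPE=IMAGE.

```python
from urllib.parse import urlencode

def build_get_url(action_url, fields):
    """Join the handler URL and the URL-encoded form data with '?'."""
    return action_url + "?" + urlencode(fields)

# Field names and values here are illustrative only.
url = build_get_url(
    "http://www.site.com/cgi-bin/select.cgi",
    [("package", "text"), ("coords.x", "19"), ("coords.y", "11")],
)
print(url)
```

A real browser performs this step itself; the sketch is only meant to show why the whole submission ends up visible in the URL.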

A POST form submission is performed by sending the form data encoded as described in the section "Decoding + and %hh (URL-Encoding)" as a document with the HTTP header

Content-type: application/x-www-form-urlencoded

to the server giving the location of the CGI form handler program as the URL.

For example, a submission from the form in Listing 8.1 might reach the Web server as this HTTP transaction:

POST http://www.site.com/cgi-bin/register.cgi HTTP/1.0
User-agent: Pip/2.53
Content-type: application/x-www-form-urlencoded
Content-length: 56

fullname=Charles+Dickens&email=dickens%40literary.org.uk

Because this operation is easily distinguished from an ordinary URL fetch and the form data is in an attached document rather than part of the URL, this sort of request is rarely remembered by caching proxy servers. POST should be used as the METHOD attribute of forms that are intended to have a useful side effect every time they are submitted, or that can produce different results for the same input.
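To see how the attached document and its length relate, here is a small sketch, again in Python rather than Perl, that encodes the two fields of Listing 8.1 and counts the characters a browser would announce in the Content-length: header. The field values are the ones from the example transaction; nothing here is sent over the network.

```python
from urllib.parse import urlencode

fields = [("fullname", "Charles Dickens"), ("email", "dickens@literary.org.uk")]

# urlencode turns spaces into '+' and characters such as '@' into %hh
body = urlencode(fields)

# The header counts the bytes of the encoded body, not of the original text
content_length = len(body.encode("ascii"))

print(body)
print(content_length)
```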

Accepting GET Type Requests

From within a CGI program, form data submitted with the FORM METHOD attribute GET is encoded in the environment variable QUERY_STRING. From a C program, this could be referenced by using a call to the getenv library routine, as shown in Listing 8.6.


Listing 8.6. CGI environment variables in C.
/* getenv() is declared in <stdlib.h> and returns NULL if the variable is not set */
char *encodedQuery = getenv("QUERY_STRING");

In a Perl script, it is available in the %ENV associative array:

$encodedQuery=$ENV{"QUERY_STRING"};

Because the form data is submitted in an environment variable by way of a URL, GET form submissions are subject to smaller, system-defined size limits. The GET method should not be used for forms where the expected data will be very large.
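For comparison with the C and Perl fragments, the same lookup might be sketched in Python as follows. The function name get_encoded_query is hypothetical, and passing the environment in as a parameter is only a convenience so that the function can be tried without a Web server.

```python
import os

def get_encoded_query(environ=os.environ):
    """Return the raw, still-encoded query string, or '' if none was supplied."""
    return environ.get("QUERY_STRING", "")

# Simulate what a Web server would place in the environment
print(get_encoded_query({"QUERY_STRING": "package=text"}))
```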

Accepting POST Type Requests

From the point of view of a CGI program, form data submitted with the POST method is an encoded stream of characters in the standard input of the program. If a Content-length: header was supplied by the form browser, the number of characters that need to be read from the standard input is available to the CGI program in the CONTENT_LENGTH environment variable. No more than CONTENT_LENGTH characters should be read in. In the event that there is no CONTENT_LENGTH environment variable, the CGI program should read characters from standard input until an "end-of-file" or other exception occurs. In a Perl script, POSTed form data would be retrieved with a procedure similar to Listing 8.7.


Listing 8.7. Accepting POST type requests in Perl.
$encodedQuery="";    # The encoded form data will be appended to this string
$charsRemaining=65536;    # This CGI program will truncate the encoded form data after 64Kbytes
            # unless the Content-length: is specified.
$charsRemaining=$ENV{"CONTENT_LENGTH"} if $ENV{"CONTENT_LENGTH"};
while ($charsRemaining-- > 0) {
    $char=getc(STDIN);
    last unless defined($char);    # Stop early at end-of-file
    $encodedQuery.=$char;
}
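The same procedure can be sketched in Python. This is a minimal illustration, not a replacement for the Perl listing: the function name read_post_body and the MAX_BYTES constant are hypothetical, and the input stream is passed in as a parameter so the sketch can be exercised with an in-memory buffer instead of a real Web server.

```python
import io
import sys

MAX_BYTES = 65536  # fallback cap when no CONTENT_LENGTH is given (arbitrary choice)

def read_post_body(environ, stream=None):
    """Read no more than CONTENT_LENGTH bytes, or up to MAX_BYTES if it is absent."""
    if stream is None:
        stream = sys.stdin.buffer          # a CGI program reads the body from stdin
    length_text = environ.get("CONTENT_LENGTH", "")
    limit = int(length_text) if length_text.isdigit() else MAX_BYTES
    data = stream.read(limit)              # stops early at end-of-file
    return data.decode("ascii", errors="replace")

# Only the first 24 bytes are read, as announced by CONTENT_LENGTH
demo = io.BytesIO(b"fullname=Charles+Dickens&extra")
print(read_post_body({"CONTENT_LENGTH": "24"}, demo))
```

Reading by length rather than until end-of-file matters because some servers leave the input stream open after the body has been delivered.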

Accepting ISINDEX Requests

Although not strictly forms, queries from <ISINDEX> Web pages behave very similarly to FORM METHOD=GET submissions. The only difference is that the "Separating data from different form components" step of the form data decoding procedure is not required.

Form Data Decoding

Form submissions, whether they are sent using GET or POST, have the same encoding and can be unpacked using the same procedures.

Separating Data from Different Form Components

Except in the case of <ISINDEX> queries, the encoded submission is made up of pairs of input component NAME with input component contents joined by an equal sign (=) and delimited by ampersands (&). For example, the default contents of the form in Listing 8.8 would be encoded as shown.

comments=None&choice=Good&where=HomePage

Listing 8.8. A form with several different components.
<FORM ACTION="test.cgi" METHOD=POST>
<P>Comments: <TEXTAREA NAME="comments" ROWS=3 COLS=40>None</TEXTAREA></P>
<SELECT NAME="choice" SIZE=4>
<OPTION>Excellent
<OPTION SELECTED>Good
<OPTION>Average
<OPTION>Poor
</SELECT>
<INPUT TYPE=HIDDEN NAME="where" VALUE="HomePage">
</FORM>

The Perl procedure in Listing 8.9 will fill an associative array with the contents of each form component listed by the form component NAME attribute.


Listing 8.9. Filling a Perl associative array with form data.
foreach (split("&", $encodedQuery)) {
    ($name,$contents) = split("=");
    $encodedForm{$name}=$contents;
}
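A comparable separation step might look like this in Python. One deliberate difference from the Perl listing, and an assumption of this sketch, is that each NAME maps to a list of values: a browser sends a separate pair for every selected checkbox or option that shares a NAME, and a plain assignment would keep only the last one. The function name split_pairs is hypothetical.

```python
def split_pairs(encoded_query):
    """Map each component NAME to the list of still-encoded values sent for it."""
    form = {}
    for pair in encoded_query.split("&"):
        if not pair:
            continue                      # ignore empty segments such as "a=1&&b=2"
        name, _, contents = pair.partition("=")   # split on the first "=" only
        form.setdefault(name, []).append(contents)
    return form

print(split_pairs("comments=None&choice=Good&where=HomePage"))
```

The values are left encoded at this stage; reversing the URL-encoding is a separate step.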

Decoding + and %hh "URL-Encoding"

To allow arbitrary characters with special meanings like spaces, ampersands, and equal signs to be passed as form data, any characters that are likely to cause trouble are translated by the Web browser to a safe alternative. Space characters ( ) are converted to plus signs (+), and other special characters such as ampersands (&) or percent characters (%) are replaced by a sequence

%hh

where hh is the hexadecimal representation of the numeric code for the character replaced. This encoding is the same as that used for URLs during HTTP transactions and is hence referred to as "application/x-www-form-urlencoded." Before the form data is passed to other procedures, this encoding must be reversed. The Perl procedure in Listing 8.10 is an improvement on the procedure in Listing 8.9, with code added to reverse the URL-encoding.


Listing 8.10. Separating and decoding Form data in Perl.
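foreach (split("&", $encodedQuery)) {
    ($name,$contents) = split("=", $_, 2);    # Split on the first "=" only
    $form{$name}=$contents;
    $form{$name}=~s/\+/ /g;    # Plus signs become spaces first
    $form{$name}=~s/%([0-9A-Fa-f]{2})/pack("C",hex($1))/ge;    # Then %hh becomes the character
}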
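The two substitutions can also be written out by hand in Python, which makes the order of the steps explicit. The function name url_decode is hypothetical, and the sketch assumes each %hh stands for a single character code, as the text describes. In practice, Python's standard urllib.parse.unquote_plus performs the same job.

```python
import re

def url_decode(text):
    """Reverse URL-encoding: '+' becomes a space, then %hh becomes a character."""
    # Plus signs must be handled first; otherwise a decoded %2B (a real '+')
    # would wrongly be turned into a space afterwards.
    text = text.replace("+", " ")
    return re.sub(r"%([0-9A-Fa-f]{2})",
                  lambda match: chr(int(match.group(1), 16)),
                  text)

print(url_decode("Charles+Dickens"))
print(url_decode("dickens%40literary.org.uk"))
```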

Basic Data Validation

It makes sense to check that the data passed from the form is suitable for the intended purpose before continuing. The CGI program could, for instance, output another HTML form re-prompting the user for the information with an explanation of why the first submission was not acceptable. Basic data checks could ensure that a text box intended for a positive whole number contained only digits, that a text box prompting for an Internet e-mail address contained an at sign (@) and no spaces or that a SELECT form component returned at least one option from the genuine list of choices.
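The three checks mentioned above might be sketched as follows in Python. The function names are hypothetical and the rules are deliberately as loose as the ones in the text; a real handler would likely need stricter tests, particularly for e-mail addresses.

```python
import re

def is_positive_whole_number(text):
    """True if the text holds only digits and is greater than zero."""
    return re.fullmatch(r"[0-9]+", text) is not None and int(text) > 0

def looks_like_email(text):
    """True if the text contains an at sign and no whitespace."""
    return "@" in text and not any(ch.isspace() for ch in text)

def is_valid_choice(value, allowed):
    """True if a SELECT value is one of the choices the form really offered."""
    return value in allowed

print(is_positive_whole_number("42"))
print(looks_like_email("dickens@literary.org.uk"))
print(is_valid_choice("Good", {"Excellent", "Good", "Average", "Poor"}))
```

When any of these checks fails, the handler can output the form again with an explanation rather than processing the submission.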

If the form data is to be passed to another application or to library routines, the data should be stripped of any characters that aren't strictly necessary. In particular, any characters that have special meanings to interpreters should be carefully handled. For instance, in a UNIX Bourne shell script, it is difficult to manipulate an environment variable or pass a variable to another program without re-evaluating the contents of the environment variable. In an <ISINDEX> handler script the command

QUERY_STRING=`/bin/env | /bin/grep '^QUERY_STRING=' | \
/bin/sed -e 's/QUERY_STRING=//' -e 's/[^A-Za-z ]//g'`

will remove all characters other than letters and spaces from the user input before it is reinterpreted.
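The same filter, keeping only letters and spaces, can be expressed in a language with regular expressions built in. The following Python sketch uses the character class from the sed command; the function name keep_letters_and_spaces is hypothetical.

```python
import re

def keep_letters_and_spaces(text):
    """Delete every character that is not an ASCII letter or a space."""
    return re.sub(r"[^A-Za-z ]", "", text)

# Shell metacharacters such as ';', '-' and '/' are removed
print(keep_letters_and_spaces("cats; rm -rf /"))
```

Listing the characters to keep, rather than the characters to remove, means that anything unanticipated is dropped by default.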

Not only will this kind of "paranoia" remove a potential logic flaw in the CGI form handler, it will also increase the security of the Web server. For more discussion of these concerns, please read Chapter 9, "Security."

Choosing the Programming Language

CGI is a platform-independent interface definition. The actual choice of which programming language to use is left to the programmer. Any programming language available on the Web server platform that includes access to environment variables can be used for writing CGI form handlers.

Pros and Cons

The class of programming tools characterized as high-level, interpreted, "scripting" languages includes UNIX command shell languages, DOS batch command files, Perl scripts, and Visual Basic programs. These typically have the following benefits:

but can introduce the following costs:

The last of these can be a problem if the CGI program is to be widely distributed, or if the Web server can be fooled into delivering the text of the CGI program to a Web browser, as any security holes are more easily discovered by system crackers.

Programming tools that compile lower-level source code such as Pascal, C, and C++ reverse the pattern of pros and cons. These tend to provide

but may entail

One can easily conclude that one of the former programming languages might be appropriate for a short-term solution during development of a tool restricted to a single organization, but the investment of time in one of the latter programming languages might be more appropriate to a simple but frequently used general Web form handler.

A Sample CGI Form Handler Program

Let's employ the techniques learned to write a CGI form handler program, as shown in Listing 8.11, for the sample form given in Listing 8.5. We will accept and acknowledge POSTed form data (which we assume is coming from the sample form) and write it to a text file: /var/adm/www/comments.log.


Listing 8.11. A form handler for Visitor's book/Comments form: comments.cgi.
#!/usr/local/bin/perl
# Handle comment form submissions
# Form fields: surname,forename,title,opinion,pages,feedback

$encodedQuery="";    # The encoded form data will be appended to this string
$charsRemaining=102400;# This CGI program will truncate the encoded form data after 100Kbytes
            # unless the Content-length: is specified.
$charsRemaining=$ENV{"CONTENT_LENGTH"} if $ENV{"CONTENT_LENGTH"};
while ($charsRemaining--) {
    $encodedQuery.=getc;
}

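foreach (split("&", $encodedQuery)) {
    ($name,$contents) = split("=", $_, 2);    # Split at the first "=" only
    $form{$name}=$contents;
    $form{$name}=~s/\+/ /g;                   # A plus sign encodes a space
    $form{$name}=~s/%([0-9A-Fa-f]{2})/pack("C",hex($1))/ge;    # Reverse the %hh escapes
}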

print "Content-type: text/html\n\n";    # Generate HTTP header
print "<HTML><HEAD><TITLE>Thank-you</TITLE></HEAD>\n";
print "<BODY><H1>Thank-you</H1>\n";
$safename=$form{"title"}." ".$form{"surname"};
$safename=~s/[^\w ]/ /g;            # Excise any HTML special characters
$safepages=$form{"pages"};
$safepages=~s/[^\w ]/ /g;            # Excise any HTML special characters
print "<P>Thank-you for submitting your comments on ".$safepages.", ".$safename."</P>\n";
print "<HR>\n";
print '<P><A HREF="/">Return to home page</A></P>';
print "\n</BODY></HTML>";

if (open(LOGFILE, ">>/var/adm/www/comments.log")) {
    foreach (keys %form) {
        print LOGFILE $_.":\n".$form{$_}."\n";
    }
    close(LOGFILE);
}

Forms-Based Intranet/Internet Client/Server Applications

If your organization plans to use Web forms for major applications, whether publicly available or restricted to a LAN, it would be well worth developing and standardizing on a library of CGI and Web form routines. Standard procedures can be designed not only to decode form submissions, but also to generate forms "on-the-fly."

What Forms Can and Can't Do

Web forms do not provide the rich set of user interface objects available in system-specific GUI toolkits. They do not provide instant feedback or a high level of control over allowable input.

Web forms do, however, allow platform-independent development of generic input clients for network applications. Web browsers that support Web forms are available for the most popular client platforms. In fact, your intended user probably already has a forms-capable client on his or her desktop.

Automatically Generated Forms

Rather than designing a different Web form for every possible situation, a programmer can design a CGI application to automatically generate HTML forms by describing the data types to be prompted for in a machine-readable representation, and choosing a template HTML form tag that is appropriate to each data type.

Numbers can be prompted for using the INPUT TYPE=TEXT tag, Boolean choices using radio buttons or SELECT tags, and textual data using the TEXTAREA tag. These can be automatically sized as needed using their respective tag attributes. A truly object-oriented design would implement a "Web forms" interface to all objects. This interface would include methods to generate HTML form tags that prompt for the contents of an object, methods to validate the contents of the form submission, and methods to help and re-prompt the user when the form data submitted is not suitable.
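As a rough illustration of the idea, the following Perl sketch generates a form from a small machine-readable description. The description format, the type names ("number", "boolean", "text"), the subroutine names, and the handler URL are all assumptions made for this example; a production library would also need to escape names and labels.

```perl
use strict;
use warnings;

# Hypothetical field descriptions: name, data type, and optional sizing hints.
my @fields = (
    { name => "age",      type => "number", size => 3 },
    { name => "member",   type => "boolean" },
    { name => "feedback", type => "text", rows => 5, cols => 40 },
);

# Return the HTML form tag appropriate to one field description.
sub field_tag {
    my ($field) = @_;
    my $name = $field->{name};
    if ($field->{type} eq "number") {
        my $size = $field->{size} || 10;
        return qq{<INPUT TYPE=TEXT NAME="$name" SIZE=$size MAXLENGTH=$size>};
    }
    if ($field->{type} eq "boolean") {
        return qq{<INPUT TYPE=RADIO NAME="$name" VALUE="yes"> Yes }
             . qq{<INPUT TYPE=RADIO NAME="$name" VALUE="no" CHECKED> No};
    }
    if ($field->{type} eq "text") {
        my $rows = $field->{rows} || 4;
        my $cols = $field->{cols} || 60;
        return qq{<TEXTAREA NAME="$name" ROWS=$rows COLS=$cols></TEXTAREA>};
    }
    die "Unknown field type: $field->{type}\n";
}

# Build a complete form that POSTs to the given handler URL.
sub generate_form {
    my ($action, @descriptions) = @_;
    my $html = qq{<FORM ACTION="$action" METHOD=POST>\n};
    foreach my $field (@descriptions) {
        $html .= "<P>$field->{name}: " . field_tag($field) . "</P>\n";
    }
    $html .= qq{<INPUT TYPE=SUBMIT VALUE="Send">\n</FORM>\n};
    return $html;
}

print generate_form("/cgi-bin/handler.cgi", @fields);
```

Keeping the description separate from the tag templates means a new data type needs only one new branch in field_tag, and the same description could later drive the validation routines.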

Partially Prefilled Forms

Library routines that generate Web form tags should include the capability to supply a default value to the user. The contents of TEXTAREA tags and the VALUE, CHECKED, and SELECTED attributes described previously provide several ways to supply default input. This capability is not only a way to suggest appropriate responses, it can also be used when the user is re-editing or changing existing data.

Tip
Not all existing data can be offered as a default entry in all form tags. The INPUT TYPE=TEXT tag will accommodate only default VALUEs that can be expressed as an HTML attribute. Quote characters ("), greater-than characters (>), and line-breaks all cause problems if they are used as default text in INPUT TYPE=TEXT tags. Often, the TEXTAREA tag comes to the rescue with its "container" syntax described previously. Also, if it is possible that existing data does not conform to the current set of options in a SELECT or similar form tag, an "Other" option accompanied by a text field can save the day.
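One way a library routine might handle these cases is sketched below in Perl. It assumes that the browser decodes character entities such as &quot; inside attribute values, which should be verified against the browsers you need to support. The subroutine names and the ROWS and COLS values are hypothetical.

```perl
use strict;
use warnings;

# Replace characters that would end an attribute value or tag prematurely.
sub html_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;     # Must come first, before other entities are added
    $text =~ s/"/&quot;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Emit a single-line INPUT when the default fits in an attribute;
# fall back to TEXTAREA's container syntax when it contains line-breaks.
sub prefilled_field {
    my ($name, $default) = @_;
    my $safe = html_escape($default);
    if ($default =~ /[\r\n]/) {
        return qq{<TEXTAREA NAME="$name" ROWS=4 COLS=40>$safe</TEXTAREA>};
    }
    return qq{<INPUT TYPE=TEXT NAME="$name" VALUE="$safe">};
}

print prefilled_field("surname", 'Smith "Smithy" Jones'), "\n";
print prefilled_field("feedback", "Line one\nLine two"), "\n";
```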

Forms Ready Reference

The following is a summary of Web forms for reference:

<FORM ACTION="url" METHOD=reqtype >
    <INPUT [TYPE=TEXT] NAME="id" [SIZE=nn] [MAXLENGTH=nn] [VALUE="default"]>
    <INPUT TYPE=SUBMIT [NAME="button-id"] [VALUE="Button label text"]>
    <INPUT TYPE=RESET [VALUE="Button label text"]>
    <INPUT TYPE=RADIO NAME="radio-set-id" VALUE="choice-id" [CHECKED]>
    <INPUT TYPE=CHECKBOX NAME="box-set-id" VALUE="choice-id" [CHECKED]>
    <INPUT TYPE=IMAGE NAME="image-id" SRC="image-url" [ALIGN=alignment]>
    <INPUT TYPE=HIDDEN NAME="data-id" VALUE="hidden form data" >
    <TEXTAREA NAME="text-id" [COLS=nn] [ROWS=nn]>
        default text
    </TEXTAREA>
    <SELECT NAME="select-id" [SIZE=nn] [MULTIPLE]>
        <OPTION [VALUE="choice-id"] [SELECTED]>1st choice
        <OPTION>2nd choice
        <OPTION>...
    </SELECT>
</FORM>

Brief Outline of GET and POST Mechanisms

Brief Outline of Form Encoding

Form contents are encoded by replacing spaces (" ") with plus signs ("+"), and other unsafe characters are represented by the hexadecimal escape sequence %hh. Form data is associated with NAMEd tags using id=data pairs separated by ampersands ("&").
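To make the encoding concrete, here is a minimal Perl sketch of the browser's side of the exchange. The subroutine names are hypothetical, the choice of which characters count as "safe" (letters, digits, underscore, period, and hyphen) is an assumption, and the code expects byte strings rather than wide characters.

```perl
use strict;
use warnings;

# Encode one name or value: escape unsafe bytes as %hh, then turn spaces into "+".
sub form_encode {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9_.\- ])/sprintf("%%%02X", ord($1))/ge;
    $text =~ s/ /+/g;
    return $text;
}

# Join id=data pairs with ampersands.
sub encode_pairs {
    my @pairs = @_;    # Flat list: name, value, name, value, ...
    my @encoded;
    while (my ($name, $value) = splice(@pairs, 0, 2)) {
        push @encoded, form_encode($name) . "=" . form_encode($value);
    }
    return join("&", @encoded);
}

print encode_pairs(surname => "O'Neil", feedback => "Nice & clear"), "\n";
# prints: surname=O%27Neil&feedback=Nice+%26+clear
```

Escaping is done before the spaces are converted so that a literal plus sign in the data becomes %2B and cannot be mistaken for an encoded space.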

Summary

HTML forms provide a simple way for a Web browser user to supply information to your Web site, to search efficiently for information, and to interact with Internet information gateways.

A well-designed form backed by an effective form handler can give visitors interactive control over your site and provide you with valuable information about your clients.

Of course, in form handlers and all other CGI applications, security is an important factor and is covered in the next chapter.