
Thanks to the World Wide Web, almost anyone can provide information online in a form that is easy on the eyes and can be widely disseminated. You've no doubt surfed the Internet and seen other sites, and now you probably know that scary acronyms like "HTTP" and "HTML" are simply shorthand for "Web" and "the way information is expressed on the Internet." You may already have some experience presenting information on the Internet.

The Internet has proven to be an ideal medium for distributing information, as its enormous popularity and rapid growth show. Although some have questioned the usefulness of the Internet and attribute its growth and popularity mainly to media hype, it is undeniably an important medium for presenting all kinds of information. Many services provide up-to-date information (news, weather, live sporting events) and reference materials in electronic form, and significant amounts of other kinds of data are offered as well. The IRS, which distributed all of its 1995 tax forms and other information via the World Wide Web, recently admitted to receiving fan mail for its Web site. Who would have thought that the IRS would ever receive fan mail? This happened not because its site was well designed, but because it turned out to be a truly useful tool for thousands, perhaps millions, of people.

What makes the Web unique and such an attractive information service? First of all, it provides a hypermedia interface to data. Think about your computer's hard drive. Typically, data is organized hierarchically, in the form of a file system: you have a number of folders, and inside each folder there are either documents or other folders. The Web uses a different paradigm for expressing information, called hypermedia. A hypertext interface consists of documents and links. Links are words that you click to see other documents or find other types of information. The Web extends the concept of hypertext to include other types of media, such as graphics, sound, and video (hence the name "hypermedia"). Selecting text or graphics in a document lets you see related information about the selected item in any number of forms.

Almost everyone can benefit from this simple and unique way of presenting and distributing information, from academics who want to share data with their colleagues immediately to businesspeople who want to share information about their company with everyone. However, although providing information is extremely important, in the last few years many have come to feel that receiving information is just as important.

Although the Web provides a unique hypermedia interface to information, there are many other effective ways to distribute data. For example, network services such as File Transfer Protocol (FTP) and Gopher existed long before the advent of the World Wide Web. Electronic mail has been the primary medium for communication and information exchange on the Internet and most other networks almost from the very beginning. Why, then, has the Web become such a popular way of distributing information? Its multimedia nature has contributed significantly to its unprecedented success, but the Web is most effective when it is interactive.

Without the ability to receive user input and respond with information, the Web would be a completely static environment. Information would be available only in the format specified by the author. This would discard one of the great strengths of computing in general: interactivity. For example, rather than forcing users to page through multiple documents as if they were looking through a book or dictionary, it would be better to let them enter keywords describing a topic of interest. Users could then customize the presentation of data rather than relying on a rigid structure defined by the content provider.

The term "Web server" can be misleading because it can refer to both the physical machine and the software it uses to communicate with Internet browsers. When a browser requests a given Web address, it first connects to the machine via the Internet, sending the Web server software a request for the document. This software runs continuously, waiting for such requests to arrive and responding accordingly.

Although servers can send and receive data, the server itself has limited functionality. For example, the most primitive server can only send the requested file to the browser. The server usually does not know what to do with any additional input. Unless it is told how to handle such additional information, the server will most likely ignore the input.

For the server to perform operations other than finding files and sending them to the Web browser, you need to know how to extend the server's functionality. For example, a Web server cannot search a database for a keyword entered by a user and return multiple matching documents unless such a capability has been programmed into the server in some way.

What is CGI?

The Common Gateway Interface (CGI) is an interface to the server that allows you to extend the server's functionality. Using CGI, you can interact with the users who access your site. At a theoretical level, CGI enables the server to parse (interpret) input from the browser and return information based on that input. On a practical level, CGI is an interface that lets a programmer write programs that communicate easily with the server.

Normally, to extend the server's capabilities, you would have to modify the server yourself. This solution is undesirable because it requires a low-level understanding of network programming over the Internet Protocol. It would also require editing and recompiling the server source code or writing a custom server for each task. Say you want to extend the server so that it acts as a Web-to-e-mail gateway, taking information the user enters in the browser and e-mailing it to another user. You would have to add code to the server to parse the input from the browser, forward it by e-mail to the other user, and send a response back to the browser over the network connection.

First, such a task requires access to the server code, which is not always possible.

Second, it is difficult and requires extensive technical knowledge.

Third, the solution works only with that particular server. If you need to move your server to another platform, you will have to rewrite, or at least spend a lot of time porting, the code to that platform.

Why CGI?

CGI offers a portable and simple solution to these problems. The CGI protocol defines a standard way for programs to communicate with the Web server. Without any special knowledge of network programming, you can write a program in almost any programming language that interfaces and communicates with the Web server. This program will work with all Web servers that understand the CGI protocol.

CGI communication is done using standard input and output, which means that if you know how to print and read data using your programming language, you can write a Web server application. Apart from parsing input and output, programming CGI applications is almost equivalent to programming any other application. For example, to program the "Hello, World!" program, you use your language's print functions and the format defined for CGI programs to print the corresponding message.

Selecting a programming language

Because CGI is a universal interface, you are not limited to any specific programming language. A question that is often asked is: which programming languages can be used for CGI programming? You can use any language that allows you to do the following:

Print to standard output
Read from standard input
Read from environment variables

Almost all programming languages and many scripting languages do these three things, and you can use any of them.

Languages fall into one of two classes: compiled and interpreted. A program in a compiled language such as C or C++ is usually smaller and faster, while interpreted languages such as Perl or Rexx sometimes require a large interpreter to be loaded at startup. Additionally, if your language is compiled, you can distribute binaries (code compiled into machine language) without the source code. Distributing interpreted scripts usually means distributing the source code.

Before choosing a language, consider your priorities. You need to weigh the speed and efficiency of one programming language against the ease of programming of another. If you are inclined to learn a new language instead of using one you already know, carefully weigh the advantages and disadvantages of both.

The two languages most commonly used for CGI programming are C and Perl (both of which are covered in this book). Both have clear advantages and disadvantages. Perl is a very high-level yet powerful language, especially well suited to parsing text. Although its ease of use, flexibility, and power make it attractive for CGI programming, its relatively large size and slower speed sometimes make it unsuitable for some applications. C programs are smaller, more efficient, and offer lower-level system control, but they are more complex to write, lack lightweight built-in text-processing routines, and are more difficult to debug.

Which language is best suited to CGI programming? Whichever one you find more convenient to program in. Both are equally effective for programming CGI applications, and with the proper libraries, both have similar capabilities. However, if you have a heavily loaded server, you might prefer smaller, compiled C programs. If you need to write an application quickly that requires a lot of text processing, you might use Perl instead.

Cautions

There are some important alternatives to CGI applications. Many servers now include a programming API, which makes it easier to write direct extensions to the server as opposed to standalone CGI applications. Server API extensions are generally more efficient than CGI programs. Other servers include built-in functionality that can handle special non-CGI elements, such as database links. Finally, some applications can be handled by new client-side (rather than server-side) technologies such as Java. With such rapid changes in technology, will CGI quickly become obsolete?

Hardly. CGI has several advantages over newer technologies.

It is versatile and portable. You can write a CGI application using almost any programming language on any platform. Some of the alternatives, such as the server API, limit you to certain languages and are much more difficult to learn.
Client-side technologies such as Java are unlikely to replace CGI, because some applications are much better suited to running on the server.
Many of the limitations of CGI are limitations of HTML or HTTP. As Internet standards as a whole evolve, so do CGI capabilities.

Summary

The Common Gateway Interface is the protocol by which programs interact with Web servers. The versatility of CGI gives programmers the ability to write gateway programs in almost any language, although there are many trade-offs associated with different languages. Without this ability, creating interactive Web pages would be difficult, at best requiring server modifications, and interactivity would be unavailable to most users who are not site administrators.

Chapter 2: Basics

Several years ago, I created a page for a college at Harvard where people could submit comments about it. At the time, the Web was young and documentation was scarce. Like many others, I relied on brief documentation and on programs written by others to teach myself CGI programming. Although this method of study required some searching and a lot of experimentation, and raised many questions, it was very effective. This chapter is the result of my early work with CGI (with a few tweaks, of course).

Although it takes some time to fully understand and master the common gateway interface, the protocol itself is quite simple. Anyone who has some basic programming skills and is familiar with the Web can quickly learn to program fairly complex CGI applications just as I and others learned to do several years ago.

The purpose of this chapter is to present the basics of CGI in a comprehensive, albeit condensed, way. Each concept discussed here is presented in detail in subsequent chapters. However, after completing this chapter, you can immediately begin programming CGI applications. Once you reach this level, you can learn the intricacies of CGI, either by reading the rest of this book or simply experimenting on your own.

You can boil CGI programming down to two tasks: receiving information from the Web browser and sending information back to the browser. This becomes quite intuitive once you are familiar with how CGI applications are normally used. Often the user is asked to fill out a form, for example, to enter his or her name. Once the user fills out the form and presses Enter, this information is sent to the CGI program. The CGI program must then convert this information into something it understands, process it accordingly, and send a response back to the browser, be it a simple confirmation or the results of a database search.

In other words, programming CGI requires understanding how to receive input from the Web browser and how to send output back. What happens between the input and output stages of a CGI program depends on the developer's goal. You'll find that the main challenge of CGI programming lies in this intermediate stage; once you learn how to handle input and output, you essentially know enough to become a CGI developer.

In this chapter, you'll learn the principles behind CGI input and output, as well as other basic skills you'll need to write and use CGI, including things like creating HTML forms and naming your CGI programs. This chapter covers the following topics:

Traditional program "Hello, World!";
CGI Output: Sending information back for display in an Internet browser;
Configuring, installing, and running the application. You will learn about different Web platforms and servers;
CGI Input: Interpretation of information sent by the Web browser. Introduction to some useful programming libraries for parsing such input;
A simple example that brings together all the lessons in this chapter;
Programming strategy.

Due to the nature of this chapter, I touch only lightly on some topics. Don't worry; all of these topics are covered in much more depth in other chapters.

Hello, World!

You start with a traditional introductory programming problem. You will write a program that displays "Hello, World!" on your Web browser. Before you write this program, you must understand what information the Web browser expects to receive from CGI programs. You also need to know how to run this program so you can see it in action.

CGI is language independent, so you can implement this program in any language. Several different languages are used here to demonstrate this language independence. The Perl version of "Hello, World!" is shown in Listing 2.1.

Listing 2.1. Hello, World! in Perl.

    #!/usr/local/bin/perl
    # hello.cgi - My first CGI program

    print "Content-Type: text/html\n\n";

    print "<html> <head>\n";
    print "<title>Hello, World!</title>";
    print "</head>\n";
    print "<body>\n";
    print "<h1>Hello, World!</h1>\n";
    print "</body> </html>\n";

Save this program as hello.cgi, and install it in the appropriate location. (If you're not sure where it is, don't worry; you'll find out in the "Installing and Running a CGI Program" section later in this chapter.) For most servers, the directory you need is cgi-bin. Now, call the program from your Web browser. For most, this means opening the following uniform resource locator (URL):

http://hostname/directoryname/hello.cgi

Hostname is the name of your Web server, and directoryname is the directory where you put hello.cgi (probably cgi-bin).

Dissecting hello.cgi

There are a few things to note about hello.cgi.

First, you use simple print commands. CGI programs do not require any special file handles or output descriptors. To send output to the browser, simply print to stdout.

Second, note that the content of the first print statement (Content-Type: text/html) does not appear on your Web browser. You can send any information you want back to the browser (HTML page, graphics or sound), but first, you need to tell the browser what kind of data you are sending it. This line tells the browser what kind of information to expect - in this case, an HTML page.

Third, the program is called hello.cgi. You don't always need to use the .cgi extension for your CGI program's name. Although programs written in many languages use the .cgi extension, the extension does not indicate the language; rather, it is a way for the server to identify the file as an executable rather than a graphics file, HTML file, or text file. Servers are often configured to try to execute only files that have this extension and to display the contents of all others. Although using the .cgi extension is not required, it is still considered good practice.

In general, hello.cgi consists of two main parts:

tells the browser what information to expect (Content-Type: text/html)
tells the browser what to display (Hello, World!)

Hello, World! in C

To show the language independence of CGI programs, Listing 2.2 shows the equivalent of the hello.cgi program written in C.

Listing 2.2. Hello, World! in C.

    /* hello.cgi.c - Hello, World CGI */
    #include <stdio.h>

    int main(void)
    {
        printf("Content-Type: text/html\r\n\r\n");
        printf("<html> <head>\n");
        printf("<title>Hello, World!</title>\n");
        printf("</head>\n");
        printf("<body>\n");
        printf("<h1>Hello, World!</h1>\n");
        printf("</body> </html>\n");
        return 0;
    }

Note

Note that the Perl version of hello.cgi uses

    print "Content-Type: text/html\n\n";

while the C version uses

    printf("Content-Type: text/html\r\n\r\n");

Why does the Perl print statement end with two newline characters (\n), while the C printf ends with two carriage return and newline pairs (\r\n)?

Technically, headers (all output before the blank line) are expected to be separated by a carriage return and a newline. Unfortunately, on DOS and Windows machines, Perl translates \r as another newline rather than as a carriage return.

Although omitting the \r characters in Perl is technically incorrect, it works with almost all servers and is portable across all platforms. Therefore, in all the Perl examples in this book, I separate headers with newlines only rather than with carriage returns and newlines.

A proper solution to this problem is presented in Chapter 4, "Output."

Neither the Web server nor the browser cares what language is used to write the program. Although each language has advantages and disadvantages as a CGI programming language, it is best to use the language that you are most comfortable working with. (The choice of programming language is discussed in more detail in Chapter 1, “Common Gateway Interface (CGI)”).

CGI Output

Now you can take a closer look at sending information to the Web browser. From the "Hello, World!" example, you can see that Web browsers expect two sets of data: a header, which contains information such as what kind of data to display (for example, the Content-Type: line), and the actual information (what the Web browser displays). These two pieces are separated by a blank line.

The header is called the HTTP header. It provides important information about the data the browser is going to receive. There are several different types of HTTP headers, and the most common is the one you've already used: the Content-Type: header. You can combine several HTTP headers, separated by carriage returns and newlines (\r\n). The blank line separating the header from the data also consists of a carriage return and a newline (why both are needed is briefly discussed in the preceding note and in detail in Chapter 4). You'll learn about other HTTP headers in Chapter 4; for now, you are dealing only with the Content-Type: header.
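To make these rules concrete, here is a minimal C sketch that writes any number of header lines, each ending in a carriage return and newline, followed by the blank line that ends the header. The function name write_cgi_headers is hypothetical; it is not part of any library or of the book's code.

```c
#include <stdio.h>

/* Hypothetical helper: writes each header line followed by "\r\n",
 * then the empty "\r\n" line that ends the header section.
 * Returns the number of characters written, or -1 on a write error. */
int write_cgi_headers(FILE *out, const char *const headers[], size_t count)
{
    int total = 0;

    for (size_t i = 0; i < count; i++) {
        int n = fprintf(out, "%s\r\n", headers[i]);
        if (n < 0)
            return -1;
        total += n;
    }
    /* The blank line: a bare carriage return and newline. */
    if (fputs("\r\n", out) == EOF)
        return -1;
    return total + 2;
}
```

A CGI program would typically call it with stdout as the stream and an array holding "Content-Type: text/html", and then print the page itself.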

The Content-Type: header describes the type of data that the CGI program returns. The appropriate format for this header is:

Content-Type: type/subtype

where type/subtype is a valid Multipurpose Internet Mail Extensions (MIME) type. The most common MIME type is the HTML type, text/html. Table 2.1 lists a few more common MIME types that will be discussed; a more complete listing and analysis of MIME types is provided in Chapter 4.

Note

MIME was originally invented to describe the contents of mail message bodies, and it has become a fairly common way to represent Content-Type information. You can read more about MIME in RFC1521. RFCs, or Requests for Comments, are documents that record decisions made by groups on the Internet trying to set standards. You can view RFC1521 at the following address: http://andrew2.andrew.cmu.edu/rfc/rfc1521.html

Table 2.1. Some common MIME types.

MIME Type      Description
text/html      Hypertext Markup Language (HTML)
text/plain     Plain text files
image/gif      GIF graphics files
image/jpeg     JPEG compressed graphics files
audio/basic    Sun *.au audio files
audio/x-wav    Windows *.wav audio files
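Table 2.1 can be turned into a lookup that a program might use to choose the Content-Type: for a file it is about to send. This is a minimal sketch: the filename extensions paired with each type (.html, .txt, .gif, .jpg, .au, .wav) are assumptions based on common usage, and the function name mime_type_for is hypothetical.

```c
#include <string.h>

/* Extension-to-MIME mapping based on Table 2.1.
 * The extension chosen for each type is an illustrative assumption. */
struct mime_entry {
    const char *extension;
    const char *mime_type;
};

static const struct mime_entry mime_table[] = {
    { ".html", "text/html"   },
    { ".txt",  "text/plain"  },
    { ".gif",  "image/gif"   },
    { ".jpg",  "image/jpeg"  },
    { ".au",   "audio/basic" },
    { ".wav",  "audio/x-wav" },
};

/* Returns the MIME type for filename based on its last extension, or NULL
 * if the extension is not in the table. The comparison is case-sensitive. */
const char *mime_type_for(const char *filename)
{
    const char *dot = strrchr(filename, '.');

    if (dot == NULL)
        return NULL;
    for (size_t i = 0; i < sizeof mime_table / sizeof mime_table[0]; i++) {
        if (strcmp(dot, mime_table[i].extension) == 0)
            return mime_table[i].mime_type;
    }
    return NULL;
}
```

Returning NULL for unknown extensions leaves the caller to decide on a fallback, such as refusing the request or sending text/plain.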

After the header and an empty line, you simply print the data in the form you need. If you are sending HTML, then print HTML tags and data to stdout after the header. You can also send graphics, sound and other binary files by simply printing the contents of the file to stdout. Several examples of this are given in Chapter 4.
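As a sketch of sending a binary file, the following C function prints the header and blank line and then copies the file's bytes unchanged. The name send_file and its parameters are illustrative assumptions, not the book's code.

```c
#include <stdio.h>

/* Hypothetical helper: sends a Content-Type header, the blank line, and then
 * the raw bytes of an already opened file. For binary data such as images,
 * open `in` with mode "rb". Returns 0 on success, -1 on a read or write error. */
int send_file(FILE *out, const char *mime_type, FILE *in)
{
    char buf[4096];
    size_t n;

    if (fprintf(out, "Content-Type: %s\r\n\r\n", mime_type) < 0)
        return -1;
    /* Copy the file in chunks until fread() reports end of file or an error. */
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n)
            return -1;
    }
    return ferror(in) ? -1 : 0;
}
```

On DOS and Windows, stdout is opened in text mode, so a real program sending binary data there would also need to switch stdout to binary mode; this sketch does not do that.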

Installing and Running a CGI Program

This section deviates somewhat from CGI programming and discusses configuring your Web server to use CGI, as well as installing and running programs. You'll be introduced to various servers for different platforms in more or less detail, but you'll have to consult your server's documentation to find the best option.

All servers require space for server files and space for HTML documents. In this book, the server area is called ServerRoot, and the document area is called DocumentRoot. On UNIX machines, ServerRoot is usually /usr/local/etc/httpd/, and DocumentRoot is usually /usr/local/etc/httpd/htdocs/. However, your system may use different locations, so replace all references to ServerRoot and DocumentRoot with your own ServerRoot and DocumentRoot.

When you access files using your Web browser, you specify the file in the URL relative to DocumentRoot. For example, if your server address is mymachine.org and the file index.html is located directly in your DocumentRoot, you access it with the following URL: http://mymachine.org/index.html

Configuring the server for CGI

Most Web servers are preconfigured to allow the use of CGI programs. Typically, one of two settings tells the server whether a file is a CGI application:

Designated directory. Some servers let you specify that all files in a designated directory (usually called cgi-bin by default) are CGI programs.
Filename extensions. Many servers are preconfigured to treat all files ending in .cgi as CGI programs.

The designated directory method is something of a relic of the past (the very first servers used it as the only method for determining which files were CGI programs), but it has several advantages.

It keeps CGI programs centralized, preventing other directories from becoming cluttered.
You're not limited to any particular filename extension, so you can name your files whatever you want. Some servers allow you to designate several different directories as CGI directories.
It also gives you more control over who can write CGI programs. For example, if you maintain a system with multiple users and don't want them to use their own CGI scripts without first reviewing the programs for security reasons, you can designate only the files in a restricted, centralized directory as CGI. Users will then have to give you their CGI programs to install, and you can audit the code first to make sure the program doesn't have any major security issues.

Designating CGI programs by filename extension can be useful because of its flexibility: you are not limited to a single directory for CGI programs. Most servers can be configured to recognize CGI programs by filename extension, although not all are configured this way by default.
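The two recognition methods can be illustrated with a small C sketch of the test a server might apply to a requested path. It assumes a designated directory of /cgi-bin/ and the .cgi extension; actual servers read both from their configuration files, and the function names here are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical settings; real servers read these from configuration files. */
#define CGI_DIR_PREFIX "/cgi-bin/"
#define CGI_EXTENSION  ".cgi"

/* Designated-directory method: true if the URL path lies under CGI_DIR_PREFIX. */
static bool in_cgi_directory(const char *path)
{
    return strncmp(path, CGI_DIR_PREFIX, strlen(CGI_DIR_PREFIX)) == 0;
}

/* Filename-extension method: true if the path ends with CGI_EXTENSION. */
static bool has_cgi_extension(const char *path)
{
    size_t len = strlen(path);
    size_t ext = strlen(CGI_EXTENSION);

    return len >= ext && strcmp(path + len - ext, CGI_EXTENSION) == 0;
}

/* A server configured for both methods runs the file if either test passes. */
bool is_cgi(const char *path)
{
    return in_cgi_directory(path) || has_cgi_extension(path);
}
```

A server configured for only one method would call just that test.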

Warning

Remember the importance of security when you configure your server for CGI. Some tips are covered here, and Chapter 9, "CGI Security," covers these issues in more detail.

Installing CGI on UNIX servers

Regardless of how your UNIX server is configured, there are several steps you need to take to ensure that your CGI applications run as expected. Your Web server will typically run as a nonprivileged user (for example, the UNIX user nobody, an account that has no special file permissions and cannot be logged into). CGI scripts (written in Perl, the Bourne shell, or another scripting language) must therefore be world-readable and world-executable.

Tip

To make your files world-readable and world-executable, use the following UNIX command: chmod 755 filename.
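On POSIX systems, the same permission change can be made from a C program, for example in an installation utility, with the standard chmod() call. The mode bits below spell out what 755 means; the helper name make_cgi_executable is hypothetical.

```c
#include <sys/stat.h>

/* rwxr-xr-x: the owner may read, write, and execute; group and others may
 * read and execute. This is the same mode as `chmod 755 filename`. */
#define CGI_MODE (S_IRWXU | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH)

/* Returns 0 on success, or -1 (with errno set) if chmod() fails. */
int make_cgi_executable(const char *path)
{
    return chmod(path, CGI_MODE);
}
```

Writing the mode with the symbolic S_I* constants makes each permission explicit, while the octal literal 0755 is the same value in the form the shell command uses.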

If you are using a scripting language such as Perl or Tcl, provide the full path of your interpreter on the first line of your script. For example, a Perl script using perl in the /usr/local/bin directory would begin with the following line:

#!/usr/local/bin/perl

Warning

Never place the interpreter (the perl binary or the Tcl wish binary) in the /cgi-bin directory. This creates a security risk on your system, as discussed in more detail in Chapter 9.

Some common UNIX servers

The NCSA and Apache servers have similar configuration files because the Apache server was originally based on the NCSA code. By default, they are configured so that any file in the cgi-bin directory (located by default in ServerRoot) is a CGI program. To change the location of the cgi-bin directory, you can edit the conf/srm.conf configuration file. The format for configuring this directory is

ScriptAlias fakedirectoryname realdirectoryname

where fakedirectoryname is the pseudo directory name (/cgi-bin) and realdirectoryname is the full path where the CGI programs are actually stored. You can configure more than one ScriptAlias by adding more ScriptAlias lines.

The default configuration is sufficient for most users' needs. In either case, you need to edit this line in the srm.conf file to specify the correct realdirectoryname. If, for example, your CGI programs are located in /usr/local/etc/httpd/cgi-bin, the ScriptAlias line in your srm.conf file should look something like this:

ScriptAlias /cgi-bin/ /usr/local/etc/httpd/cgi-bin/

To access or link to CGI programs located in this directory, use the following URL:

http://hostname/cgi-bin/programname

where hostname is the host name of your Web server and programname is the name of your CGI program.

For example, suppose you copy the hello.cgi program to your cgi-bin directory (for example, /usr/local/etc/httpd/cgi-bin) on a Web server called www.company.com. To access your CGI program, use the following URL: http://www.company.com/cgi-bin/hello.cgi
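The effect of a ScriptAlias line can be sketched as a simple prefix substitution: the fake directory at the start of the URL path is replaced by the real directory. This illustrates the general idea only; it is not the NCSA or Apache source code, and map_script_alias is a hypothetical name.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch of ScriptAlias-style mapping: if url_path begins with
 * fake_dir, replace that prefix with real_dir and write the result to out.
 * Returns 0 on success, or -1 if the prefix does not match or out is too small. */
int map_script_alias(const char *fake_dir, const char *real_dir,
                     const char *url_path, char *out, size_t out_size)
{
    size_t fake_len = strlen(fake_dir);

    if (strncmp(url_path, fake_dir, fake_len) != 0)
        return -1;
    int n = snprintf(out, out_size, "%s%s", real_dir, url_path + fake_len);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

With the ScriptAlias line shown above, /cgi-bin/hello.cgi maps to /usr/local/etc/httpd/cgi-bin/hello.cgi. A real server would also reject paths containing .. segments and check that the resulting file exists and is executable; this sketch does neither.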

If you want to configure your NCSA or Apache server to recognize any file with a .cgi extension as a CGI, you need to edit two configuration files. First, in the srm.conf file, uncomment the following line:

AddType application/x-httpd-cgi .cgi

This associates the CGI MIME type with the .cgi extension. Next, you need to change the access.conf file so that you can run CGI programs in any directory. To do this, add the ExecCGI option to the Options line. It will look something like the following line:

Options Indexes FollowSymLinks ExecCGI

Now any file with a .cgi extension is considered a CGI program; access it as you would any other file on your server.

The CERN server is configured similarly to the Apache and NCSA servers. Instead of ScriptAlias, the CERN server uses the Exec directive. For example, in the httpd.conf file, you will see the following line:

Exec /cgi-bin/* /usr/local/etc/httpd/cgi-bin/*

Other UNIX servers can be configured in the same way; this is described in more detail in their documentation.

Installing CGI on Windows

Most servers available for Windows 3.1, Windows 95, and Windows NT use the filename extension method to recognize CGI programs. In general, changing the configuration of a Windows-based server simply requires running the server's configuration program and making the appropriate changes.

Configuring a server to run a script (such as a Perl script) correctly can sometimes be difficult. In DOS or Windows, you cannot specify the interpreter on the first line of the script as you can on UNIX. Some servers are preconfigured to associate certain filename extensions with an interpreter. For example, many Windows Web servers assume that files ending in .pl are Perl scripts.

If the server does not perform this type of file association, you can write a wrapper batch file that calls both the interpreter and the script. As with UNIX servers, do not install the interpreter in the cgi-bin directory or in any other Web-accessible directory.

Installing CGI on Macintosh

The two best-known servers for the Macintosh are StarNine's WebSTAR and its predecessor, MacHTTP. Both recognize CGI programs by their filename extension.

MacHTTP understands two different extensions: .cgi and .acgi, which stands for asynchronous CGI. Regular CGI programs installed on a Macintosh (with a .cgi extension) keep the Web server busy until the CGI program finishes running, forcing the server to put all other requests on hold. Asynchronous CGI, on the other hand, allows the server to accept requests even while the CGI program is running.

A Macintosh CGI developer using either of these Web servers should, if possible, use the .acgi extension rather than .cgi. This should work with most CGI programs; if it doesn't, rename the program with a .cgi extension.

Executing CGI

Once you have installed a CGI program, there are several ways to execute it. If your CGI program only produces output, like the Hello, World! program, you can execute it simply by accessing its URL.

Most CGI programs are run as the server-side application behind an HTML form. Before learning how to get information from these forms, read this short introduction to creating them.

A quick tutorial on HTML forms

The two most important tags in an HTML form are the <form> and <input> tags. You can create most HTML forms using just these two tags. In this chapter, you will explore these tags and a small subset of the possible <input> types and attributes. A complete guide to and reference for HTML forms is in Chapter 3, "HTML and Forms."

The <form> Tag

The <form> tag defines which part of the HTML file is used for user-entered information. This is how most HTML pages call a CGI program. The tag's attributes specify the program's name and location (either a local path or a full URL), the type of encoding used, and the method used to send data to the program.

The following line shows the specification for the <form> tag:

<FORM ACTION="url" METHOD=[POST|GET] ENCTYPE="...">

The ENCTYPE attribute is of little importance here and is usually omitted from the <form> tag. Detailed information about ENCTYPE is given in Chapter 3. One use of ENCTYPE is shown in Chapter 14, "Proprietary Extensions."

The ACTION attribute gives the URL of the CGI program. Once the user fills out the form and submits it, all the information is encoded and transferred to the CGI program. The CGI program itself is responsible for decoding and processing the information; this is discussed in "Accepting Input from the Browser," later in this chapter.
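Decoding is the first job the CGI program has to do. By default, browsers encode form data as application/x-www-form-urlencoded: each field is sent as name=value, pairs are joined with &, spaces become +, and other special characters become %XX, where XX is the character's hexadecimal code. A minimal C sketch of the decoding step follows; the function name url_decode is hypothetical.

```c
#include <ctype.h>

/* Value of a single hexadecimal digit; the caller has checked isxdigit(). */
static int hex_value(int c)
{
    if (isdigit(c))
        return c - '0';
    return tolower(c) - 'a' + 10;
}

/* Decodes an application/x-www-form-urlencoded string in place:
 * '+' becomes a space and "%XX" becomes the byte with hex value XX.
 * Malformed escapes (for example "%zz" or a trailing "%") are copied unchanged. */
void url_decode(char *s)
{
    char *out = s;

    while (*s != '\0') {
        unsigned char c = (unsigned char)*s;

        if (c == '+') {
            *out++ = ' ';
            s++;
        } else if (c == '%' && isxdigit((unsigned char)s[1])
                   && isxdigit((unsigned char)s[2])) {
            *out++ = (char)(hex_value((unsigned char)s[1]) * 16
                            + hex_value((unsigned char)s[2]));
            s += 3;
        } else {
            *out++ = (char)c;
            s++;
        }
    }
    *out = '\0';
}
```

Decoding in place works because the output is never longer than the input. A program should split the data on & and = before decoding each piece, so that an encoded %26 inside a value is not mistaken for a separator.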

Finally, the METHOD attribute describes how the CGI program should receive input. The two methods, GET and POST, differ in how they pass information to the CGI program. Both are discussed in "Accepting Input from the Browser."
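In CGI/1.1, the two methods deliver the data differently: with GET, the encoded data arrives in the QUERY_STRING environment variable; with POST, it arrives on standard input, and the CONTENT_LENGTH environment variable gives its length in bytes. REQUEST_METHOD tells the program which method was used. The following C sketch takes those values as parameters so that it can be tested without a server; both function names are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated copy of the form data, or NULL on error.
 * method, query_string, and content_length are the values of REQUEST_METHOD,
 * QUERY_STRING, and CONTENT_LENGTH; in carries the request body. */
char *read_form_data(const char *method, const char *query_string,
                     const char *content_length, FILE *in)
{
    if (method != NULL && strcmp(method, "GET") == 0) {
        const char *q = query_string != NULL ? query_string : "";
        char *copy = malloc(strlen(q) + 1);
        if (copy != NULL)
            strcpy(copy, q);
        return copy;
    }
    if (method != NULL && strcmp(method, "POST") == 0 && content_length != NULL) {
        char *end;
        long len = strtol(content_length, &end, 10);
        if (end == content_length || *end != '\0' || len < 0)
            return NULL;                    /* not a valid length */
        char *body = malloc((size_t)len + 1);
        if (body == NULL)
            return NULL;
        if (fread(body, 1, (size_t)len, in) != (size_t)len) {
            free(body);                     /* fewer bytes than promised */
            return NULL;
        }
        body[len] = '\0';
        return body;
    }
    return NULL;                            /* unsupported method */
}

/* Convenience wrapper that reads the real environment and stdin. */
char *read_cgi_input(void)
{
    return read_form_data(getenv("REQUEST_METHOD"), getenv("QUERY_STRING"),
                          getenv("CONTENT_LENGTH"), stdin);
}
```

The caller frees the returned string. A production program should also set an upper limit on CONTENT_LENGTH before allocating memory for the request body.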

In order for the browser to accept user input, all form tags and information must be enclosed in the <FORM> tag. Don't forget the closing </FORM> tag to indicate the end of the form. You cannot nest one form inside another, although you can set up a form that lets you present pieces of information in different places; this is discussed extensively in Chapter 3.

The <INPUT> Tag

You can create text input fields, radio buttons, checkboxes, and other means of accepting input using the <INPUT> tag. This section covers only text input fields. To create one, use the <INPUT> tag with the following attributes:

<INPUT TYPE=text NAME="..." VALUE="..." SIZE=... MAXLENGTH=...>

NAME is the symbolic name of the variable that holds the value entered by the user. If you include text in the VALUE attribute, that text appears as the default in the text input field. The SIZE attribute specifies the horizontal length of the input field as it appears in the browser window. Finally, MAXLENGTH specifies the maximum number of characters the user can enter into the field. Note that the VALUE, SIZE, and MAXLENGTH attributes are optional.

Form Submission

If you have only one text field within a form, the user can submit the form simply by typing the information and pressing Enter. Otherwise, there must be some other way for the user to submit the information. The user submits information with a submit button, created with the following tag:

<INPUT TYPE=submit>

This tag creates a Submit button inside your form. When users finish filling out the form, they submit its contents to the URL specified by the form's ACTION attribute by clicking the Submit button.

Accepting input from the browser

The examples so far were CGI programs that send information from the server to the browser. In reality, a CGI program that only outputs data has few applications (some examples are given in Chapter 4). The more important ability of CGI is receiving information from the browser, the feature that gives the Web its interactive character.

The CGI program receives two types of information from the browser.

First, it obtains various pieces of information about the browser (its type, what it can display, the remote host, and so on), the server (its name and version, the port it runs on, and so on), and the CGI program itself (its name and location). The server passes all this information to the CGI program through environment variables.
Second, the CGI program can receive user input. This information, encoded by the browser, is sent either through an environment variable (the GET method) or through standard input (stdin, the POST method).

Environment Variables

It is useful to know which environment variables are available to a CGI program, both while learning and when debugging. Table 2.2 lists some of the available CGI environment variables. You can also write a CGI program that outputs the environment variables and their values to a Web browser.

Table 2.2. Some important CGI environment variables

	Environment variable	Purpose
	REMOTE_ADDR	IP address of the client machine.
	REMOTE_HOST	Host name of the client machine.
	HTTP_ACCEPT	Lists the MIME data types that the browser can interpret.
	HTTP_USER_AGENT	Browser information (browser type, version number, operating system, and so on).
	REQUEST_METHOD	GET or POST.
	CONTENT_LENGTH	The size of the input if it is sent via POST. If there is no input, or if the GET method is used, this variable is undefined.
	QUERY_STRING	Contains the input when it is passed using the GET method.
	PATH_INFO	Extra path information that the user appends after the program name in the URL (for example, http://hostname/cgi-bin/programname/path).
	PATH_TRANSLATED	Translates the relative path in PATH_INFO into the actual path on the system.

To write a CGI application that displays environment variables, you need to know how to do two things:

Find all environment variables and their corresponding values.
Print the results to the browser.

You already know how to do the second. In Perl, environment variables are stored in the associative array %ENV, keyed by the names of the environment variables. Listing 2.3 contains env.cgi, a Perl program that accomplishes our goal.

Listing 2.3. A Perl program, env.cgi, that prints out all the CGI environment variables.

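#!/usr/local/bin/perl

print "Content-type: text/html\n\n";
print "<HTML>\n";
print "<HEAD><TITLE>CGI Environment</TITLE>\n";
print "</HEAD>\n";
print "<BODY>\n";
print "<H1>CGI Environment</H1>\n";
# %ENV is keyed by variable name, so use braces, not parentheses
foreach $env_var (keys %ENV) {
    print "<B>$env_var</B> = $ENV{$env_var}<BR>\n";
}
print "</BODY></HTML>\n";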

A similar program could be written in C; the complete code is in Listing 2.4.

Listing 2.4. env.cgi.c in C.

/* env.cgi.c */
#include <stdio.h>

extern char **environ;   /* NULL-terminated array of "NAME=value" strings */

int main(void)
{
    char **p = environ;

    printf("Content-Type: text/html\r\n\r\n");
    printf("<HTML>\n");
    printf("<HEAD><TITLE>CGI Environment</TITLE>\n");
    printf("</HEAD>\n");
    printf("<BODY>\n");
    printf("<H1>CGI Environment</H1>\n");
    while (*p != NULL)
        printf("%s<BR>\n", *p++);
    printf("</BODY></HTML>\n");
    return 0;
}

GET or POST?

What's the difference between the GET and POST methods? GET passes the encoded input string through the QUERY_STRING environment variable, while POST passes it through stdin. POST is the preferred method, especially for forms with a lot of data, because there is no limit on the amount of information sent, whereas with GET the amount of environment space is limited. GET does, however, have a useful property; this is covered in detail in Chapter 5, "Input."

To determine which method is used, the CGI program checks the environment variable REQUEST_METHOD, which will be set to either GET or POST. If it is set to POST, the length of the encoded information is stored in the CONTENT_LENGTH environment variable.
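The check on REQUEST_METHOD and CONTENT_LENGTH can be sketched in C. This is a minimal, hypothetical helper (read_form_input is not part of cgihtml or any other library mentioned here); it takes the values of REQUEST_METHOD, QUERY_STRING, and CONTENT_LENGTH as arguments instead of calling getenv() itself, so it can be exercised without a web server. The 1,000,000-byte ceiling on CONTENT_LENGTH is an arbitrary assumption for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc'd copy of the encoded form data.
 * method         - value of REQUEST_METHOD ("GET" or "POST")
 * query          - value of QUERY_STRING (may be NULL)
 * content_length - value of CONTENT_LENGTH (may be NULL)
 * in             - stream holding POST data (stdin in a real CGI program)
 * Returns NULL for other methods, bad lengths, or short input. */
char *read_form_input(const char *method, const char *query,
                      const char *content_length, FILE *in)
{
    assert(in != NULL);
    if (method != NULL && strcmp(method, "GET") == 0) {
        const char *src = query ? query : "";
        size_t n = strlen(src);
        char *copy = malloc(n + 1);
        if (copy != NULL)
            memcpy(copy, src, n + 1);
        return copy;
    }
    if (method != NULL && strcmp(method, "POST") == 0) {
        char *end;
        long len;
        char *buf;

        if (content_length == NULL)
            return NULL;
        len = strtol(content_length, &end, 10);
        /* Reject non-numeric, negative, or (assumed) too-large lengths. */
        if (end == content_length || *end != '\0' || len < 0 || len > 1000000L)
            return NULL;
        buf = malloc((size_t)len + 1);
        if (buf == NULL)
            return NULL;
        /* Read exactly CONTENT_LENGTH bytes, never more. */
        if (fread(buf, 1, (size_t)len, in) != (size_t)len) {
            free(buf);
            return NULL;
        }
        buf[len] = '\0';
        return buf;
    }
    return NULL;
}
```

In a real CGI program you would call it as read_form_input(getenv("REQUEST_METHOD"), getenv("QUERY_STRING"), getenv("CONTENT_LENGTH"), stdin). Reading exactly CONTENT_LENGTH bytes, rather than reading until end-of-file, matters because the server is not required to signal end-of-file right after the request body.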

Encoded Input

When a user submits a form, the browser first encodes the information and sends it to the server, which passes it on to the CGI application. When you use the <INPUT> tag, each field is given a symbolic name. The value entered by the user becomes the value of that variable.

To encode this information, the browser follows the URL encoding specification, which can be summarized as follows:

Separates different fields with an ampersand (&).
Separates each name from its value with an equal sign (=), with the name on the left and the value on the right.
Replaces spaces with plus signs (+).
Replaces all "abnormal" characters with a percent sign (%) followed by a two-digit hex code for the character.

Your final encoded string will be similar to the following:

name1=value1&name2=value2&name3=value3 ...

Note: Specifications for URL encoding are found in RFC 1738.
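Following the rules listed above, an encoder for a single name or value might look like the sketch below. It is an illustrative helper (url_encode is a hypothetical name, not a library function); the set of characters left unencoded (letters, digits, and -_.*) is an assumption, since browsers have differed slightly on which punctuation they escape.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Encode one name or value: space -> '+', letters, digits and "-_.*"
 * unchanged (assumed safe set), every other byte -> %XX (uppercase hex).
 * Returns a malloc'd string the caller must free, or NULL. */
char *url_encode(const char *s)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t n;
    char *out, *o;

    assert(s != NULL);
    n = strlen(s);
    out = malloc(3 * n + 1);          /* worst case: every byte becomes %XX */
    if (out == NULL)
        return NULL;
    for (o = out; *s != '\0'; s++) {
        unsigned char c = (unsigned char)*s;
        if (c == ' ') {
            *o++ = '+';
        } else if (isalnum(c) || c == '-' || c == '_' || c == '.' || c == '*') {
            *o++ = (char)c;
        } else {
            *o++ = '%';
            *o++ = hex[c >> 4];       /* high nibble */
            *o++ = hex[c & 0x0F];     /* low nibble */
        }
    }
    *o = '\0';
    return out;
}
```

Encoding each name and value separately, then joining them with = and &, produces a string in the name1=value1&name2=value2 form. Encoding the pieces first is what keeps an & or = typed by the user from being mistaken for a separator.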

For example, let's say you had a form that asked for name and age. The HTML code that was used to display this form is shown in Listing 2.5.

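Listing 2.5. HTML code to display the name and age form.

<HTML>
<HEAD>
<TITLE>Name and Age</TITLE>
</HEAD>

<BODY>
<FORM ACTION="/cgi-bin/nameage.cgi" METHOD=POST>
Name: <INPUT TYPE=text NAME="name"><P>
Age: <INPUT TYPE=text NAME="age"><P>
<INPUT TYPE=submit>
</FORM>
</BODY>
</HTML>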

Let's say the user enters Joe Schmoe in the name field and 20 in the age field. The input will be encoded as the following string:

name=Joe+Schmoe&age=20
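Decoding reverses these steps. A minimal in-place decoder, written as a standalone sketch (url_decode is a hypothetical name, not a library function), turns each + back into a space and each %XX back into a byte. Run on the value Joe+Schmoe, it yields Joe Schmoe.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Value of one hex digit; the caller has already checked isxdigit(). */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    return tolower(c) - 'a' + 10;
}

/* Decode a URL-encoded string in place: '+' -> space, %XX -> byte.
 * A '%' not followed by two hex digits is copied through unchanged.
 * Decoding never lengthens the string, so working in place is safe. */
void url_decode(char *s)
{
    char *out = s;

    assert(s != NULL);
    while (*s != '\0') {
        if (*s == '+') {
            *out++ = ' ';
            s++;
        } else if (*s == '%' && isxdigit((unsigned char)s[1])
                   && isxdigit((unsigned char)s[2])) {
            *out++ = (char)(hex_value((unsigned char)s[1]) * 16
                            + hex_value((unsigned char)s[2]));
            s += 3;
        } else {
            *out++ = *s++;
        }
    }
    *out = '\0';
}
```

Passing a stray % through unchanged, rather than rejecting the whole input, is a design choice of this sketch; a stricter program might treat it as an error.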

Parsing input

For this information to be useful, you need to parse it into something your CGI program can use. Strategies for parsing input are covered in Chapter 5. In practice, you will rarely have to think about how to parse input, because several experts have already written publicly available libraries that do the parsing. Two such libraries are presented in the following sections of this chapter: cgi-lib.pl for Perl (written by Steve Brenner) and cgihtml for C (written by me).

Most of these libraries, whatever language they are written in, parse the encoded string and put the name/value pairs into a data structure. A language with built-in data structures, like Perl, has an obvious advantage here; however, most libraries for lower-level languages such as C and C++ supply their own data structures and access routines.
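To make "put the name/value pairs into a data structure" concrete, here is a small, self-contained C sketch that splits the encoded string on & and =, decodes each piece, and appends it to a linked list. The struct form_field type and the parse_form and free_form functions are hypothetical illustrations in the spirit of cgihtml's list, not its actual API.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical list node: one decoded name/value pair. */
struct form_field {
    char *name;
    char *value;
    struct form_field *next;
};

static int hex_value(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    return tolower(c) - 'a' + 10;
}

/* In-place URL decoding: '+' -> space, %XX -> byte. */
static void url_decode(char *s)
{
    char *out = s;
    while (*s != '\0') {
        if (*s == '+') {
            *out++ = ' ';
            s++;
        } else if (*s == '%' && isxdigit((unsigned char)s[1])
                   && isxdigit((unsigned char)s[2])) {
            *out++ = (char)(hex_value((unsigned char)s[1]) * 16
                            + hex_value((unsigned char)s[2]));
            s += 3;
        } else {
            *out++ = *s++;
        }
    }
    *out = '\0';
}

/* Copy len bytes of src into a new string, then decode it. */
static char *decoded_copy(const char *src, size_t len)
{
    char *s = malloc(len + 1);
    if (s == NULL)
        return NULL;
    memcpy(s, src, len);
    s[len] = '\0';
    url_decode(s);
    return s;
}

void free_form(struct form_field *f)
{
    while (f != NULL) {
        struct form_field *next = f->next;
        free(f->name);
        free(f->value);
        free(f);
        f = next;
    }
}

/* Split "n1=v1&n2=v2" into a list that keeps the fields in order.
 * A pair with no '=' gets an empty value; empty pairs are skipped.
 * Returns NULL for empty input or if memory runs out. */
struct form_field *parse_form(const char *input)
{
    struct form_field *head = NULL;
    struct form_field **tail = &head;

    assert(input != NULL);
    while (*input != '\0') {
        size_t pair_len = strcspn(input, "&");   /* up to the next '&' */
        size_t name_len = strcspn(input, "=&");  /* name ends at '=' or '&' */

        if (pair_len > 0) {
            struct form_field *f = malloc(sizeof *f);
            if (f == NULL) {
                free_form(head);
                return NULL;
            }
            f->next = NULL;
            f->name = decoded_copy(input, name_len);
            f->value = (name_len < pair_len)
                ? decoded_copy(input + name_len + 1, pair_len - name_len - 1)
                : decoded_copy("", 0);
            *tail = f;                  /* link first so free_form() cleans up */
            tail = &f->next;
            if (f->name == NULL || f->value == NULL) {
                free_form(head);
                return NULL;
            }
        }
        input += pair_len;
        if (*input == '&')
            input++;
    }
    return head;
}
```

Splitting before decoding matters: an encoded %26 inside a value becomes a literal & only after the pair has already been separated, so it can never be mistaken for a field boundary.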

You do not need a complete understanding of these libraries; it is more important to learn how to use them as tools that make the CGI programmer's job easier.

cgi-lib.pl

cgi-lib.pl uses Perl associative arrays. The &ReadParse function parses the input string and stores each value in the array under its field name. For example, the Perl code needed to decode the "name/age" input string just presented would be

&ReadParse(*input);

Now, to see the value entered for "name", you can access the associative array element $input{'name'}. Similarly, to access the value of "age", look at $input{'age'}.

cgihtml

C doesn't have any built-in data structures, so cgihtml implements its own linked list for use with its CGI parsing routines. It defines the entrytype structure as follows:

typedef struct {
    char *name;
    char *value;
} entrytype;

To parse the "name/age" input string in C using cgihtml, use the following:

/* declare a linked list called input */
llist input;
/* parse the input and place it in the linked list */
read_cgi_input(&input);

To access the age information, you can either walk the list manually or use the cgi_val() function that cgihtml provides.

#include <stdlib.h>
#include <string.h>

char *age = NULL;
char *value = cgi_val(input, "age");   /* NULL if no "age" field was sent */

if (value != NULL) {
    age = malloc(sizeof(char) * strlen(value) + 1);   /* +1 for the NUL */
    if (age != NULL)
        strcpy(age, value);
}

The "age" value is now stored in the age string.

Note: Instead of using a simple fixed-size character array, I'm dynamically allocating memory for the string age. Although this makes programming more difficult, it is important from a security point of view. This is discussed in more detail in Chapter 9.
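The same idea, sizing the buffer from the data instead of guessing a fixed size, can be expressed as a pair of small helpers over a name/value list: one finds a field the way cgi_val() does, and one allocates a copy exactly as large as the value. The names form_value and copy_value and the struct form_field type are illustrative assumptions; cgihtml's own types differ. The copy helper also returns NULL for a field that was never submitted instead of passing NULL to strlen().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical list node (not cgihtml's own llist type). */
struct form_field {
    char *name;
    char *value;
    struct form_field *next;
};

/* Return the first value stored under `name`, or NULL if the field was
 * not submitted. Like cgi_val(), the result points into the list. */
const char *form_value(const struct form_field *f, const char *name)
{
    assert(name != NULL);
    for (; f != NULL; f = f->next)
        if (strcmp(f->name, name) == 0)
            return f->value;
    return NULL;
}

/* Allocate a copy sized from the value itself, so no fixed-size buffer
 * can be overrun. Returns NULL if the field is missing or malloc fails;
 * the caller frees the result. */
char *copy_value(const struct form_field *f, const char *name)
{
    const char *v = form_value(f, name);
    size_t n;
    char *copy;

    if (v == NULL)
        return NULL;
    n = strlen(v) + 1;          /* +1 for the terminating NUL */
    copy = malloc(n);
    if (copy != NULL)
        memcpy(copy, v, n);
    return copy;
}
```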

A simple CGI program

You are going to write a CGI program called nameage.cgi that handles the name/age form. The data processing (what I usually call "stuff") is minimal: nameage.cgi simply decodes the input and displays the user's name and age. While there isn't much use for such a tool, it does demonstrate the most critical aspect of CGI programming: input and output.

You use the same form as above, with its "name" and "age" fields. Don't worry about robustness and efficiency just yet; solve the problem at hand in the simplest way. The Perl and C solutions are shown in Listings 2.6 and 2.7, respectively.

Listing 2.6. nameage.cgi in Perl

#!/usr/local/bin/perl
# nameage.cgi
require "cgi-lib.pl";

&ReadParse(*input);
print "Content-Type: text/html\r\n\r\n";
print "<HTML>\n";
print "<HEAD><TITLE>Name and Age</TITLE>\n";
print "</HEAD>\n";
print "<BODY>\n";
print "Hello, " . $input{'name'} . ". You are\n";
print $input{'age'} . " years old.<P>\n";
print "</BODY></HTML>\n";

Listing 2.7. nameage.cgi in C

/* nameage.cgi.c */
#include <stdio.h>
#include "cgi-lib.h"

int main(void)
{
    llist input;

    read_cgi_input(&input);
    printf("Content-Type: text/html\r\n\r\n");
    printf("<HTML>\n");
    printf("<HEAD><TITLE>Name and Age</TITLE>\n");
    printf("</HEAD>\n");
    printf("<BODY>\n");
    printf("Hello, %s. You are\n", cgi_val(input, "name"));
    printf("%s years old.<P>\n", cgi_val(input, "age"));
    printf("</BODY></HTML>\n");
    return 0;
}

Note that these two programs are almost equivalent. Both parse the entire input in a single line, thanks to the corresponding library routines. The output is essentially a modified version of your original Hello, World! program.

Try running the program by filling out the form and clicking the Submit button.

General programming strategy

You now know all the basic principles required for CGI programming. Once you understand how CGI receives information and how it sends it back to the browser, the quality of your final product depends on your general programming abilities. In particular, when you program CGI (or anything else, for that matter), keep the following qualities in mind:

Simplicity
Efficiency
Versatility

The first two qualities apply to any program: try to make your code as readable and efficient as possible. Versatility applies more to CGI programs than to other applications. When you start developing your own CGI programs, you will find that there are several basic applications that everyone wants to build. For example, one of the most common and obvious tasks of a CGI program is to process a form and email the results to a specific recipient. You might have several separate forms to process, each with a different recipient. Instead of writing a CGI program for each individual form, you can save time by writing a more general CGI program that works for all of them.
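As a small illustration of that kind of versatility, the sketch below writes every field of any submitted form as name: value lines without knowing the field names in advance; output like this could become the body of the results e-mail. write_all_fields is a hypothetical function, and the actual mailing step is left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

static int hex_value(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    return tolower(c) - 'a' + 10;
}

/* Write len bytes of URL-encoded src to out, decoding as it goes. */
static void put_decoded(const char *src, size_t len, FILE *out)
{
    size_t i = 0;
    while (i < len) {
        if (src[i] == '+') {
            fputc(' ', out);
            i++;
        } else if (src[i] == '%' && i + 2 < len
                   && isxdigit((unsigned char)src[i + 1])
                   && isxdigit((unsigned char)src[i + 2])) {
            fputc(hex_value((unsigned char)src[i + 1]) * 16
                  + hex_value((unsigned char)src[i + 2]), out);
            i += 3;
        } else {
            fputc(src[i], out);
            i++;
        }
    }
}

/* Write every submitted field as a "name: value" line, whatever the
 * form's field names are, so one program can serve many forms. */
void write_all_fields(const char *input, FILE *out)
{
    assert(input != NULL && out != NULL);
    while (*input != '\0') {
        size_t pair_len = strcspn(input, "&");
        size_t name_len = strcspn(input, "=&");

        if (pair_len > 0) {
            put_decoded(input, name_len, out);
            fputs(": ", out);
            if (name_len < pair_len)
                put_decoded(input + name_len + 1, pair_len - name_len - 1, out);
            fputc('\n', out);
        }
        input += pair_len;
        if (*input == '&')
            input++;
    }
}
```

Because the function never mentions a specific field, the same code serves a guest book, an order form, or a questionnaire; only the recipient would need to vary, for example through a hidden field.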

By covering all the basic aspects of CGI, I've provided you with enough information to get started with CGI programming. However, to become an effective CGI developer, you need to have a deeper understanding of how CGI communicates with the server and browser. The remainder of this book covers in detail the issues that were briefly mentioned in this chapter, as well as application development strategy and the advantages and limitations of the protocol.

Summary

This chapter briefly introduced the basics of CGI programming. You create output by formatting your data correctly and printing to stdout. Receiving CGI input is a bit more complex because it must be parsed before it can be used. Fortunately, several libraries already exist that perform parsing.

By now you should be fairly comfortable with programming CGI applications. The remainder of this book goes into more detail about specifications, tips, and programming strategies for more advanced and complex applications.

Chapter 9

Programming with CGI

Including a section on CGI in a book on databases may seem as strange as including a chapter on car repair in a cookbook. Of course, you need a working car to get to the grocery store, but is it appropriate to talk about that here? A full discussion of CGI and Web programming in general is beyond the scope of this book, but a brief introduction to these topics is enough to extend the capabilities of MySQL and mSQL to presenting data on the Web.

This chapter is primarily intended for those who are studying databases but would like to gain some knowledge of Web programming. If your last name is Berners-Lee or Andreessen, you're unlikely to find anything here that you don't already know. But even if you're not new to CGI, having a quick reference guide can be very useful when diving into the mysteries of MySQL and mSQL.

What is CGI?

Like most acronyms, Common Gateway Interface (CGI) doesn't say much by itself. An interface to what? Where is this gateway? And what is so common about it? To answer these questions, let's step back and look at the WWW as a whole.

Tim Berners-Lee, a physicist who worked at CERN, came up with the idea of the Web in 1990, although the plan dates back to 1988. The idea was to enable particle physics researchers to share multimedia data (text, images, and sound) easily and quickly over the Internet. The WWW consisted of three main parts: HTML, URL, and HTTP. HTML is a formatting language used to present content on the Web. A URL is the address used to retrieve HTML (or other) content from a web server. Finally, HTTP is the protocol understood by the web server, which allows clients to request documents from the server.

The ability to send information of all types over the Internet was revolutionary, but another possibility was soon discovered. If you can send any text over the Web, why not send text generated by a program instead of text taken from a ready-made file? This opens up a sea of possibilities. A simple example: you can use a program that prints the current time, so that readers see the correct time every time they view the page. Some bright minds at the National Center for Supercomputing Applications (NCSA), who were building a web server, saw this opportunity, and CGI soon appeared.

CGI is a set of rules that allows programs on a server to send data to clients through a web server. The CGI specification was accompanied by changes to HTML and HTTP that introduced a new feature known as forms.

If CGI allows programs to send data to the client, forms extend this capability by allowing the client to send data to the CGI program. Now the user can not only see the current time but also set the clock! CGI forms opened the door to true interactivity on the Web. Common CGI applications include:

Dynamic HTML. Entire websites can be generated by one CGI program.
Search engines that find documents containing user-specified words.
Guest books and message boards where users can add their messages.
Order forms.
Questionnaires.
Retrieving information from a database hosted on the server.

In subsequent chapters we will discuss all of these CGI applications, as well as some others. They all provide a great way to connect CGI to a database, which is what we're interested in in this section.

HTML Forms

Before exploring the specifics of CGI, it's useful to look at the most common way end users interact with CGI programs: HTML forms. Forms are part of the HTML language and provide the end user with fields of various types. Data entered into the fields can be sent to the web server. Fields can accept text, or they can be buttons that the user clicks or checks. Here is an example of an HTML page containing a form:

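<HTML><HEAD><TITLE>My forms page</TITLE></HEAD>
<BODY>
<P>This is a page with a form.
<FORM ACTION="mycgi.cgi" METHOD=POST>
<INPUT NAME="firstname" SIZE=40><BR>
<INPUT TYPE=SUBMIT>
</FORM>
</BODY></HTML>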

This form creates a 40-character field where the user can enter their name. Below the input field is a button that, when clicked, sends the form data to the server. Listed below are the form-related tags supported by HTML 3.2, the most widely used standard today. Tag and attribute names can be written in any case, but we follow the optional convention of writing start tags in upper case and closing tags in lower case.

<FORM>

This tag marks the beginning of a form, and the matching </form> tag marks its end; all other form elements go between them. Three attributes are allowed: ACTION specifies the URL or relative path of the CGI program to which the data will be sent; METHOD specifies the HTTP method used to submit the form (GET or POST, although we will almost always use POST); ENCTYPE specifies the data encoding method (use it only if you clearly understand what you are doing).

<INPUT>

This tag provides the most flexible way to accept user input. In fact, there are nine different types of <INPUT> tag. The type is specified by the TYPE attribute. The previous example uses two <INPUT> tags: one of type SUBMIT and another of the default type, TEXT. The nine types are as follows:

TEXT

A field for the user to enter one line of text.

PASSWORD

Same as TEXT, but the text you enter is not displayed on the screen.

CHECKBOX

A checkbox that the user can select and clear.

RADIO

A radio button that must be grouped with at least one other radio button sharing the same NAME. The user can select only one button in the group.

SUBMIT

A button that, when clicked, submits the form to the web server.

RESET

A button that, when clicked, restores the form to its default values.

FILE

Similar to a text field, but expects the name of a file that will be sent to the server.

HIDDEN

An invisible field in which data can be stored.

IMAGE

Similar to the SUBMIT button, but displays a picture instead of a text label.

In addition to the TYPE attribute, <INPUT> tags usually have a NAME attribute that associates the data entered in the field with a name. The name and data are sent to the server in name=value style. In the previous example, the text field was named firstname. You can use the VALUE attribute to assign default values to fields of type TEXT, PASSWORD, FILE, and HIDDEN. The same attribute, used with buttons such as SUBMIT or RESET, sets the text displayed on them. Fields of type RADIO and CHECKBOX can be shown as checked by using the CHECKED attribute, which takes no value.

The SIZE attribute sets the length of TEXT, PASSWORD, and FILE fields. The MAXLENGTH attribute can be used to limit the length of the entered text. The SRC attribute specifies the URL of the image used by the IMAGE type. Finally, the ALIGN attribute specifies the alignment of the image for the IMAGE type and can be TOP, MIDDLE, BOTTOM (the default), LEFT, or RIGHT.

<TEXTAREA>

This last form-related tag lets users enter blocks of text that will be submitted to the web server. The <TEXTAREA> tag displays a box into which the user can enter any number of lines of text. You must use a closing </textarea> tag, and any text between the tags is used as the default text, similar to the VALUE attribute of the <INPUT> tag. Three attributes must be specified for <TEXTAREA>. The NAME attribute specifies the name of the data, as for the other form tags. The ROWS and COLS attributes specify the number of rows and columns displayed on the screen but do not limit the amount of data the user can enter.

Example 9-1 shows the use of all the form elements.

Example 9-1. HTML form demonstrating the use of various elements

<HTML><HEAD><TITLE>My second form page</TITLE></HEAD>
<BODY>
<P>This is a questionnaire. Please provide the following information about yourself:
<!-- Start the form. We use the "POST" method to send the data to a
CGI program named "survey.cgi" -->
<FORM METHOD=POST ACTION="survey.cgi">
<P>Name: <INPUT SIZE=40 NAME="name"><BR>
<!-- This is an <INPUT> tag of the (default) type "TEXT". It is 40
characters long, and the data will be named "name" -->
Social Security number: <INPUT TYPE=PASSWORD NAME="ssn" SIZE=20><BR>
<!-- This is an <INPUT> tag of type "PASSWORD", used so that no one can
peek over the user's shoulder at the value entered. The data will be
named "ssn", and the field on the screen is 20 characters long. -->
Are you currently associated with the Communist Party or were you previously associated with it?
<INPUT TYPE=CHECKBOX NAME="commie" VALUE="yes"><BR>
<!-- This is an <INPUT> tag of type "CHECKBOX" with the data name
"commie". When the form is submitted with the box checked, the value
"yes" will be associated with the name "commie". -->
Sex: <INPUT TYPE=RADIO NAME="sex" VALUE="male">Male
<INPUT TYPE=RADIO NAME="sex" VALUE="female">Female
<INPUT TYPE=RADIO NAME="sex" VALUE="missing" CHECKED>Missing<BR>
<!-- Three <INPUT> tags of type "RADIO", all using the data name "sex".
Only one of the three can be selected, and since one is preselected, a
value will be sent even if the user selects none of them. The value
sent to the server is the one in the "VALUE" attribute and need not be
related to the text that follows the tag. -->
<INPUT TYPE=HIDDEN NAME="form_number" VALUE="33a">
<!-- This is extra data we want to send to the server, but the user
does not need to know about it, so we put it inside an <INPUT> tag of
type "HIDDEN". -->
Please indicate the path to your favorite game: <INPUT TYPE=FILE NAME="game" SIZE=40><BR>
<!-- If the user enters a valid path, the file will be sent to the web
server under the name "game" when the form is submitted. This is not as
dangerous as it may seem, since most browsers ask for confirmation
before sending the file. -->
What is your favorite color(s)?
<SELECT NAME="color" MULTIPLE SIZE=5>
<OPTION>Red
<OPTION>Green
<OPTION>Yellow
<OPTION>Orange
<OPTION VALUE="Blue">The lovely color of the azure sky
</select><BR>
<!-- This is a <SELECT></select> pair with several <OPTION> choices.
The data will be named "color", several items can be selected at once,
and all 5 are displayed on the screen at the same time. The last item
uses the "VALUE" attribute to send a short text. -->
Describe in detail the socio-political background of the novel War and Peace in no more than 50 words.
<TEXTAREA NAME="essay" COLS=70 ROWS=10></textarea><BR>
<!-- This is a <TEXTAREA></textarea> pair, which provides space to enter
an essay. The data is named "essay". The text block is 70 characters
wide and 10 lines deep. The space between the <TEXTAREA> and
</textarea> tags can be used for a sample essay. -->
<INPUT TYPE=SUBMIT VALUE="Enter data">
<INPUT TYPE=RESET>
<!-- These are <INPUT> tags of types "SUBMIT" and "RESET" respectively.
The "SUBMIT" button has an overridden "Enter data" label, and the
"RESET" button has the default label (set by the browser). Clicking the
"SUBMIT" button sends the data to the web server. The "RESET" button
restores the data to its original state, deleting all user-entered
data. -->
</FORM>
</BODY></HTML>

The only input type we have not used here is the IMAGE type of the <INPUT> tag. It could be used as an alternative way to submit the form. However, text-based and less capable browsers rarely support the IMAGE type, so it is wise to avoid it unless your site has a graphics-heavy style.
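Two details of the example form matter to the CGI program that receives it: a <SELECT MULTIPLE> list sends one name=value pair for each selected option, so the same name can appear several times, and an unchecked CHECKBOX sends nothing at all. The hypothetical count_field helper below sketches how a C program could detect both cases in the raw encoded string; it assumes the field name itself needs no URL encoding.

```c
#include <assert.h>
#include <string.h>

/* Count how many times `name` appears as a field in an encoded form
 * string such as "color=Red&color=Blue". The result is 0 for an
 * unchecked checkbox and can exceed 1 for <SELECT MULTIPLE>.
 * Assumes `name` contains no characters that need URL encoding. */
int count_field(const char *input, const char *name)
{
    size_t name_len;
    int count = 0;

    assert(input != NULL && name != NULL);
    name_len = strlen(name);
    while (*input != '\0') {
        size_t pair_len = strcspn(input, "&");
        /* Match the exact name followed by '=' (or a bare name). */
        if (pair_len >= name_len && strncmp(input, name, name_len) == 0
            && (name_len == pair_len || input[name_len] == '='))
            count++;
        input += pair_len;
        if (*input == '&')
            input++;
    }
    return count;
}
```

Checking for the = after the name keeps a field such as colorful from being counted as color.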

Once you've learned the basics of HTML forms, you can start learning about CGI itself.

CGI Specification

So what exactly is the "set of rules" that allows a CGI program in, say, Batavia, Illinois, to communicate with a web browser in Outer Mongolia? The official CGI specification, along with a wealth of other information about CGI, can be found on the NCSA server at http://hoohoo.ncsa.uiuc.edu/cgi/. However, this chapter exists precisely so that you do not have to make that long trip and look for it yourself.

There are four ways in which CGI passes data between the CGI program and the web server, and therefore the web client:

Environment variables.
Command line.
Standard input device.
Standard output device.

With these four methods, the server forwards all the data sent by the client to the CGI program. The CGI program then does its magic and sends the output back to the server, which forwards it to the client.

The information in this chapter is based on the Apache HTTP server. Apache is the most common web server and runs on almost any platform, including Windows 9x and Windows NT. However, the information should apply to all HTTP servers that support CGI. Some proprietary servers, such as those from Microsoft and Netscape, may have additional features or behave slightly differently. As the Web continues to change at an incredible rate, standards are still evolving, and there will undoubtedly be changes in the future. When it comes to CGI, however, the technology appears to have stabilized, partly because newer technologies, such as applets, are taking over some of its roles. Any CGI programs you write using this information will almost certainly run for many years on most web servers.

When a CGI program is called through a form, the most common interface, the browser sends the server a long string that begins with the path to the CGI program and its name. This is followed by other data, called path information, which is passed to the CGI program through the PATH_INFO environment variable (Figure 9-1). The path information is followed by a "?" character and then by the form data sent with the HTTP GET method. This data is made available to the CGI program through the QUERY_STRING environment variable. Any data that the page sends using the HTTP POST method, which is the most commonly used method, is passed to the CGI program through standard input. A typical string that a server might receive from a browser is shown in Figure 9-1. The program named formread in the cgi-bin directory is called by the server with the additional path information extra/information and the query data choice=help, presumably as part of the original URL. Finally, the form data itself (the text "CGI programming" in the "keywords" field) is sent via the HTTP POST method.
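The way the request string is divided can be sketched in C. This hypothetical split_target helper takes the request target and the script's own path (which in practice the server knows from its configuration) and separates the extra path from the query. It does not URL-decode PATH_INFO, which a real server does; treat it as an illustration of the layout, not of any server's implementation.

```c
#include <assert.h>
#include <string.h>

/* Split a request target such as
 *   /cgi-bin/formread/extra/information?choice=help
 * into the PATH_INFO part and the QUERY_STRING part, given the script
 * path "/cgi-bin/formread". Returns 0 on success, or -1 if the target
 * does not start with the script path or a buffer is too small. */
int split_target(const char *target, const char *script,
                 char *path_info, size_t pi_size,
                 char *query, size_t q_size)
{
    size_t script_len, rest_len;
    const char *rest, *qmark;

    assert(target && script && path_info && query && pi_size && q_size);
    script_len = strlen(script);
    if (strncmp(target, script, script_len) != 0)
        return -1;
    rest = target + script_len;
    /* The script name must end at '/', '?', or the end of the target. */
    if (*rest != '\0' && *rest != '/' && *rest != '?')
        return -1;
    qmark = strchr(rest, '?');
    rest_len = qmark ? (size_t)(qmark - rest) : strlen(rest);
    if (rest_len >= pi_size)
        return -1;
    memcpy(path_info, rest, rest_len);
    path_info[rest_len] = '\0';
    if (qmark == NULL) {
        query[0] = '\0';
    } else {
        size_t q_len = strlen(qmark + 1);
        if (q_len >= q_size)
            return -1;
        memcpy(query, qmark + 1, q_len + 1);   /* includes the NUL */
    }
    return 0;
}
```

POST data never appears in this string; it arrives separately on standard input.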

Environment Variables

When the server runs a CGI program, it first passes it some data in the form of environment variables. The specification officially defines seventeen variables, but many more are used informally through a mechanism described below, called the HTTP_ header mechanism. A CGI program has access to these variables in the same way as to any shell environment variables when run from the command line. In a shell script, for example, the FOO environment variable can be accessed as $FOO; in Perl as $ENV{'FOO'}; in C as getenv("FOO"); and so on. Table 9-1 lists the variables that are always set by the server, even if they are empty. In addition to these variables, the data sent by the client in the request headers is assigned to variables of the form HTTP_FOO, where FOO is the name of the header. For example, most web browsers include version information in a header called User-Agent. Your CGI program can obtain this data from the HTTP_USER_AGENT variable.
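The HTTP_ naming rule can be sketched as follows: prefix HTTP_, convert the header name to upper case, and replace each - with _. header_to_env_name is a hypothetical helper for illustration. Per the CGI specification (RFC 3875), servers are also expected to leave out some headers, such as Content-Length and Content-Type (already available as CONTENT_LENGTH and CONTENT_TYPE) and authentication headers such as Authorization.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Build the environment variable name used for a request header:
 * "User-Agent" -> "HTTP_USER_AGENT". Returns 0 on success, or -1 if
 * out_size is too small for the result plus its terminating NUL. */
int header_to_env_name(const char *header, char *out, size_t out_size)
{
    static const char prefix[] = "HTTP_";
    const size_t plen = sizeof prefix - 1;   /* 5, without the NUL */
    size_t i, n;

    assert(header != NULL && out != NULL);
    n = strlen(header);
    if (plen + n + 1 > out_size)
        return -1;
    memcpy(out, prefix, plen);
    for (i = 0; i < n; i++) {
        unsigned char c = (unsigned char)header[i];
        out[plen + i] = (c == '-') ? '_' : (char)toupper(c);
    }
    out[plen + n] = '\0';
    return 0;
}
```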

Table 9-1. CGI Environment Variables

	Environment variable	Description
	CONTENT_LENGTH	Length of the data sent with the POST or PUT methods, in bytes.
	CONTENT_TYPE	The MIME type of the data attached with the POST or PUT methods.
	GATEWAY_INTERFACE	The version number of the CGI specification supported by the server.
	PATH_INFO	Additional path information sent by the client. For example, for the request http://www.myserver.com/test.cgi/this/is/a/path?field=green the value of PATH_INFO will be /this/is/a/path.
	PATH_TRANSLATED	Same as PATH_INFO, but with every possible translation applied by the server, such as expanding names like "~account".
	QUERY_STRING	All data following the "?" in the URL. This is also the data passed when the form's REQUEST_METHOD is GET.
	REMOTE_ADDR	IP address of the client making the request.
	REMOTE_HOST	The host name of the client machine, if available.
	REMOTE_IDENT	If the web server and client support identification via identd, the user name of the account making the request.
	REQUEST_METHOD	The method the client uses to make the request. For the CGI programs we are going to create, this will usually be POST or GET.
	SERVER_NAME	The host name, or the IP address if no name is available, of the machine running the web server.
	SERVER_PORT	The port number used by the web server.
	SERVER_PROTOCOL	The protocol used by the client to communicate with the server. In our case, this is almost always HTTP.
	SERVER_SOFTWARE	Information about the version of the web server running the CGI program.
	SCRIPT_NAME	The path to the script being run, as specified by the client. It can be used when a URL refers to the script itself, and it lets a script reachable from different locations behave differently depending on the location.

Here's an example of a CGI Perl script that prints out all the environment variables set by the server, as well as any inherited variables, such as PATH, set by the shell that started the server.

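#!/usr/bin/perl -w

# The header block ends with a blank line before the HTML body.
print <<HTML;
Content-type: text/html

<HTML><BODY>
<P>Environment Variables<BR>
HTML

foreach (keys %ENV) { print "$_: $ENV{$_}<BR>\n"; }

print <<HTML;
</BODY></HTML>
HTML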

All of these variables can be used and even modified by your CGI program. However, these changes do not affect the web server that runs the program.

Command line

CGI allows arguments to be passed to the CGI program as command-line parameters. This feature is rarely used because its practical applications are few, so we will not dwell on it. The gist is that if the QUERY_STRING environment variable does not contain an "=" character, the CGI program is executed with command-line parameters taken from QUERY_STRING. For example, http://www.myserver.com/cgi-bin/finger?root will run finger root on www.myserver.com.
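The rule for such an "indexed" query can be sketched in C. Following the CGI specification's convention, the query is split into words at each + and each word is then %XX-decoded; a query containing = is treated as ordinary form data and yields no arguments. query_to_args is a hypothetical helper: a server that supports this feature performs the split itself before starting the program, so CGI code would normally just read argv.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

static int hex_value(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    return tolower(c) - 'a' + 10;
}

/* Decode %XX sequences in place. '+' is not handled here because it
 * has already been used as the word separator. */
static void percent_decode(char *s)
{
    char *out = s;
    while (*s != '\0') {
        if (*s == '%' && isxdigit((unsigned char)s[1])
            && isxdigit((unsigned char)s[2])) {
            *out++ = (char)(hex_value((unsigned char)s[1]) * 16
                            + hex_value((unsigned char)s[2]));
            s += 3;
        } else {
            *out++ = *s++;
        }
    }
    *out = '\0';
}

/* Split an indexed query ("word1+word2") into argument words. `buf` is
 * modified in place and argv[] points into it. Returns the number of
 * words stored, or 0 if the query contains '=' (ordinary form data). */
int query_to_args(char *buf, char *argv[], int max_args)
{
    int argc = 0;
    char *word;

    assert(buf != NULL && argv != NULL && max_args > 0);
    if (strchr(buf, '=') != NULL)
        return 0;
    for (word = strtok(buf, "+"); word != NULL && argc < max_args;
         word = strtok(NULL, "+")) {
        percent_decode(word);
        argv[argc++] = word;
    }
    return argc;
}
```

For the finger example, the query root becomes the single argument root. Because strtok() keeps internal state, this sketch is not safe to call from several threads at once, which is rarely a concern in a CGI program.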

There are two main libraries that provide a CGI interface for Perl. The first is cgi-lib.pl, which is very common because for a long time it was the only major library available. It was designed for Perl 4 but also works with Perl 5. The second library, CGI.pm, is newer and in many ways superior to cgi-lib.pl. CGI.pm is written for Perl 5 and uses a fully object-oriented design for working with CGI data. The CGI.pm module parses the standard input and the QUERY_STRING variable and stores the data in a CGI object. Your program only needs to create a new CGI object and use simple methods like param() to retrieve the data you need. Example 9-2 is a short demonstration of how CGI.pm interprets the data. All Perl examples in this chapter use CGI.pm.

Example 9-2. Parsing CGI Data in Perl

#!/usr/bin/perl -w

use CGI qw(:standard);
# The CGI.pm module is used. qw(:standard) imports the
# standard CGI functions into our namespace for clearer code.
# This is safe when the script uses only one CGI object.

$mycgi = new CGI;        # Create a CGI object that will be the gateway to the form data

@fields = $mycgi->param; # Retrieve the names of all completed form fields

print header, start_html("CGI.pm test");
# The "header" and "start_html" methods provided by CGI.pm make it
# easier to produce HTML. "header" outputs the required HTTP header,
# and "start_html" outputs an HTML header with the given title,
# as well as the <BODY> tag.

print "<P>Form data:<BR>";

foreach (@fields) { print $_, ": ", $mycgi->param($_), "<BR>"; }
# For each field, print the name and the value obtained with
# $mycgi->param("fieldname").

print end_html;          # Shorthand for printing the closing "</BODY></HTML>" tags.

Processing input data in C

Since the core APIs for MySQL and mSQL are written in C, we won't completely abandon C in favor of Perl. We will provide C examples where appropriate. There are three widely used C libraries for CGI programming: Tom Boutell's cgic, Eugene Kim's cgihtml, and libcgi from EIT. We believe that cgic is the most complete and the easiest to use. What it lacks, however, is the ability to list all the form variables when you don't know them in advance. This can be added with a simple patch, but that is beyond the scope of this chapter. Therefore, in Example 9-3 we use the cgihtml library to repeat the Perl script above in C.

Example 9-3. Parsing CGI data in C

/* cgihtmltest.c - A typical CGI program that displays the keys and
   values of the data received from a form */

#include <stdio.h>
#include "cgi-lib.h"  /* Contains all the CGI function definitions */
#include "html-lib.h" /* Contains all the HTML helper function definitions */

/* This function prints the data submitted by the form in the same format
   as the Perl script above. cgihtml also provides a built-in function,
   print_entries(), which does the same thing using an HTML list format. */
void print_all(llist l)
{
  node *window;
  /* The "node" type is defined in the cgihtml library and refers to the
     linked list that stores all the form data. */

  window = l.head; /* Point to the beginning of the form data */

  while (window != NULL) { /* Walk the linked list to its end */
    printf("%s:%s<br>\n", window->entry.name, replace_ltgt(window->entry.value));
    /* Print the data. replace_ltgt() converts < and > into HTML entities
       so that the text is displayed correctly in the client browser. */
    window = window->next; /* Move to the next list element */
  }
}

int main()
{
  llist entries; /* Holds the parsed data */
  int status;    /* Status returned by read_cgi_input() */

  html_header(); /* HTML helper function that outputs the HTTP header */

  html_begin("cgihtml test");
  /* An HTML helper function that prints the beginning of an HTML page
     with the specified title. */

  status = read_cgi_input(&entries); /* Reads and parses the form data */

  printf("<p>Form data:<br>\n");

  print_all(entries); /* Calls the print_all() function defined above */

  html_end(); /* HTML helper function that prints the end of the HTML page */

  list_clear(&entries); /* Frees the memory occupied by the form data */

  return 0;
}

Standard Output Device

The data a CGI program writes to standard output is read by the web server and sent to the client. If the script name starts with nph-, the data is sent directly to the client without intervention from the web server. In that case, the CGI program must generate a complete, correct HTTP header, including the status line, that the client will understand. Otherwise, let the web server generate the HTTP header for you.

Even if you don't use an nph- script, you must give the server one directive that tells it about your output. This is usually the Content-Type HTTP header, but it can also be the Location header. The header must be followed by an empty line, that is, a line feed or a CR/LF combination.

The Content-Type header tells the server what type of data your CGI program produces. For an HTML page, the line should be Content-Type: text/html. The Location header gives the server a different URL, or a different path on the same server, to which the client should be directed. The header should look like this: Location: http://www.myserver.com/another/place/.

After the HTTP headers and the empty line, you can send the actual data produced by your program: an HTML page, an image, text, or anything else. The Apache server ships with two CGI programs, nph-test-cgi and test-cgi, that demonstrate the difference between nph and non-nph style headers, respectively.
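The header rules in this section can be sketched as three small C helpers. The function names are hypothetical. Each helper ends the header block with the required empty line. The nph variant assumes a plain HTTP/1.0 "200 OK" response, because an nph- script must write the HTTP status line itself.

```c
#include <stdio.h>

/* Ordinary script: the server completes the HTTP response. */
void cgi_content_type(FILE *out, const char *type)
{
    fprintf(out, "Content-Type: %s\n\n", type);
}

/* Redirect: the server sends the client to another URL instead. */
void cgi_location(FILE *out, const char *url)
{
    fprintf(out, "Location: %s\n\n", url);
}

/* nph- script: the output goes straight to the client, so the script
   writes a full HTTP response header, status line first. */
void nph_header(FILE *out, const char *type)
{
    fprintf(out, "HTTP/1.0 200 OK\r\nContent-Type: %s\r\n\r\n", type);
}
```

A program calls exactly one of these before its content, for example cgi_content_type(stdout, "text/html").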

In this section we will use the CGI.pm and cgihtml libraries, which have functions to output both HTTP and HTML headers. This lets you focus on outputting the actual content. These helper functions are used in the examples given earlier in this chapter.

Important Features of CGI Scripts

You already know the basics of how CGI works. The client sends data, usually through a form, to the web server. The server executes the CGI program and passes the data to it. The CGI program does its processing and returns its output to the server, which passes it on to the client. Now that you understand how CGI programs work, we need to move on to why they are so widely used.

Although you already know enough from this chapter to put together a simple working CGI program, a few more important issues remain before you can create real-world programs for MySQL or mSQL. First, you need to learn how to work with multiple forms. Next, you need to learn some security measures that will prevent attackers from illegally accessing or destroying your server files.

Storing state

Maintaining state is a vital means of providing good service to your users, not just a way of fighting hardened criminals, as it may seem. The problem stems from the fact that HTTP is a so-called "stateless" protocol. This means that the client sends data to the server, the server returns data to the client, and then each goes its own way. The server does not store data about the client that may be needed in later operations. Likewise, there is no guarantee that the client will retain any data about the transaction for later use. This places an immediate and significant limitation on the use of the World Wide Web.

Writing CGI programs for this protocol is like being unable to remember a conversation. Whenever you talk to someone, no matter how often you have talked to them before, you have to introduce yourself and look for a common topic. Needless to say, this is not conducive to productivity. Figure 9-2 shows that whenever a request reaches a CGI program, it is an entirely new instance of the program, with no connection to the previous one.

On the client side, with the advent of Netscape Navigator, a seemingly hastily made solution called cookies appeared. It consists of creating a new HTTP header that can be sent back and forth between the client and server, similar to the Content-Type and Location headers. The client browser, upon receiving the cookie header, must store the data in the cookie, as well as the name of the domain in which the cookie operates. Then, whenever a URL within the specified domain is visited, a cookie header must be returned to the server for use by CGI programs on that server.

The cookie method is mainly used to store the user ID. Information about the visitor can be saved in a file on the server machine. This user's unique ID can be sent as a cookie to the user's browser, and then each time the user visits the site, the browser automatically sends this ID to the server. The server passes the ID to the CGI program, which opens the corresponding file and gains access to all data about the user. All this happens unnoticed by the user.
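Here is a minimal C sketch of the ID cookie described above. The cookie name uid, the path=/ attribute and both function names are hypothetical. The server passes the cookies sent by the browser to the CGI program in the HTTP_COOKIE environment variable, as name=value pairs separated by semicolons.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: send the visitor's ID to the browser. This line
   must be printed with the other headers, before the empty line. */
void set_user_cookie(FILE *out, const char *user_id)
{
    fprintf(out, "Set-Cookie: uid=%s; path=/\n", user_id);
}

/* Hypothetical helper: find cookie `name` in an HTTP_COOKIE string such
   as "a=1; uid=abc123; b=2" and copy its value into out (outlen must be
   at least 1). Returns 1 if found, 0 otherwise (including cookies == NULL). */
int get_cookie(const char *cookies, const char *name, char *out, size_t outlen)
{
    size_t namelen = strlen(name);
    const char *p = cookies;

    while (p != NULL && *p != '\0') {
        while (*p == ' ' || *p == ';')
            p++;                                /* skip separators */
        if (strncmp(p, name, namelen) == 0 && p[namelen] == '=') {
            const char *val = p + namelen + 1;
            size_t len = strcspn(val, ";");     /* value ends at ';' */
            if (len >= outlen)
                len = outlen - 1;               /* truncate to fit */
            memcpy(out, val, len);
            out[len] = '\0';
            return 1;
        }
        p = strchr(p, ';');                     /* go to the next cookie */
    }
    return 0;
}
```

To read the ID back, a CGI program would call get_cookie(getenv("HTTP_COOKIE"), "uid", id, sizeof id). If the browser sent no cookies, the variable is unset and the call reports "not found".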

Despite the usefulness of this method, most large sites do not use it as their only means of remembering state, for several reasons. First, not all browsers support cookies. Until recently, Lynx, the main browser for visually impaired users (not to mention users with slow Internet connections), did not support cookies. It still doesn't "officially" support them, although some of its widely available forks do. Second, and more importantly, cookies tie the user to a specific machine. One of the great benefits of the Web is that it is accessible from anywhere in the world. No matter where your web page was created or stored, it can be displayed on any Internet-connected machine. However, if you access a cookie-enabled site from someone else's machine, none of the personal information maintained by the cookie will be available.

Many sites still use cookies to personalize user pages, but most complement them with a traditional login/password interface. If the site is accessed from a browser that does not support cookies, the page contains a form where the user enters the login name and password assigned on the first visit to the site. Typically this form is small and unobtrusive, so as not to scare off the majority of users, who are not interested in personalization and simply want to move on. After the user enters a login name and password into the form, the CGI program finds the file containing data about that user, just as if the name had been sent in a cookie. With this method, the user can log in to a personalized website from anywhere in the world.

Besides tracking user preferences and storing information about users over the long term, popular search engines provide a more subtle example of maintaining state. When you search with services such as AltaVista or Yahoo, you usually get many more results than can be displayed in an easy-to-read format. The usual solution is to show a small number of results, typically 10 or 20, and provide a way to move to the next group. This behavior seems normal and expected to the average Web surfer, but implementing it is nontrivial and requires storing state.

When a user first queries a search engine, it collects all the results, perhaps up to some predefined limit. The trick is to hand out these results a few at a time while remembering which user requested them and which portion that user expects next. Leaving aside the complexities of the search engine itself, we face the problem of presenting information to the user one page at a time. Consider Example 9-4, which shows a CGI script that prints ten lines of a file and lets the user view the next or previous ten lines.

Example 9-4. Saving State in a CGI Script

#!/usr/bin/perl -w

use CGI;

open(F, "/usr/dict/words") or die("Can't open! $!");
# This is the file that will be output; it can be anything.

$output = new CGI;

sub print_range {     # This is the main function of the program.
  my $start = shift;  # Starting line of the file.
  my $count = 0;      # Line counter.
  my $line = "";      # Current line of the file.
  my $script = $output->script_name;  # URL of this script, for the links.

  print $output->header,
        $output->start_html("My dictionary");
  # Produces HTML with the title "My dictionary".
  print "<pre>\n";

  # Skip all lines before the starting one.
  while (($count < $start) and ($line = <F>)) { $count++; }
  # Print the next 10 lines.
  while (($count < $start+10) and ($line = <F>)) { print $line; $count++; }

  # Set the starting lines for the "Next" and "Previous" URLs.
  my $newnext = $start+10;
  my $newprev = $start-10;

  print "</pre><p>";

  unless ($start == 0) {  # Include a "Previous" URL unless we are
                          # already at the beginning.
    print qq%<a href="$script?start=$newprev">Previous</a>%;
  }

  unless (eof(F)) {       # Include a "Next" URL unless we are
                          # at the end of the file.
    print qq% <a href="$script?start=$newnext">Next</a>%;
  }

  print "</p>", $output->end_html;
  exit(0);
}

# If there is no data, start from the beginning.
if (not $output->param) {
  &print_range(0);
}

# Otherwise, start from the line given in the data.
&print_range($output->param("start"));

In this example, the state is stored using the simplest method. There is no problem with saving data, since we keep it in a file on the server. We only need to know where to start output, so the script simply includes in the URL the starting point for the next or previous group of lines - all that is needed to generate the next page.
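For comparison, here is a hypothetical C counterpart of print_range() from Example 9-4. It takes the input file, the output stream and the script's own URL (for example, the value of SCRIPT_NAME) as parameters. The 1024-byte line buffer is an assumption.

```c
#include <stdio.h>
#include <string.h>

#define PAGE_LINES 10   /* lines per page, as in Example 9-4 */

/* Hypothetical C version of print_range(): copy lines
   start .. start+PAGE_LINES-1 of `in` to `out`, then add "Previous" and
   "Next" links whose URLs carry the new starting line. */
void print_range(FILE *in, FILE *out, const char *script, long start)
{
    char line[1024];
    long count = 0;
    int c;

    /* Skip the lines before `start`. A line is counted only once its
       newline has been read, so long lines are not miscounted. */
    while (count < start && fgets(line, sizeof line, in) != NULL)
        if (strchr(line, '\n') != NULL)
            count++;

    fputs("<pre>\n", out);
    while (count < start + PAGE_LINES && fgets(line, sizeof line, in) != NULL) {
        fputs(line, out);
        if (strchr(line, '\n') != NULL)
            count++;
    }
    fputs("</pre><p>", out);

    if (start > 0)                 /* not at the beginning: "Previous" */
        fprintf(out, "<a href=\"%s?start=%ld\">Previous</a> ", script,
                start > PAGE_LINES ? start - PAGE_LINES : 0L);

    /* feof() only becomes true after a read has failed, so peek at one
       character to see whether anything is left for a "Next" page. */
    c = getc(in);
    if (c != EOF) {
        ungetc(c, in);
        fprintf(out, "<a href=\"%s?start=%ld\">Next</a>", script,
                start + PAGE_LINES);
    }
    fputs("</p>\n", out);
}
```

As in the Perl version, the only state is the starting line carried in each link's URL.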

However, if you need more than the ability to simply page through a file, relying on the URL can be cumbersome. You can ease this by using an HTML form and putting the state data in <input> tags of type HIDDEN. This method has been used successfully on many sites, either to link related CGI programs or to extend the use of a single CGI program, as in the previous example. Instead of pointing to a specific object, such as a home page, the URL data may point to an automatically generated user ID.
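When state travels in hidden fields, its value must be HTML-escaped, or a value containing a quote would break the tag. Here is a minimal C sketch. Both function names are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: write s, replacing the characters that are
   special inside an HTML attribute value with entities. */
static void html_escape(FILE *out, const char *s)
{
    for (; *s != '\0'; s++) {
        switch (*s) {
        case '&':  fputs("&amp;", out);  break;
        case '<':  fputs("&lt;", out);   break;
        case '>':  fputs("&gt;", out);   break;
        case '"':  fputs("&quot;", out); break;
        default:   putc(*s, out);        break;
        }
    }
}

/* Hypothetical helper: carry one piece of state to the next request
   as a hidden form field. */
void hidden_field(FILE *out, const char *name, const char *value)
{
    fputs("<input type=\"hidden\" name=\"", out);
    html_escape(out, name);
    fputs("\" value=\"", out);
    html_escape(out, value);
    fputs("\">\n", out);
}
```

Printed inside a form, hidden_field(stdout, "start", "10") sends start=10 back with the next submission. The browser decodes the entities, so the CGI program receives the original value.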

This is how AltaVista and other search engines work. The first search generates a user ID, which is hidden behind the scenes in subsequent URLs. Associated with this ID is one or more files containing the results of the query. The URL includes two more values: your current position in the results file and the direction in which you want to navigate next in it. These three values are all that is needed for the powerful navigation systems of large search engines to work.

However, something is still missing. The file used in our example, /usr/dict/words, is very large. What if we stop halfway through reading it but want to come back to it later? If you don't remember the URL of the next page, there is no way back; not even AltaVista allows that. If you restart your computer or use another one, you can't return to your previous search results without re-entering your query. Yet this kind of long-term state storage is at the heart of the website personalization discussed above, and it is worth looking at how it can be implemented. Example 9-5 is a modified version of Example 9-4.

Example 9-5. Persistent state storage

#!/usr/bin/perl -w

use CGI;

umask 0;

open(F, "/usr/dict/words") or die("Can't open! $!");
chdir("users") or die("Can't change to directory! $!");
# This is the directory where all data about
# the users will be stored.

$output = new CGI;

if (not $output->param) {
  print $output->header,
        $output->start_html("My dictionary");
  print <<HTML;
<form method="get">
<p>Enter your username: <input type="text" name="username"></p>
</form>
HTML
  print $output->end_html;
  exit(0);
}

$user = $output->param("username");
# Accept only simple names, so a username can never point
# outside the users directory (e.g. "../something").
$user =~ /^\w+$/ or die("Invalid username");

## If there is no user file, create it and set
## its initial value to "0".
if (not -e "$user") {
  open(U, ">$user") or die("Can't open! $!");
  print U "0\n";
  close U;
  &print_range("0");

## If the user exists and no starting value is given
## in the URL, read the last value and start from there.
} elsif (not defined $output->param("start")) {
  open(U, "$user") or die("Can't open the user! $!");
  $start = <U>; close U;
  chomp $start;
  &print_range($start);

## If the user exists and a starting value is given
## in the URL, write the starting value to the
## user file and start output.
} else {
  open(U, ">$user") or die("Can't open the user for writing! $!");
  print U $output->param("start"), "\n";
  close U;
  &print_range($output->param("start"));
}

sub print_range {
  my $start = shift;
  my $count = 0;
  my $line = "";
  my $script = $output->script_name;

  print $output->header,
        $output->start_html("My dictionary");
  print "<pre>\n";

  while (($count < $start) and ($line = <F>)) { $count++; }
  while (($count < $start+10) and ($line = <F>)) { print $line; $count++; }

  my $newnext = $start+10;
  my $newprev = $start-10;

  print "</pre><p>";

  unless ($start == 0) {
    print qq%<a href="$script?start=$newprev&amp;username=$user">Previous</a>%;
  }

  unless (eof(F)) {
    print qq% <a href="$script?start=$newnext&amp;username=$user">Next</a>%;
  }
  # Note that the username is appended to each URL.
  # Otherwise the CGI program would forget which user it was dealing with.

  print "</p>", $output->end_html;
  exit(0);
}
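The same persistent-state logic can be sketched in C. These helpers are hypothetical. The 32-character name limit and the "dir/user" file layout are assumptions modeled on Example 9-5. Note the username check: without it, a request such as username=../something would let the program read or write files outside the state directory.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAX_USER 32   /* assumed limit on username length */

/* Accept only letters, digits and '_', so a username such as
   "../etc/passwd" can never name a file outside the state directory. */
int valid_username(const char *user)
{
    size_t len = strlen(user);
    if (len == 0 || len > MAX_USER)
        return 0;
    for (; *user != '\0'; user++)
        if (!isalnum((unsigned char)*user) && *user != '_')
            return 0;
    return 1;
}

/* Build "dir/user" into path; returns 0 on success, or -1 for a bad
   name or if the result does not fit. */
static int user_path(char *path, size_t size, const char *dir, const char *user)
{
    int n;
    if (!valid_username(user))
        return -1;
    n = snprintf(path, size, "%s/%s", dir, user);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Remember the user's current starting line. Returns 0 on success. */
int save_position(const char *dir, const char *user, long start)
{
    char path[256];
    FILE *f;
    if (user_path(path, sizeof path, dir, user) != 0)
        return -1;
    if ((f = fopen(path, "w")) == NULL)
        return -1;
    fprintf(f, "%ld\n", start);
    return fclose(f) == 0 ? 0 : -1;
}

/* Read the saved starting line. A missing or unreadable file means
   "start at 0", like the initial value in Example 9-5. */
long load_position(const char *dir, const char *user)
{
    char path[256];
    long start = 0;
    FILE *f;
    if (user_path(path, sizeof path, dir, user) != 0)
        return 0;
    if ((f = fopen(path, "r")) == NULL)
        return 0;
    if (fscanf(f, "%ld", &start) != 1)
        start = 0;
    fclose(f);
    return start;
}
```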

Security measures

When operating Internet servers, whether HTTP servers or any other kind, security is a critical concern. The data exchanged between client and server through CGI raises a number of important security issues. The CGI protocol itself is quite secure. A CGI program receives data from the server via standard input or environment variables, both of which are safe methods. But once a CGI program has control of the data, there are no restrictions on what it can do. A poorly written CGI program could allow an attacker to gain access to the server system. Consider the following example of a CGI program:

#!/usr/bin/perl -w

use CGI;

my $output = new CGI;

my $username = $output->param("username");

print $output->header, $output->start_html("Finger Output"),
  "<pre>", `finger $username`, "</pre>", $output->end_html;

This program provides a working CGI interface to the finger command. If you run the program simply as finger.cgi, it lists all current users on the server. If you run it as finger.cgi?username=fred, it displays information about the user "fred" on the server. You can even run it as finger.cgi?username=bob@foo.com to display information about a remote user. However, if you run it as finger.cgi?username=fred; [email protected], undesirable things may happen. The backtick operator (``) in Perl spawns a shell process, executes a command and returns its output. In this program, `finger $username` is used as an easy way to run the finger command and get its output. However, most shells allow you to combine several commands on one line. Any Bourne-style shell, for example, does this with the ";" character. That is why "finger fred;mail [email protected]" will run the finger command first and then the command mail [email protected], which can send the server's entire password file to an unwanted user.

One solution is to parse the form data looking for malicious content. You could, say, look for the ";" character and delete everything after it. It is better, though, to make such an attack impossible altogether by avoiding the shell. The above CGI program can be rewritten as follows:

#!/usr/local/bin/perl -w

use CGI;

my $output = new CGI;

my $username = $output->param("username");

$|++;
# Disable buffering so that all data is sent to the client.

print $output->header, $output->start_html("Finger Output"), "<pre>\n";

$pid = open(C_OUT, "-|");  # This Perl idiom forks a child process and opens
                           # a pipe between the parent and child processes.

if ($pid) {                # This is the parent process.
  print <C_OUT>;           # Print the output of the child process.
  print "</pre>", $output->end_html;
  exit(0);                 # End the program.
} elsif (defined $pid) {   # This is the child process.
  $|++;                    # Disable buffering.
  exec("/usr/bin/finger", $username) or die("exec() call failed.");
  # Executes the finger program with $username as its only
  # command-line argument.
} else {
  die("fork() failed");    # Error checking.
}

As you can see, this program is not much more complex. But if you run it as finger.cgi?username=fred; [email protected], the finger program is executed with the single argument fred;mail [email protected], which it treats as one username.
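The Perl fork-and-exec idiom above has a direct C equivalent built on pipe(), fork(), execv() and waitpid(). This is a minimal sketch that assumes a POSIX system. The function name run_one_arg is hypothetical.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `prog` (an absolute path) with exactly one
   argument and copy its standard output to `out`. No shell is involved:
   execv() receives `arg` as a single argv element, so characters such as
   ';' or '<' in it are never interpreted. Returns the program's exit
   status, or -1 on error. */
int run_one_arg(const char *prog, const char *arg, FILE *out)
{
    int fds[2];
    pid_t pid;
    char buf[512];
    ssize_t n;
    int status;

    if (pipe(fds) == -1)
        return -1;
    pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                      /* child process */
        char *argv[3];
        argv[0] = (char *)prog;
        argv[1] = (char *)arg;
        argv[2] = NULL;
        close(fds[0]);
        dup2(fds[1], STDOUT_FILENO);     /* child's stdout -> pipe */
        close(fds[1]);
        execv(prog, argv);
        _exit(127);                      /* reached only if execv() failed */
    }
    close(fds[1]);                       /* parent: read the child's output */
    while ((n = read(fds[0], buf, sizeof buf)) > 0)
        fwrite(buf, 1, (size_t)n, out);
    close(fds[0]);
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Called as run_one_arg("/usr/bin/finger", username, stdout), it behaves like the second Perl script: whatever the user types reaches finger as one argument.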

As an additional security measure, this script runs finger explicitly as /usr/bin/finger. In the unlikely event that the web server passes your CGI program an unusual PATH, running just finger could execute the wrong program. You can also examine the PATH environment variable and make sure it has an acceptable value. It is a good idea to remove the current working directory from PATH unless you are sure you actually need to execute a program located there.
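Removing the current directory from PATH can be sketched in C as a filter over the colon-separated list. The helper clean_path is hypothetical. It keeps only absolute directories. That drops ".", empty entries (which also mean the current directory) and any other relative entry.

```c
#include <string.h>

/* Hypothetical helper: copy the PATH string `path` into `out`, keeping
   only absolute directories such as "/usr/bin". out must hold
   strlen(path) + 1 bytes; the result is never longer than the input. */
void clean_path(const char *path, char *out)
{
    const char *p = path;
    char *o = out;

    for (;;) {
        size_t len = strcspn(p, ":");    /* length of this entry */
        if (len > 0 && p[0] == '/') {    /* absolute: keep it */
            if (o != out)
                *o++ = ':';
            memcpy(o, p, len);
            o += len;
        }
        if (p[len] == '\0')
            break;
        p += len + 1;                    /* skip the ':' */
    }
    *o = '\0';
}
```

The result could then be installed with setenv("PATH", cleaned, 1) before any program is started. An even simpler policy is to set PATH to a fixed, known list of directories.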

Another important security consideration relates to user rights. By default, the web server runs the CGI program with the rights of the user who started the server itself. This is usually a pseudo-user such as "nobody" that has limited rights, so the CGI program has few rights either. This is usually a good thing, because if an attacker can gain access to the server through a CGI program, he won't be able to do much damage. The example of a password stealing program shows what can be done, but the actual damage to the system is usually limited.

However, running as a limited user also limits what a CGI program can do. If a CGI program needs to read or write files, it can do so only where it has permission. For example, the second state-storage example keeps a file for each user. The CGI program must have read and write permission on the directory containing these files, not to mention on the files themselves. You can arrange this by creating the directory as the user the server runs as, with read and write permissions for that user only. However, for a user like "nobody", only root can do this. If you are not the superuser, you will have to contact the system administrator every time you make such a change to your CGI program.

Another way is to make the directory world-readable and world-writable, essentially removing all protection from it. Since the outside world can reach these files only through your program, the danger is not as great as it might seem. However, if a flaw is discovered in the program, a remote user will have full access to all the files, including the ability to destroy them. In addition, other legitimate users of the server will also be able to modify these files. If you are going to use this method, all users of the server must be trustworthy. Also, use the open directory only for files the CGI program needs; in other words, don't put unnecessary files at risk.
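A safer alternative to a world-writable directory (and to Example 9-5's umask 0) is to create the state directory and files with owner-only permissions. Here is a minimal POSIX C sketch. The function names are hypothetical.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create the state directory with mode 0700: only its owner (the user
   the server runs as) can list, read or create files in it. */
int create_private_dir(const char *path)
{
    return mkdir(path, S_IRWXU);
}

/* Create a new per-user state file holding `contents`, readable and
   writable only by its owner (mode 0600). O_EXCL makes the call fail if
   the path already exists, so a file or symlink planted by someone else
   is never reused. Returns 0 on success, -1 on error. */
int create_state_file(const char *path, const char *contents)
{
    size_t len = strlen(contents);
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, S_IRUSR | S_IWUSR);
    if (fd == -1)
        return -1;
    if (write(fd, contents, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

These permissions help only if the files belong to the user the server runs as. That is why the directory usually has to be created by that user, or by root on its behalf.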

If this is your first encounter with CGI programming, there are many ways to keep exploring. Dozens of books have been written on the subject, many of which assume no programming knowledge. "CGI Programming on the World Wide Web", published by O'Reilly and Associates, covers everything from simple scripts in different languages to truly amazing tips and tricks. Plenty of free information is also available on the WWW. A good place to start is CGI Made Really Easy at http://www.jmarshall.com/easy/cgi/.

CGI and databases

Since the dawn of the Internet era, databases have been closely tied to the development of the World Wide Web. In practice, many people view the Web as simply one giant database of multimedia information.

Search engines provide an everyday example of the benefits of databases. The search engine doesn't go roaming the entire Internet looking for keywords the moment you ask for them. Instead, site developers use other programs to create a giant index that serves as a database from which the search engine retrieves entries. Databases store information in a form that allows rapid, random access retrieval.

Because they're fluid, databases give the Web even more power: they make it a potential interface for anything. For example, system administration can be done remotely via a web interface instead of requiring an administrator to log in to the desired system. Connecting databases to the Web is at the heart of a new level of interactivity on the Internet.

One reason for connecting databases to the Web comes up again and again: much of the world's information is already in databases. Databases that predate the Web are called legacy databases (as opposed to recently created databases that are not connected to the Web, which should be called a "bad idea"). Many corporations (and even individuals) now face the challenge of providing access to these legacy databases via the Web. Unless your legacy database is MySQL or mSQL, this topic is beyond the scope of this book.

As stated earlier, only your imagination limits the possibilities of connecting databases to the Web. There are currently thousands of unique and useful databases accessible from the Web. The types of databases behind these applications vary widely. Some use CGI programs as an interface to a database server such as MySQL or mSQL; these are of greatest interest to us. Others use commercial applications to interface with popular desktop databases such as Microsoft Access and Claris FileMaker Pro. Still others simply work with flat text files, the simplest databases possible.

Using these three types of databases, you can develop useful websites of any size or complexity. One of our goals over the next few chapters will be to apply the power of MySQL and mSQL to the Web using CGI programming.

Introduction.

In this article I want to talk about the CGI interface in general, its implementation for Windows and the use of assembly language in particular when writing CGI programs. The scope of this article does not include a full description of CGI, since there is simply a sea of material on this issue on the Internet and I simply don’t see the point in retelling it all here.

CGI theory.

CGI stands for Common Gateway Interface. As you might guess, this interface serves as a gateway between the server (here I mean the server program) and some external program written for the OS on which the server runs. CGI therefore defines exactly how data is passed from the server program to the CGI program and back. The interface imposes no restrictions on what the CGI program is written in. It can be an ordinary executable file or any other file, as long as the server can run it (in a Windows environment, for example, it can be a file whose extension is associated with some program).

From the moment you call the CGI program (for example, by clicking a form button to which the call is attached) until you receive the result in your browser window, the following happens:

A web client (such as a browser) creates a connection to the server specified in the URL;

The web client sends a request to the server, usually using one of two methods, GET or POST;

Data from the client request (for example, form field values) is passed by the server, using a CGI interface, to the CGI program specified in the URL;

The CGI program processes the client data received from the server and, based on this processing, generates a response to the client, which it transmits via the same CGI interface to the server, and it, in turn, transmits it directly to the client;

The server closes the connection with the client.

The standard CGI specification assumes that the server can communicate with the program in the following ways:

Environment variables – they can be set by the server when the program starts;

Standard input stream (STDIN) - with its help, the server can transfer data to the program;

Standard output stream (STDOUT) – the program can write its output into it, which is transmitted to the server;

Command line – in it the server can pass some parameters to the program.

Standard input/output streams are very convenient and widely used on UNIX systems, which cannot be said of Windows. For this reason there is a CGI specification developed specifically for Windows systems, called "Windows CGI". Of course, standard input/output streams can also be used in Windows CGI programming. Here I will not touch on the "Windows CGI" standard, for at least two reasons. The first, and most important, is that not all HTTP servers for Windows currently support this specification (in particular, my favorite, Apache 1.3.19). You can see the second reason by typing "Windows CGI" into any search engine. I will note only the general details of this interface. All data from the server to the program is passed through an ordinary Windows *.ini file, whose name is given to the program on the command line. The server has already divided all the data in this file into sections, and all you have to do is use the "GetPrivateProfile*" functions to extract it. The response is sent back to the server, again through a file, whose name is given in the corresponding entry of the ini file.

What data can the client pass to the CGI program? Almost any. Usually the program receives the values of the form fields the client fills in, but it can also receive binary data, such as an image or a music file. Data can be sent to the server by two different methods, GET and POST. When we create a form on our page, we explicitly specify which of these methods should be used to send the data the user enters. This is done in the main tag of the form, something like this:

<form method="get" action="/cgi-bin/name_script">

When sending data with the GET method, the browser reads the data from the form and appends it to the script URL after a question mark. If the form has several fields, they are separated by the "&" sign, and each field name is separated from its value by the "=" sign. For example, suppose the script "/cgi-bin/test.exe" is attached to a button of a form whose first field is called "your_name" and whose second field is called "your_age". When you click the button, the browser may generate a request like this:

GET /cgi-bin/test.exe?your_name=Pupkin&your_age=90 HTTP/1.0
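Decoding such a query string is the first job of any CGI program. Here is a minimal C sketch with hypothetical names. It splits the string at "&" and "=", then URL-decodes each part: "+" becomes a space and %XX becomes the byte with that hexadecimal code. Splitting happens before decoding, so an encoded "&" (%26) inside a value does not start a new pair.

```c
#include <ctype.h>
#include <string.h>

/* Value of one hexadecimal digit (the caller has checked isxdigit). */
static int hexval(int c)
{
    return isdigit(c) ? c - '0' : tolower(c) - 'a' + 10;
}

/* Decode an application/x-www-form-urlencoded string in place. */
void url_decode(char *s)
{
    char *out = s;
    for (; *s != '\0'; s++) {
        if (*s == '+') {
            *out++ = ' ';
        } else if (*s == '%' && isxdigit((unsigned char)s[1])
                   && isxdigit((unsigned char)s[2])) {
            *out++ = (char)(hexval((unsigned char)s[1]) * 16
                            + hexval((unsigned char)s[2]));
            s += 2;                        /* skip the two hex digits */
        } else {
            *out++ = *s;
        }
    }
    *out = '\0';
}

/* Split query (modified in place) into at most max name/value pairs.
   Empty pairs ("&&") are skipped; a pair without '=' gets an empty
   value. names[i] and values[i] point into query. Returns the count. */
int parse_query(char *query, char **names, char **values, int max)
{
    int n = 0;
    char *pair = query;

    while (pair != NULL && n < max) {
        char *next = strchr(pair, '&');
        if (next != NULL)
            *next++ = '\0';                /* terminate this pair */
        if (*pair != '\0') {
            char *eq = strchr(pair, '=');
            if (eq != NULL) {
                *eq = '\0';
                values[n] = eq + 1;
            } else {
                values[n] = pair + strlen(pair);   /* empty value */
            }
            names[n] = pair;
            url_decode(names[n]);
            url_decode(values[n]);
            n++;
        }
        pair = next;
    }
    return n;
}
```

For the request above, parse_query yields the pairs your_name/Pupkin and your_age/90.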
The GET method has several weaknesses. The first and most important is that, because the data is transmitted in the URL, the amount of data that can be sent is limited. The second, which also follows from using the URL, is confidentiality: with this kind of transfer the data remains completely exposed. So GET is fine if the form has two or three small fields. But what if there is more data? The answer is to use the POST method!

With the POST method, the data is sent to the server as a separate block rather than in the URL, which frees our hands somewhat in how much data we can send. For the form above, the block sent to the server with POST will look something like this:

POST /cgi-bin/test.exe HTTP/1.0
Accept: text/plain
Accept: text/html
Accept: */*
Content-type: application/x-www-form-urlencoded
Content-length: 28

your_name=Pupkin&your_age=90

As mentioned above, after receiving the data, the server must convert it and pass it to the CGI program. In the standard CGI specification, the server places the client input of a GET request in the program's "QUERY_STRING" environment variable. For a POST request, the data is placed on the program's standard input stream, from which the program can read it. For such a request the server also sets two more environment variables, CONTENT_LENGTH and CONTENT_TYPE, which give the length of the request body in bytes and its content type.
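The rules just described translate directly into C. This hypothetical helper returns the raw form data for either method. The 1,000,000-byte cap on the request body is an assumption, added so that a bogus CONTENT_LENGTH cannot make the program allocate unbounded memory.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_BODY 1000000L   /* assumed upper limit for a POST body */

/* Hypothetical helper: return the raw form data as a malloc'ed,
   NUL-terminated string. For POST it reads exactly CONTENT_LENGTH bytes
   from `in` (normally stdin); otherwise it copies QUERY_STRING.
   Returns NULL on error; the caller frees the result. */
char *read_form_data(FILE *in)
{
    const char *method = getenv("REQUEST_METHOD");
    const char *qs;
    char *data;

    if (method != NULL && strcmp(method, "POST") == 0) {
        const char *len_str = getenv("CONTENT_LENGTH");
        char *end;
        long len;

        if (len_str == NULL)
            return NULL;
        len = strtol(len_str, &end, 10);
        if (end == len_str || *end != '\0' || len < 0 || len > MAX_BODY)
            return NULL;              /* malformed or too large */
        data = malloc((size_t)len + 1);
        if (data == NULL)
            return NULL;
        if (fread(data, 1, (size_t)len, in) != (size_t)len) {
            free(data);
            return NULL;              /* fewer bytes than promised */
        }
        data[len] = '\0';
        return data;
    }

    /* GET (or any other method): the data is in QUERY_STRING. */
    qs = getenv("QUERY_STRING");
    if (qs == NULL)
        qs = "";
    data = malloc(strlen(qs) + 1);
    if (data != NULL)
        strcpy(data, qs);
    return data;
}
```

In a real CGI program, in is stdin. The returned string can be passed straight to a name/value parser.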

Besides the data itself, the server also sets other environment variables for the called program. Here are some of them:

REQUEST_METHOD

The method used to send the request (for example, GET or POST)

Example:REQUEST_METHOD=GET

QUERY_STRING

Query string if GET method was used

Example:QUERY_STRING=your_name=Pupkin&your_age=90&hobby=asm

CONTENT_LENGTH

Length in bytes of the request body

Example:CONTENT_LENGTH=31

CONTENT_TYPE

Request body type

Example:CONTENT_TYPE=application/x-www-form-urlencoded

GATEWAY_INTERFACE

CGI protocol version

Example:GATEWAY_INTERFACE=CGI/1.1

REMOTE_ADDR

IP address of the remote host, that is, the client who clicked the button in the form

Example:REMOTE_ADDR=10.21.23.10

REMOTE_HOST

The name of the remote host. This can be its domain name or, for example, the name of a computer in a Windows network. If the name cannot be obtained, the variable contains the host's IP address

Example:REMOTE_HOST=wasm.ru

SCRIPT_NAME

The name of the script used in the request.

Example:SCRIPT_NAME=/cgi-bin/gols.pl

SCRIPT_FILENAME

The name of the script file on the server.

Example:SCRIPT_FILENAME=c:/page/cgi-bin/gols.pl

SERVER_SOFTWARE

Server software

Example:SERVER_SOFTWARE=Apache/1.3.19 (WIN32)

That, in brief, is all. For more detailed information about the Common Gateway Interface, see the specialized documentation. I wrote this description to refresh your memory or, if you didn't know it, to bring you up to date. Now let's try to do something in practice.
Practical part.
For practice we will need at least three things. The first is an HTTP server for Windows. I tested all the examples on Apache 1.3.19 for Windows, which is free and can be downloaded from http://httpd.apache.org/download.cgi. It must be not just any server but one configured to run CGI scripts; see the documentation of your server for how to do this. The second thing we need is, of course, an assembler whose toolchain can build Win32 console applications. I use Tasm, but Fasm, Masm and many other *asms will do just as well. And finally, the most important thing: the desire to do it.
So, I assume that you have successfully installed and configured the server, so that the root directory of the server documents contains an index.html file that displays correctly in the browser when you type the address http://127.0.0.1. I will also assume that somewhere in the jungle of server folders there is a “cgi-bin” folder in which scripts are allowed to run.
Let's check the server settings and, at the same time, write a small script. Our script will be an ordinary *.bat file. I can foresee the question: really? Yes, an ordinary batch file. As mentioned above, the CGI specification does not care about the file type. What matters is that the server can run it and that it has access to stdin/stdout and environment variables. A bat file meets these requirements only partially, but it will do quite well for an example. Let's create a file with approximately the following content:
@echo off
rem Response header
echo Content-type: text/html
echo.
rem Response body
echo "<html><body>Hello!<br>
echo "The GET request sent the following data: %QUERY_STRING%</body></html>
Let's call the file test.bat and place it in the directory for running scripts, most likely “cgi-bin”. Next we need to call this script somehow. In principle, you can do this directly by typing something like “http://127.0.0.1/cgi-bin/test.bat” into the browser's address bar. Instead, let's call it from our main page and check how the GET method works at the same time. Create an index.html file in the server root with the following content:

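<html>
<body>
<form action="/cgi-bin/test.bat" method="GET">
Enter the data to transfer to the server:<br>
Data: <input type="text" name="data"> <!-- "data" is an arbitrary field name -->
<input type="submit" value="send">
</form>
</body>
</html>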

Now, when you open the server's page (http://127.0.0.1 in the browser address bar), a form should appear. Type something into it and click the “send” button. If everything was done correctly, the browser window will show the response of our bat script. Now let's look at what happened there.

As you might guess, the “echo” command writes to stdout. First, we pass the header of our response to the server: “echo Content-type: text/html”. This is a standard CGI header saying that we are sending text or an HTML document; there are other headers as well. A very important point: the header must be separated from the response body by an empty line, which is what the following “echo.” command does. Next comes the body of the response itself, an ordinary HTML document.

In the body, for clarity, I output one of the environment variables passed to us by the server, “QUERY_STRING”. As already mentioned, with the GET method (which is exactly our case) this variable carries all the data entered by the user, and we can see it in the script's response.

You may have noticed the quotation marks that seem out of place in the last two lines of the file, right after “echo”. They are there because of a peculiarity of bat files. HTML tags are enclosed in the characters “<” and “>”, and in bat files these same characters mean input/output redirection, so we cannot use them freely. Inside quotes they are not treated as redirection. The quote characters themselves end up in the output, which is harmless here.
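The same response as test.bat, written as a C function, makes the header/empty-line rule explicit. This is a sketch with a hypothetical name (`cgi_write_response`). Unlike the bat file, it HTML-escapes the query string before echoing it, since copying user input straight into a page lets a visitor inject markup. It ends header lines with a plain "\n", which CGI servers accept.

```c
#include <stdio.h>
#include <string.h>

/* Writes s to out, replacing the characters that are special in HTML. */
static void html_escape(FILE *out, const char *s)
{
    for (; *s != '\0'; s++) {
        switch (*s) {
        case '<':  fputs("&lt;", out);   break;
        case '>':  fputs("&gt;", out);   break;
        case '&':  fputs("&amp;", out);  break;
        case '"':  fputs("&quot;", out); break;
        default:   fputc(*s, out);       break;
        }
    }
}

/* Hypothetical helper: a complete CGI response, i.e. header, empty line, body. */
void cgi_write_response(FILE *out, const char *query)
{
    fputs("Content-Type: text/html\n", out);
    fputs("\n", out);                        /* the empty line that ends the header */
    fputs("<html><body>Hello!<br>\n", out);
    fputs("The GET request sent the following data: ", out);
    html_escape(out, query != NULL ? query : "");
    fputs("\n</body></html>\n", out);
}
```

In a real CGI program it would be called as `cgi_write_response(stdout, getenv("QUERY_STRING"));` (with `<stdlib.h>` for getenv).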
I recommend playing around a little with such bat scripts; it can be very useful, for example for looking at other environment variables. A brief digression: on UNIX systems, command interpreter languages are highly developed. In some cases the line between programming in a shell language and in a “real” programming language is very blurry, so simple scripts on UNIX are often written in shell languages. The Windows interpreters, cmd.exe or, earlier, command.com, are clearly too weak for this.
Now let's move on to the main task of this article: actually writing a CGI program in assembler. Taking into account everything said above about CGI, we can conclude what the CGI interface requires from our program:
1. The program must be able to read the standard input stream (stdin) in order to receive the data the server passes to it (for example, with a POST request);
2. The program must be able to write to the standard output stream (stdout) in order to transmit the result of its work to the server;
3. From the first two points it follows that, for the server to send something to our program via stdin and for the program to answer via stdout, the CGI program must be a console application;
4. The program must be able to read its environment variables.
This is quite enough to create a full-fledged CGI application.
Let's start with the last point. To access the environment variables of a Windows application, the API function “GetEnvironmentStrings” is used. It takes no arguments and returns a pointer to the environment block. The block is a sequence of NAME=VALUE strings, each terminated by a zero byte, with an extra zero byte marking the end. When the server launches the program, the CGI-specific variables described above are added to the standard ones. Naturally, you will not see them if you run the program from the command line.
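The block layout just described (zero-terminated strings followed by one more zero byte) can be walked with a simple loop. This is what the assembler program later in this section does, inserting `<br>` between the strings. Here is a portable C sketch with hypothetical names (`env_block_find`, `env_block_count`). On Windows, the pointer would come from `GetEnvironmentStringsA()` and should be released with `FreeEnvironmentStringsA()` afterwards. Windows treats variable names case-insensitively, while this sketch compares them exactly.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: searches a block such as "A=1\0B=2\0\0" for `name`
 * and returns a pointer to its value, or NULL if the variable is absent. */
const char *env_block_find(const char *block, const char *name)
{
    size_t nlen = strlen(name);
    /* each step skips one "NAME=VALUE" string and its terminating zero */
    for (const char *p = block; *p != '\0'; p += strlen(p) + 1) {
        if (strncmp(p, name, nlen) == 0 && p[nlen] == '=')
            return p + nlen + 1;
    }
    return NULL;   /* reached the extra zero byte that ends the block */
}

/* Counts the NAME=VALUE strings in the block. */
size_t env_block_count(const char *block)
{
    size_t n = 0;
    for (const char *p = block; *p != '\0'; p += strlen(p) + 1)
        n++;
    return n;
}
```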
To write to stdout or read from stdin, we first need the handles of these streams. They are obtained with the API function “GetStdHandle”, which takes one of the following values as its parameter:
STD_INPUT_HANDLE - for stdin (standard input);
STD_OUTPUT_HANDLE - for stdout (standard output);
STD_ERROR_HANDLE - for stderr.
The function returns the handle we need for read/write operations. Next we need to actually read from and write to these streams, which is done with the ordinary file functions ReadFile and WriteFile. There is one subtlety here: you might think that WriteConsole/ReadConsole could be used instead. In a console this is true and works fine, with the same output as WriteFile, but only until we run our program as a script on the server. When the server launches our program, the handles returned by “GetStdHandle” are no longer console handles. They are pipe handles, which connect the two applications, and the console functions fail on them.
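Because stdin is a pipe when the server runs the program, a single ReadFile call may return fewer bytes than CONTENT_LENGTH, so it is safer to read in a loop. Here is a C sketch under that assumption. `read_all`, `read_some` and `stream_t` are hypothetical names. The Windows branch calls ReadFile on a handle such as `GetStdHandle(STD_INPUT_HANDLE)`. The POSIX branch uses read() on a file descriptor only so that the sketch can be tried on other systems.

```c
#if !defined(_WIN32) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200809L
#endif
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>
typedef HANDLE stream_t;        /* e.g. GetStdHandle(STD_INPUT_HANDLE) */

/* One ReadFile call; on a pipe it may return fewer bytes than asked for. */
static long read_some(stream_t h, char *buf, size_t n)
{
    DWORD got = 0;
    if (!ReadFile(h, buf, (DWORD)n, &got, NULL))
        return -1;              /* e.g. ERROR_BROKEN_PIPE when the writer is gone */
    return (long)got;
}
#else
#include <errno.h>
#include <unistd.h>
typedef int stream_t;           /* e.g. STDIN_FILENO */

/* POSIX stand-in for ReadFile. */
static long read_some(stream_t fd, char *buf, size_t n)
{
    ssize_t got;
    do {
        got = read(fd, buf, n);
    } while (got < 0 && errno == EINTR);   /* retry if interrupted by a signal */
    return (long)got;
}
#endif

/* Hypothetical helper: reads up to n bytes (for example, CONTENT_LENGTH),
 * calling read_some until n bytes, end of data, or an error. */
size_t read_all(stream_t s, char *buf, size_t n)
{
    size_t total = 0;
    while (total < n) {
        long got = read_some(s, buf + total, n - total);
        if (got <= 0)
            break;              /* 0: writer closed the pipe, <0: error */
        total += (size_t)got;
    }
    return total;
}
```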
Here is a small example of a CGI program in assembly language; I think it won’t be too difficult to figure out:
.386
.model flat,stdcall
includelib import32.lib

.const
PAGE_READWRITE    = 4h
MEM_COMMIT        = 1000h
MEM_RESERVE       = 2000h
MEM_RELEASE       = 8000h
STD_INPUT_HANDLE  = -10
STD_OUTPUT_HANDLE = -11

.data
hStdout dd ?
hStdin  dd ?
hMem    dd ?
header:     db "Content-Type: text/html",13,10,13,10,0   ; header + empty line
start_html: db "<html><body>The CGI program environment looks like this:<br>",13,10,0
for_stdin:  db "The STDIN of the program contains:<br>",13,10,0
end_html:   db "</body></html>",13,10,0
nwritten dd ?
toscr    db 10 dup (32)
         db " - File type",0

.code
_start:
        xor   ebx,ebx                          ; ebx = 0 throughout
        call  GetStdHandle,STD_OUTPUT_HANDLE
        mov   hStdout,eax
        call  GetStdHandle,STD_INPUT_HANDLE
        mov   hStdin,eax
        call  write_stdout, offset header
        call  write_stdout, offset start_html
        ; one 4 KB page; assumes the environment text fits into it
        call  VirtualAlloc,ebx,4096,MEM_COMMIT+MEM_RESERVE,PAGE_READWRITE
        mov   hMem,eax
        mov   edi,eax                          ; edi -> output buffer
        call  GetEnvironmentStringsA
        mov   esi,eax                          ; esi -> "NAME=VALUE",0,...,0,0
next_symbol:
        mov   al,[esi]
        or    al,al
        jz    end_string                       ; end of one NAME=VALUE string
        mov   [edi],al
next_string:
        cmpsb                                  ; used only to advance esi and edi
        jmp   short next_symbol
end_string:
        mov   dword ptr [edi],">rb<"           ; stored in memory as "<br>"
        add   edi,3                            ; cmpsb adds the 4th byte
        cmp   byte ptr [esi+1],0               ; a second zero ends the block
        jnz   next_string
        inc   edi
        stosb                                  ; al = 0 terminates the text
        call  write_stdout, hMem
        call  write_stdout, offset for_stdin
        ; GetFileSize is not documented for pipes; reading CONTENT_LENGTH
        ; bytes is the more reliable way to size the POST data
        call  GetFileSize,[hStdin],ebx
        cmp   eax,4095
        jbe   size_ok
        mov   eax,4095                         ; never read more than the buffer holds
size_ok:
        mov   edi,hMem
        call  ReadFile,[hStdin],edi,eax,offset nwritten,ebx
        add   edi,[nwritten]
        mov   byte ptr [edi],0
        call  write_stdout, hMem
        call  write_stdout, offset end_html
        call  VirtualFree,hMem,ebx,MEM_RELEASE
        call  ExitProcess,-1

write_stdout proc bufOffs:dword
        call  lstrlen,bufOffs
        call  WriteFile,[hStdout],bufOffs,eax,offset nwritten,0
        ret
write_stdout endp

extrn GetEnvironmentStringsA:near
extrn GetStdHandle:near
extrn ReadFile:near
extrn WriteFile:near
extrn GetFileSize:near
extrn VirtualAlloc:near
extrn VirtualFree:near
extrn ExitProcess:near
extrn lstrlen:near
ends
end _start
The executable file is built with the commands:

tasm32.exe /ml test.asm

tlink32.exe /Tpe /ap /o test.obj

Don't forget that the program must be a console program (the /ap switch of tlink32 takes care of this).
You can call this program from the HTML form described above: just change test.bat to test.exe in the form and copy the program to /cgi-bin/ accordingly. You can also switch the form's request method to POST; the program handles that too.

I also want to note that the program can be called in another way. Create a file in the cgi-bin directory, for example test.cgi, containing a single line “#!c:/_path_/test.exe”, and use it in requests. The server will read the first line of this file and launch the exe file. For this, the *.cgi extension must be registered as a script extension in the HTTP server settings. With this approach the server launches our program with the command line “test.exe path_to_test.cgi”. This has several advantages. First, whoever calls the script will not even guess what it is written in. Second, since we receive the name of the file containing our line, we can, for example, put any settings for the script into that file, which simplifies debugging.

By the way, this is how all interpreters work: you may have noticed that perl/php/etc. programs start with a similar line pointing to the interpreter itself. When the server starts a CGI program whose extension is registered as a script in the settings, it reads the first line of the file. If the line has the format described above, the server launches the program named in it, passing the name of this file after a space. Suppose the line names the Perl interpreter. Perl receives the script and starts executing it; since “#” starts a comment in Perl, the first line is skipped and the rest of the script runs as usual. All in all, a convenient thing.
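The #! mechanism described above starts with one step: read the first line of the script and, if it begins with #!, take the rest of the line as the interpreter to launch with the script path as its argument. Here is a C sketch of that step. `parse_shebang` is a hypothetical name. It does not split off interpreter arguments such as `-w`, and real servers differ in such details.

```c
#include <string.h>

/* Hypothetical helper: if `line` starts with "#!", copies the interpreter
 * path (without leading/trailing blanks and the CR/LF) into out and
 * returns 1; otherwise returns 0. */
int parse_shebang(const char *line, char *out, size_t outsz)
{
    if (outsz == 0 || strncmp(line, "#!", 2) != 0)
        return 0;
    const char *start = line + 2;
    while (*start == ' ' || *start == '\t')
        start++;                                  /* allow "#! /path" */
    size_t len = strcspn(start, "\r\n");          /* stop at the end of the line */
    while (len > 0 && (start[len - 1] == ' ' || start[len - 1] == '\t'))
        len--;                                    /* drop trailing blanks */
    if (len == 0 || len >= outsz)
        return 0;                                 /* empty, or does not fit */
    memcpy(out, start, len);
    out[len] = '\0';
    return 1;
}
```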

That’s basically all I wanted to write about. I don’t know how useful all this will be to you, but I will say that I have an intranet server running on assembler scripts. I confess there was no great reason to do this. I did it at first simply for aesthetic reasons and out of some reluctance to learn Perl, PHP or anything else. BUT I am in no way dissuading you from learning Perl; on the contrary, it is necessary, even very necessary, as I realized later. Still, I think that CGI scripts written in assembler will take their rightful place on heavily loaded servers, where execution and loading speed and the memory an application occupies matter.

Tutorial

Good afternoon.
In this article I would like to talk about the FastCGI protocol and how to work with it. Although the protocol and its implementation appeared back in 1996, there are simply no detailed manuals for it: the developers never wrote documentation for their own library. Two years ago, when I had just started using this protocol, I often heard phrases like “I don’t quite understand how to use this library.” I want to correct this shortcoming. This article is a detailed guide, which anyone could use, to using the protocol in a multi-threaded program, with recommendations for choosing various parameters.

The good news is that data is encoded the same way in FastCGI and in CGI; only the way it is transmitted changes. A CGI program uses standard input and output, while a FastCGI program uses sockets. In other words, you only need to understand a few functions of the FastCGI library, and then you can rely on experience with writing CGI programs, of which, fortunately, there are plenty of examples.
So, in this article we will look at:
- What is FastCGI and how does it differ from the CGI protocol
- Why do I need FastCGI when there are already many languages for web development
- What implementations of the FastCGI protocol exist?
- What are sockets
- Description of FastCGI library functions
- A simple example of a multi-threaded FastCGI program
- Simple Nginx configuration example
Unfortunately, it is very difficult to write an article that is equally understandable for beginners and interesting for experienced old-timers, so I will try to cover all the points in as much detail as possible, and you can simply skip sections that are not interesting to you.
What is FastCGI?
You can read about FastCGI on Wikipedia. In a nutshell, it is a CGI program running in a loop. A regular CGI program is started anew for each request, whereas a FastCGI program keeps running and takes requests from a queue one after another. Now imagine that your 4-8 core server receives 300-500 simultaneous requests. An ordinary CGI program will be launched 300-500 times. Obviously, that is too many processes: the server physically cannot run them all at once, so you end up with a queue of processes waiting for their slice of processor time. Typically the scheduler distributes the processor evenly (all these processes have the same priority), which means you will have 300-500 “almost ready” responses. It doesn’t sound very optimistic, does it? A FastCGI program solves these problems with a simple request queue (that is, request multiplexing is used).
Why do I need FastCGI when I already have PHP, Ruby, Python, Perl, etc.?
Perhaps the main reason is that a compiled program runs faster than an interpreted one. For PHP, for example, there is a whole range of accelerators, including APC, eAccelerator and XCache, which reduce the time spent on code interpretation. For C/C++, none of this is needed.
Second, remember that dynamic typing and garbage collection consume resources, sometimes a lot of them. For example, integer arrays in PHP take up about 18 times more memory than in C/C++ for the same data (up to 35 times, depending on PHP build options), so think about the overhead for relatively large data structures.
Third, a FastCGI program can keep data shared between requests. PHP starts processing each request from scratch, whereas a FastCGI program can do a number of preparatory actions before the first request arrives, such as allocating memory or loading frequently used data. Obviously, all this can improve the overall performance of the system.
Fourth, scalability. Whereas mod_php requires the Apache web server and PHP to be on the same machine, a FastCGI application can use TCP sockets. In other words, you can have a whole cluster of machines that the web server communicates with over the network. At the same time, FastCGI also supports Unix domain sockets, which lets you efficiently run a FastCGI application and a web server on the same machine if necessary.
Fifth, security. Believe it or not, with default settings Apache lets you do almost anything. For example, suppose an attacker uploads a malicious script exploit.php.jpg to a website under the guise of an “innocent picture” and then opens it in the browser. Apache will “honestly” execute the malicious PHP code. Perhaps the only fairly reliable solution is to remove or alter all potentially dangerous extensions in the names of uploaded files, in this case php, php4, php5, phtml, etc. Drupal, for example, uses this technique: it appends an underscore to all “additional” extensions, so the result is exploit.php_.jpg. Note, however, that a system administrator can register any additional file extension as a PHP handler. Some .html could then suddenly turn into a terrible security hole just because .php looked ugly, was bad for SEO, or the customer didn't like it.

So what does FastCGI give us in terms of security? First, if you use the Nginx web server instead of Apache, it will simply serve static files. Period. In other words, the exploit.php.jpg file will be served “as is”, without any processing on the server side, so a malicious script simply cannot be run. Second, the FastCGI program and the web server can run under different users, which means they will have different rights to files and folders. For example, the web server may only read uploaded files, which is enough to return static data. The FastCGI program may only change the contents of the upload folder, which is enough to save new files and delete old ones, but it has no access to the uploaded files themselves, so it cannot execute malicious code from them either. Third, a FastCGI program can run in a chroot that is different from the chroot of the web server.
Chroot itself (changing the root directory) allows you to greatly limit the rights of a program, that is, increase the overall security of the system, because the program simply will not be able to access files outside the specified directory.
Which web server with FastCGI support is better to choose?
In short, I use Nginx. In general, there are quite a lot of servers that support FastCGI, including commercial ones, so let us consider several alternatives.
Apache is perhaps the first thing that comes to mind, although it consumes far more resources than Nginx. For example, for 10,000 inactive HTTP keep-alive connections Nginx consumes about 2.5 MB of memory, which is quite realistic even for a relatively weak machine. Apache, by contrast, has to create a new thread for each new connection, and 10,000 threads is simply fantastic.
Lighttpd: the main disadvantage of this web server is that it processes all requests in a single thread. First, this may cause scalability problems: you simply will not be able to use all 4-8 cores of a modern processor. Second, if for some reason the web server thread stalls (for example, while waiting a long time for the hard drive), the entire server stalls. In other words, all other clients stop receiving responses because of one slow request.
Another candidate is Cherokee. According to its developers, in some cases it works faster than Nginx and Lighttpd.
What implementations of the FastCGI protocol are there?
At the moment there are two implementations of the FastCGI protocol: the libfcgi.lib library from the creators of the FastCGI protocol, and Fastcgi++, a C++ class library. Libfcgi has been developed since 1996 and, according to Open Market, is very stable and more widespread, so we will use it in this article. Note that the library is written in C and its built-in C++ “wrapper” can hardly be called high-level, so we will use the C interface.
I see no point in dwelling on installing the library itself: it comes with a makefile, so there should be no problems. In addition, popular distributions provide this library as a package.
What are sockets?
A general concept of sockets can be obtained from Wikipedia. In a nutshell, sockets are a method of interprocess communication.
As we remember, in all modern operating systems each process has its own address space. The operating system kernel controls access to RAM. If a program accesses a memory address that does not exist (in the context of that program), the kernel signals a segmentation fault and terminates the program. This is wonderful: errors in one program simply cannot harm others, as if they lived in different dimensions. But since programs have different address spaces, they cannot share data or exchange it directly either. What if you really need to transfer data from one program to another? Sockets were developed to solve exactly this problem: two or more processes (read: programs) connect to the same socket and start exchanging data. It works like a kind of “window” into another world, through which you can receive data from other processes and send data to them.
Sockets differ depending on the type of connection. For example, TCP sockets use an ordinary network to exchange data, so the programs can run on different computers. The second most common option, Unix domain sockets, is suitable for exchanging data only within one machine. Such a socket looks like an ordinary path in the file system, but the hard drive is not actually used, and all data exchange happens in RAM. Because no network stack is involved, they are slightly faster (by about 10%) than TCP sockets. On Windows, the closest equivalent of this kind of socket is a named pipe.
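A quick way to see two endpoints of a Unix domain socket exchanging data is socketpair(), which creates an already connected pair. In a real setup one end would belong to the web server and the other to the FastCGI program, usually in separate processes (for example, after fork()). Here both ends stay in one process for brevity. This is a POSIX-only C sketch, and `echo_through_socketpair` is a hypothetical name.

```c
#if !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200809L
#endif
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: sends msg from one end of a connected Unix domain
 * socket pair and reads it on the other end. Returns the number of bytes
 * received, or -1 on error. */
ssize_t echo_through_socketpair(const char *msg, char *reply, size_t replysz)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    size_t len = strlen(msg);
    ssize_t result = -1;
    if (write(sv[0], msg, len) == (ssize_t)len) {      /* "web server" side */
        size_t total = 0;
        /* a stream socket may deliver the data in pieces, so read in a loop */
        while (total < len && total < replysz) {
            ssize_t got = read(sv[1], reply + total, replysz - total);
            if (got <= 0)
                break;
            total += (size_t)got;
        }
        result = (ssize_t)total;                       /* "FastCGI program" side */
    }
    close(sv[0]);
    close(sv[1]);
    return result;
}
```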
Examples of using sockets for GNU/Linux OS can be found in this article. If you haven't worked with sockets before, I would recommend familiarizing yourself with it - it's not mandatory, but it will improve your understanding of the things presented here.
How to use the Libfcgi library?
We want to create a multi-threaded FastCGI application, so let me describe some of the most important functions.
First of all, the library needs to be initialized:
int FCGX_Init(void);
Attention! This function must be called before any other functions in this library and only once (just once, for any number of threads).
Next we need to open a listening socket:
int FCGX_OpenSocket(const char *path, int backlog);
The path parameter contains the socket connection string. Both Unix domain sockets and TCP sockets are supported; the library does all the work of parsing the string and creating the socket. The function returns the descriptor of the listening socket, or -1 on error.
Example connection strings for Unix domain sockets:
"/tmp/fastcgi/mysocket" "/tmp/fcgi_example.bare.sock"
I think everything is clear here: you just need to pass a unique path as a string, and all processes interacting with the socket must have access to it. I repeat once again: this method only works within one computer, but is somewhat faster than TCP sockets.
Example connection strings for TCP sockets:
":5000" ":9000"
In this case a TCP socket is opened on the specified port (here 5000 or 9000, respectively) on all network interfaces, so connections are accepted from any IP address. Attention! This method is potentially unsafe: if your server is connected to the Internet, your FastCGI program will accept requests from any other computer. This means that any attacker can send a “death packet” to your FastCGI program. Nothing good comes of this. In the best case your program simply crashes, resulting in a denial of service (a DoS attack, if you like); in the worst case (if you are really unlucky) it allows remote code execution. So always restrict access to such ports with a firewall. Grant access only to the IP addresses that are actually used during normal operation of the FastCGI program (the principle of “everything that is not explicitly allowed is prohibited”).
The next form of connection string:
"*:5000" "*:9000"
This method is completely similar to the previous one: a TCP socket is opened to accept connections from any IP address, so here, too, the firewall must be configured carefully. The only advantage of this form is purely administrative. Any programmer or system administrator reading the configuration files will see that your program accepts connections from any IP address, so, other things being equal, it is better to prefer this form to the previous one.
A safer option is to explicitly specify the IP address in the connection string:
"5.5.5.5:5000" "127.0.0.1:9000"
In this case the socket is bound only to the specified local address (here 5.5.5.5 or 127.0.0.1, respectively). The port (here 5000 or 9000) is open only on the network interface that has this address and closed on all the others, so the address must belong to the machine the FastCGI program runs on. Binding to an internal interface or to localhost reduces the program's exposure. Whenever possible, use this form of connection string for TCP sockets: what if the system administrator “simply forgets” to configure the firewall? Note the second example, which specifies the address of the same machine (localhost): only programs running on this machine can connect to such a socket. This lets you use a TCP socket on one machine if for some reason you cannot use Unix domain sockets. For example, the chroot of the web server and the chroot of the FastCGI program may be in different folders with no common file paths.

Unfortunately, you cannot specify two or more addresses. If the program must be reachable on several interfaces (for example, by web servers on different networks), you will have two options. You can open the port on all interfaces (see the previous method) and rely on your firewall settings, or you can open several sockets. Also, the libfcgi library does not support IPv6 addresses: back in 1996 that standard was just being born, so you will have to limit your appetites to ordinary IPv4 addresses. True, if you really need IPv6 support, it is relatively easy to add by patching the FCGX_OpenSocket function; the library license allows this.
Attention! Binding the socket to a specific address is not sufficient protection by itself. Packets for that interface can still arrive from anywhere it is reachable from, and IP spoofing attacks (forging the sender's IP address) are possible, so setting up a firewall is still required. Typically, as protection against IP spoofing, the firewall checks that each packet's IP address matches the MAC address of the network card. It does this for all hosts on our local network (more precisely, in our host's broadcast domain). It also discards all packets coming from the Internet whose source address lies in the private or loopback ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, fc00::/7, 127.0.0.0/8 and ::1/128). Still, it is better to use this library feature as well: with an incorrectly configured firewall, a socket bound to localhost or to an internal interface is much harder to reach with a “death packet”. Moreover, the TCP handshake makes forging the source address of a connection much harder than forging single packets.
The last kind of connection string is to use the host's domain name:
"example.com:5000" "localhost:9000"
In this case, the IP address will be obtained automatically based on the domain name of the host you specified. The restrictions are still the same - the host must have one IPv4 address, otherwise an error will occur. However, given that the socket is created once at the very beginning of working with FastCGI, this method is unlikely to be very useful - dynamically changing the IP address will still not work (more precisely, after each change of the IP address you will have to restart your FastCGI program). On the other hand, perhaps this will be useful for a relatively large network - remembering a domain name is still easier than an IP address.
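The connection-string forms above can be summarized in code. Below is a hypothetical C helper, `parse_bind_path`, which mirrors the rules as described in this article rather than libfcgi's actual source. A string with a colon means host:port for TCP, where an empty host or `*` means all interfaces. Anything else is a Unix domain socket path. IPv6 literals are not handled, matching the limitation mentioned above.

```c
#include <stdlib.h>
#include <string.h>

enum bind_kind { BIND_UNIX, BIND_TCP_ANY, BIND_TCP_HOST, BIND_INVALID };

struct bind_spec {
    enum bind_kind kind;
    char host[256];   /* IP address or host name for BIND_TCP_HOST */
    int port;         /* TCP port, 0 for Unix sockets */
};

/* Hypothetical helper: classifies an FCGX_OpenSocket-style path. */
struct bind_spec parse_bind_path(const char *path)
{
    struct bind_spec s = { BIND_INVALID, "", 0 };
    const char *colon = strchr(path, ':');
    if (colon == NULL) {                   /* "/tmp/fcgi_example.bare.sock" */
        s.kind = (*path != '\0') ? BIND_UNIX : BIND_INVALID;
        return s;
    }
    char *end;
    long port = strtol(colon + 1, &end, 10);
    size_t hlen = (size_t)(colon - path);
    if (end == colon + 1 || *end != '\0' || port <= 0 || port > 65535
        || hlen >= sizeof s.host)
        return s;                          /* bad port, or host name too long */
    s.port = (int)port;
    memcpy(s.host, path, hlen);
    s.host[hlen] = '\0';
    if (hlen == 0 || strcmp(s.host, "*") == 0)
        s.kind = BIND_TCP_ANY;             /* ":9000" or "*:9000": all interfaces */
    else
        s.kind = BIND_TCP_HOST;            /* "127.0.0.1:9000": bind to this address */
    return s;
}
```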
The second parameter, backlog, sets the length of the socket's queue of pending connections (it is passed to the listen() call). The special value 0 (zero) indicates the default queue length for this operating system.
Every time a request comes from the web server, the new connection is placed in this queue, where it waits to be processed by our FastCGI program. If the queue is completely full, all subsequent connection attempts fail, and the web server receives a “Connection refused” response. In principle, there is nothing wrong with this. The Nginx web server has its own request queue, and if there are no free resources, new requests will wait their turn in the web server's queue (at least until the timeout expires). In addition, if you have several servers running FastCGI, Nginx can pass such a request to a less loaded server.
So, let's try to figure out the optimal queue length. In general, it is better to tune this parameter individually based on load testing, but we will try to estimate the most suitable range. First, the maximum queue length is limited by the operating system kernel settings (usually to no more than 1024 connections). Second, the queue consumes resources, cheap ones but resources nonetheless, so it should not be unreasonably long. Further, let's say our FastCGI program has 8 worker threads, which is quite realistic for modern 4-8 core processors. Each thread needs its own connection, since requests are processed in parallel. Ideally, then, we should already have 8 requests from the web server queued so that all threads get work immediately, without unnecessary delays. In other words, the minimum queue size is the number of worker threads of the FastCGI program. You can increase this value by 50-100% to leave some headroom, since transferring data over the network takes time.
Now let's determine the upper limit of this value. Here we need to know how many requests we can actually process and limit the request queue to this value. Imagine that you have made this queue too large - so much so that your customers simply get tired of waiting their turn and they simply leave your site without waiting for a response. Obviously, there is nothing good about this - the web server had to send a request to open a connection, which in itself is expensive, and then close this connection only because the FastCGI program did not have enough time to process this request. In a word, we are only wasting processor time, but we just don’t have enough of it! But this is not the worst thing - it’s worse when the client refuses to receive information from your site already after the request has begun to be processed. It turns out that we will have to completely process an essentially useless request, which, you see, will only worsen the situation. Theoretically, a situation may arise when most of the clients will not wait for a response when your processor is 100% loaded. Not good.
So, let's say we can process one request in 300 milliseconds (0.3 seconds). Next, we know that on average 50% of visitors leave a site if a web page takes more than 30 seconds to load. Obviously, 50% of dissatisfied visitors is too many, so we will limit the maximum page load time to 5 seconds. This means a completely finished web page, after applying the style sheets and executing JavaScript. On an average site this stage can take 70% of the total load time, so no more than 5 * 0.3 = 1.5 seconds are left for loading data over the network. Next, remember that the HTML code, style sheets, scripts and graphics are transferred in different files, the HTML code first and everything else afterwards. However, after receiving the HTML code the browser requests the remaining resources in parallel. We can therefore estimate the loading time of the HTML code as 50% of the total time for receiving data. That leaves no more than 1.5 * 0.5 = 0.75 seconds to process one request. If one thread processes a request in 0.3 seconds on average, the queue should hold 0.75 / 0.3 = 2.5 requests per thread. Since we have 8 worker threads, the resulting queue size is 2.5 * 8 = 20 requests. Note that these calculations are approximate. For a specific site the values used can be determined much more accurately, but this still gives a starting point for performance tuning.
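The estimate above is easy to redo for your own numbers. Here is a C sketch of the same arithmetic; the struct and function names are hypothetical. Times are in milliseconds and shares in percent, so the integer arithmetic is exact for the example values. The kernel limit is passed in as an assumed input rather than queried from the system.

```c
/* Inputs of the estimate; all names are illustrative. */
struct backlog_params {
    unsigned page_ms;      /* acceptable full page load time, e.g. 5000 */
    unsigned network_pct;  /* share of it left for network transfer, e.g. 30 */
    unsigned html_pct;     /* share of the network time for the HTML, e.g. 50 */
    unsigned request_ms;   /* average processing time of one request, e.g. 300 */
    unsigned threads;      /* worker threads, e.g. 8 */
    unsigned kernel_max;   /* OS limit on the listen queue */
};

/* Lower bound: one waiting connection per worker thread. */
unsigned backlog_min(const struct backlog_params *p)
{
    return p->threads;
}

/* Upper bound: requests that can still be answered within the HTML budget. */
unsigned backlog_max(const struct backlog_params *p)
{
    if (p->request_ms == 0)
        return p->kernel_max;                        /* avoid division by zero */
    unsigned long html_ms = (unsigned long)p->page_ms * p->network_pct / 100
                            * p->html_pct / 100;     /* 5000 -> 1500 -> 750 ms */
    unsigned long q = html_ms * p->threads / p->request_ms;  /* 750*8/300 = 20 */
    if (q < p->threads)
        q = p->threads;            /* never below the lower bound */
    if (q > p->kernel_max)
        q = p->kernel_max;         /* the kernel would cap it anyway */
    return (unsigned)q;
}
```

With the article's numbers (`{5000, 30, 50, 300, 8, 1024}`), this gives a range of 8 to 20 connections.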
So, we have received a socket descriptor, after which we need to allocate memory for the request structure. The description of this structure is as follows:
typedef struct FCGX_Request {
    int requestId;
    int role;
    FCGX_Stream *in;
    FCGX_Stream *out;
    FCGX_Stream *err;
    char **envp;
    struct Params *paramsPtr;
    int ipcFd;
    int isBeginProcessed;
    int keepConnection;
    int appStatus;
    int nWriters;
    int flags;
    int listen_sock;
    int detached;
} FCGX_Request;
Attention! When a new request is accepted, all data belonging to the previous one is freed, so if you need to keep data longer, make a deep copy (copy the data itself, not pointers to it).
You should know the following about this structure:
- the in, out and err fields are the input, output and error streams, respectively. The input stream contains the POST request data; the response of the FastCGI program (for example, the HTTP headers and the HTML code of the web page) must be written to the output stream; anything written to the error stream simply ends up as an entry in the web server's error log. You don't have to use the error stream at all: if you really need to log errors, it is probably better to use a separate file, since transferring data over the network and having the web server process it consumes additional resources.
- the envp field contains the environment variables set by the web server together with the HTTP headers, for example: SERVER_PROTOCOL, REQUEST_METHOD, REQUEST_URI, QUERY_STRING, CONTENT_LENGTH, HTTP_USER_AGENT, HTTP_COOKIE, HTTP_REFERER and so on. These variables are defined by the CGI and HTTP standards, respectively; examples of their use can be found in any CGI program. The data is stored as an array of strings whose last element is a null pointer (NULL) marking the end of the array. Each element holds one variable in the format VARIABLE_NAME=VALUE, for example CONTENT_LENGTH=0 (here meaning that the request carries no POST data, since its length is zero). If the envp array does not contain the header you need, it was not transmitted. To get all the variables passed to the FastCGI program, simply read the envp array in a loop until you reach the NULL pointer.
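Reading envp yourself takes only a short loop. The sketch below, with the hypothetical name dump_env, walks any NULL-terminated array in the NAME=VALUE format described above (request.envp in a real handler) and prints each name and value separately. It does not depend on libfcgi, so it can be tried on a hand-made array.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: print every NAME=VALUE entry of a NULL-terminated
 * array such as request.envp and return the number of entries seen. */
static int dump_env(char **envp, FILE *out)
{
    int count = 0;

    assert(envp != NULL && out != NULL);
    for (char **p = envp; *p != NULL; ++p) {
        const char *eq = strchr(*p, '=');
        if (eq == NULL) {
            fprintf(out, "malformed entry: %s\n", *p);
        } else {
            /* %.*s prints only the characters before '=' */
            fprintf(out, "%.*s -> %s\n", (int)(eq - *p), *p, eq + 1);
        }
        ++count;
    }
    return count;
}
```

For the array {"REQUEST_METHOD=GET", "CONTENT_LENGTH=0", NULL} the function prints REQUEST_METHOD -> GET and CONTENT_LENGTH -> 0 on separate lines and returns 2.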
That is all you need to know about this structure; you won't need the remaining fields.
Once the memory has been allocated, the request structure must be initialized:
int FCGX_InitRequest(FCGX_Request *request, int sock, int flags);
The function parameters are as follows:
request - pointer to the data structure to be initialized
sock - the socket descriptor returned by FCGX_OpenSocket. Instead of a ready-made descriptor you can pass 0 (FCGI_LISTENSOCK_FILENO), in which case the library treats file descriptor 0 (standard input) as the listening socket. That only works when the program is started by a process manager (such as the web server itself or spawn-fcgi) that hands it an already open socket on that descriptor. It does not suit us: we start the program ourselves and need to know in advance which address to configure in the web server.
flags - only one flag can be passed to this function: FCGI_FAIL_ACCEPT_ON_INTR, which makes FCGX_Accept_r return an error instead of restarting the wait for a connection when it is interrupted by a signal.
After this you need to receive a new request:
int FCGX_Accept_r(FCGX_Request *request);
Pass it the request structure initialized in the previous step. Attention! In a multi-threaded program, calls to this function must be synchronized (for example, with a mutex).
This function does all the socket work: first it finishes the previous request, if there was one (sends the response to the web server, closes the previous data channel and releases all associated resources, including the request structure's fields), then it accepts a new request, opens a new data channel and fills the request structure with the new data. If a new request cannot be accepted, the function returns a negative error code.
Next, you will probably need to get environment variables; for this you can either process the request->envp array yourself, or use the function
char *FCGX_GetParam(const char *name, FCGX_ParamArray envp);
where name is a string containing the name of the environment variable or http header whose value you want to get,
envp - an array of environment variables that are contained in the request->envp variable
The function returns the value of the requested variable as a string, or NULL if there is no such variable. The attentive reader need not worry about the apparent mismatch between char ** and FCGX_ParamArray: the two types are synonyms (typedef char **FCGX_ParamArray).
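If you prefer to process request->envp yourself instead of calling FCGX_GetParam, the lookup amounts to comparing the name prefix and checking for the = sign that follows it. The helper lookup_param below is hypothetical and only illustrates the idea; in real code FCGX_GetParam already does this job. The check for = after the name matters: without it, a search for SERVER_NAME would also match an entry such as SERVER_NAMES=x.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: find NAME in a NULL-terminated array of
 * "NAME=VALUE" strings and return a pointer to VALUE, or NULL if the
 * name is absent. The comparison is exact and case-sensitive. */
static const char *lookup_param(const char *name, char **envp)
{
    size_t len;

    assert(name != NULL);
    if (envp == NULL)
        return NULL;
    len = strlen(name);
    for (; *envp != NULL; ++envp) {
        /* the entry must start with NAME and continue with '=' */
        if (strncmp(*envp, name, len) == 0 && (*envp)[len] == '=')
            return *envp + len + 1;
    }
    return NULL;
}
```

As with FCGX_GetParam, a NULL result means the variable was not transmitted, so check for it before using the value.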
In addition, you will probably need to send a response to the web server. To do this, you need to use the request->out output stream and the function
int FCGX_PutStr(const char *str, int n, FCGX_Stream *stream);
where str is a buffer containing the data to be output, without the terminating null (that is, the buffer can contain binary data),
n - buffer length in bytes,
stream - the stream into which we want to output data (request->out or request->err).
If you use standard null-terminated C strings, it will be more convenient to use the function
int FCGX_PutS(const char *str, FCGX_Stream *stream);
which will simply determine the length of the string using strlen(str) and call the previous function. Therefore, if you know the length of the string in advance (for example, you use C++ std::strings), it is better to use the previous function for efficiency reasons.
These functions operate on raw bytes, so UTF-8 strings pass through them unchanged and multilingual web applications should cause no problems.
You can also call these functions several times while processing the same request, and in some cases this improves perceived performance. Suppose you need to send a large file. Instead of reading the entire file from disk and then sending it in one piece, you can start sending data right away. The client then starts receiving the data it wants instead of staring at a blank browser window, which, purely psychologically, makes the user willing to wait a little longer; in other words, you buy some extra time for the page to load. Also note that most resources (cascading style sheets, JavaScript, etc.) are referenced at the beginning of the web page, so the browser can parse part of the HTML and start loading them earlier: another reason to output data in parts.
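Sending a large file in parts can be sketched as a loop over a fixed-size buffer. To keep the sketch independent of libfcgi, the output goes through a hypothetical chunk_sink callback with the same contract as FCGX_PutStr (it returns the number of bytes written, or -1 on error); in a real handler the callback would be a one-line wrapper around FCGX_PutStr(buf, n, (FCGX_Stream *)ctx) with ctx set to request->out. The 8 KB buffer size is an arbitrary choice, and file_sink, which writes to an ordinary FILE *, exists only for trying the code outside a web server.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical output callback with the same contract as FCGX_PutStr:
 * returns the number of bytes written, or -1 on error. */
typedef int (*chunk_sink)(const char *buf, int n, void *ctx);

/* Send the contents of `in` to `sink` in fixed-size pieces instead of
 * reading the whole file into memory first.
 * Returns the total number of bytes sent, or -1 on a read or write error. */
static long stream_file(FILE *in, chunk_sink sink, void *ctx)
{
    char buf[8192]; /* arbitrary chunk size */
    long total = 0;
    size_t got;

    assert(in != NULL && sink != NULL);
    while ((got = fread(buf, 1, sizeof buf, in)) > 0) {
        if (sink(buf, (int)got, ctx) != (int)got)
            return -1;
        total += (long)got;
    }
    return ferror(in) ? -1 : total;
}

/* Example sink that writes to an ordinary FILE * passed as ctx,
 * handy for testing without a web server. */
static int file_sink(const char *buf, int n, void *ctx)
{
    return fwrite(buf, 1, (size_t)n, (FILE *)ctx) == (size_t)n ? n : -1;
}
```

Because the sink is called once per chunk, the client starts receiving data after the first chunk is written rather than after the whole file has been read.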
The next thing you may need to do is process the POST request. In order to get its value, you need to read data from the request->in stream using the function
int FCGX_GetStr(char * str, int n, FCGX_Stream *stream);
where str is a pointer to the buffer,
n - buffer size in bytes,
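stream - the stream from which we are reading data.
The function returns the number of bytes actually read; if that number is smaller than n, the end of the input has been reached.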
The size of the POST data (in bytes) is given by the CONTENT_LENGTH environment variable, whose value, as we remember, can be obtained with the FCGX_GetParam function. Attention! Allocating the str buffer from the CONTENT_LENGTH value without any limit is a very bad idea: any attacker can send a POST request of arbitrary size, and your server may simply run out of free memory (in effect, a denial-of-service attack). Instead, limit the buffer size to some reasonable value (from a few kilobytes to several megabytes) and call FCGX_GetStr several times.
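Before allocating anything for the POST body, it is worth validating CONTENT_LENGTH. The sketch below assumes a hypothetical limit MAX_POST_BYTES of 1 MiB and a hypothetical helper checked_content_length; it rejects values that are not valid non-negative decimal numbers or that exceed the limit. A request that passes the check can then be read with FCGX_GetStr in fixed-size chunks until the declared length is reached or the function returns fewer bytes than requested.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical upper bound for a POST body; pick one that fits your app. */
#define MAX_POST_BYTES (1L << 20) /* 1 MiB */

/* Parse a CONTENT_LENGTH value (as returned by FCGX_GetParam).
 * Returns the length, 0 if the value is missing or empty,
 * or -1 if it is malformed, negative or larger than max_bytes. */
static long checked_content_length(const char *value, long max_bytes)
{
    char *end;
    long len;

    assert(max_bytes >= 0);
    if (value == NULL || *value == '\0')
        return 0;
    errno = 0;
    len = strtol(value, &end, 10);
    /* reject overflow, trailing garbage, negatives and oversized bodies */
    if (errno != 0 || *end != '\0' || len < 0 || len > max_bytes)
        return -1;
    return len;
}
```

A -1 result is a good moment to answer with an error status instead of reading the body at all.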
The last important function flushes the output and error streams (sends the client any data still sitting in them) and closes the connection:
void FCGX_Finish_r(FCGX_Request *request);
Note that this function is optional: FCGX_Accept_r also sends the remaining data to the client and closes the current connection before accepting a new request. So why is it needed? Imagine that you have already sent the client all the necessary data and now need to perform some final operations: write statistics to the database, errors to the log file, and so on. The connection to the client is no longer needed, but the client (that is, the browser) is still waiting: what if we send something else? We obviously cannot call FCGX_Accept_r early, because after that we would have to start processing the next request. This is exactly where FCGX_Finish_r comes in: it closes the current connection before a new request is accepted. We will still process the same number of requests per unit of time as without it, but each client receives its response earlier, since it no longer has to wait for our final operations to finish; and a faster response is precisely why we use FastCGI in the first place.
This concludes the description of the library functions; from here on, processing the received data is up to your application.
A simple example of a multi-threaded FastCGI program
The example should be self-explanatory. The debug messages and the worker thread's “sleep” are there purely for demonstration. When building the program, remember to link the libfcgi and libpthread libraries (gcc options -lfcgi and -lpthread).
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>
#include "fcgi_config.h"
#include "fcgiapp.h"

#define THREAD_COUNT 8
#define SOCKET_PATH "127.0.0.1:9000"

//stores the open socket handle
static int socketId;

static void *doit(void *a)
{
    int rc;
    FCGX_Request request;
    char *server_name;

    (void)a; //the thread argument is not used

    if(FCGX_InitRequest(&request, socketId, 0) != 0) {
        //error when initializing the request structure
        printf("Can not init request\n");
        return NULL;
    }
    printf("Request is initiated\n");

    for(;;) {
        static pthread_mutex_t accept_mutex = PTHREAD_MUTEX_INITIALIZER;

        //try to receive a new request
        printf("Try to accept new request\n");
        pthread_mutex_lock(&accept_mutex);
        rc = FCGX_Accept_r(&request);
        pthread_mutex_unlock(&accept_mutex);

        if(rc < 0) {
            //error while accepting the request
            printf("Can not accept new request\n");
            break;
        }
        printf("request is accepted\n");

        //get the value of a variable
        server_name = FCGX_GetParam("SERVER_NAME", request.envp);

        //output all HTTP headers (each header on a new line)
        FCGX_PutS("Content-type: text/html\r\n", request.out);
        //an empty line must separate the headers from the response body
        FCGX_PutS("\r\n", request.out);
        //output the response body (for example, the HTML code of a web page)
        FCGX_PutS("<html>\r\n", request.out);
        FCGX_PutS("<head>\r\n", request.out);
        FCGX_PutS("</head>\r\n", request.out);
        FCGX_PutS("<body>\r\n", request.out);
        FCGX_PutS("<h1>FastCGI Hello! (multi-threaded C, fcgiapp library)</h1>\r\n", request.out);
        FCGX_PutS("<p>Request accepted from host <i>", request.out);
        FCGX_PutS(server_name ? server_name : "?", request.out);
        FCGX_PutS("</i></p>\r\n", request.out);
        FCGX_PutS("</body>\r\n", request.out);
        FCGX_PutS("</html>\r\n", request.out);

        //"fall asleep" - imitation of a multi-threaded environment
        sleep(2);

        //close the current connection
        FCGX_Finish_r(&request);

        //final actions - recording statistics, logging errors, etc.
    }

    return NULL;
}

int main(void)
{
    int i;
    pthread_t id[THREAD_COUNT];

    //initialize the library
    FCGX_Init();
    printf("Lib is inited\n");

    //open a new socket
    socketId = FCGX_OpenSocket(SOCKET_PATH, 20);
    if(socketId < 0) {
        //error while opening the socket
        return 1;
    }
    printf("Socket is opened\n");

    //create the worker threads
    for(i = 0; i < THREAD_COUNT; i++) {
        pthread_create(&id[i], NULL, doit, NULL);
    }

    //wait for the worker threads to finish
    for(i = 0; i < THREAD_COUNT; i++) {
        pthread_join(id[i], NULL);
    }

    return 0;
}
Simple Nginx configuration example
The simplest configuration looks like this:
server {
    server_name localhost;

    location / {
        fastcgi_pass 127.0.0.1:9000;
        #fastcgi_pass unix:/tmp/fastcgi/mysocket;
        #fastcgi_pass localhost:9000;
        include fastcgi_params;
    }
}
This config is enough for our FastCGI program to work correctly. The commented-out lines show how to connect through a Unix domain socket and how to specify a host name instead of an IP address.
After compiling and running the program and configuring Nginx, I proudly saw the following at localhost:
FastCGI Hello! (multi-threaded C, fcgiapp library)
Thanks to everyone who read to the end.
