PHP Cookbook/Forms

From WikiContent

Current revision: 13:36, 7 March 2008 (initial conversion from Docbook)
 


Introduction

The genius of PHP is its seamless integration of form variables into your programs. It makes web programming smooth and simple, from web form to PHP code to HTML output.

There's no built-in mechanism in HTTP to allow you to save information from one page so you can access it in other pages. That's because HTTP is a stateless protocol. Recipe 9.2, Recipe 9.4, Recipe 9.5, and Recipe 9.6 all show ways to work around the fundamental problem of figuring out which user is making which requests to your web server.

Processing data from the user is the other main topic of this chapter. You should never trust the data coming from the browser, so it's imperative to always validate all fields, even hidden form elements. Validation takes many forms, from ensuring the data match certain criteria, as discussed in Recipe 9.3, to escaping HTML entities to allow the safe display of user-entered data, as covered in Recipe 9.9. Furthermore, Recipe 9.8 tells how to protect the security of your web server, and Recipe 9.7 covers how to process files uploaded by a user.

Whenever PHP processes a page, it checks for GET and POST form variables, uploaded files, applicable cookies, and web server and environment variables. These are then directly accessible in the following arrays: $_GET, $_POST, $_FILES, $_COOKIE, $_SERVER, and $_ENV. They hold, respectively, all variables set by GET requests, POST requests, uploaded files, cookies, the web server, and the environment. There's also $_REQUEST, which is one giant array that merges request data from the other arrays; by default it contains the values from $_GET, $_POST, and $_COOKIE.

When placing elements inside of $_REQUEST, if two arrays both have a key with the same name, PHP falls back upon the variables_order configuration directive. By default, variables_order is EGPCS (or GPCS, if you're using the php.ini-recommended configuration file). The letters stand for environment, GET, POST, cookie, and web server variables, and PHP processes the sources in that order, so a source named later overwrites one named earlier. For instance, since C comes after P in the default order, a cookie named username overwrites a POST variable named username.

If you don't have access to PHP's configuration files, you can use ini_get( ) to check a setting:

print ini_get('variables_order');
EGPCS
         

You may need to do this because your ISP doesn't let you view configuration settings or because your script may run on someone else's server. You can also use phpinfo( ) to view settings. However, if you can't rely on the value of variables_order, you should directly access $_GET and $_POST instead of using $_REQUEST.

The arrays containing external variables, such as $_REQUEST, are superglobals. As such, they don't need to be declared as global inside of a function or class. It also means you probably shouldn't assign anything to these variables, or you'll overwrite the data stored in them.

Prior to PHP 4.1, these superglobal variables didn't exist. Instead there were regular arrays named $HTTP_COOKIE_VARS, $HTTP_ENV_VARS, $HTTP_GET_VARS, $HTTP_POST_VARS, $HTTP_POST_FILES, and $HTTP_SERVER_VARS. These arrays are still available for legacy reasons, but the newer arrays are easier to work with. These older arrays are populated only if the track_vars configuration directive is on, but, as of PHP 4.0.3, this feature is always enabled.

Finally, if the register_globals configuration directive is on, all these variables are also available as variables in the global namespace. So, $_GET['password'] is also just $password. While convenient, this introduces major security problems because malicious users can easily set variables from the outside and overwrite trusted internal variables. Starting with PHP 4.2, register_globals defaults to off.

With this knowledge, here is a basic script to put things together. The form asks the user to enter his first name, then replies with a welcome message. The HTML for the form looks like this:

<form action="/hello.php" method="post">
What is your first name?
<input type="text" name="first_name">
<input type="submit" value="Say Hello">
</form>

The name of the text input element inside the form is first_name. Also, the method of the form is post. This means that when the form is submitted, $_POST['first_name'] will hold whatever string the user typed in. (It could also be empty, of course, if he didn't type anything.)

For simplicity, however, let's assume the value in the variable is valid. (The term "valid" is open for definition, depending on certain criteria, such as not being empty, not being an attempt to break into the system, etc.) This allows us to omit the error checking stage, which is important but gets in the way of this simple example. So, here is a simple hello.php script to process the form:

echo 'Hello ' . $_POST['first_name'] . '!';

If the user's first name is Joe, PHP prints out:

Hello Joe!
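
Because the example above skips validation, it echoes whatever the browser sent, markup included. As a minimal sketch of a safer variant, assuming a hypothetical helper named pc_greeting( ) that is not part of the original script, the value can be escaped before it is printed:

```php
<?php
// Hypothetical helper (not from the original recipe): builds the greeting
// from an array shaped like $_POST, escaping the name for HTML output.
function pc_greeting(array $post) {
    // Treat a missing field as an empty string
    $name = isset($post['first_name']) ? (string) $post['first_name'] : '';
    return 'Hello ' . htmlspecialchars($name, ENT_QUOTES, 'UTF-8') . '!';
}

echo pc_greeting(array('first_name' => 'Joe')), "\n";   // Hello Joe!
```

In hello.php, the call would be echo pc_greeting($_POST); a name such as <b>Joe</b> then shows up as literal text instead of being interpreted as markup.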

Processing Form Input

Problem

You want to use the same HTML page to emit a form and then process the data entered into it. In other words, you're trying to avoid a proliferation of pages that each handle different steps in a transaction.

Solution

Use a hidden field in the form to tell your program that it's supposed to be processing the form. In this case, the hidden field is named stage and has a value of process:

if (isset($_POST['stage']) && ('process' == $_POST['stage'])) {
    process_form();
} else {
    print_form();
}

Discussion

During the early days of the Web, when people created forms, they made two pages: a static HTML page with the form and a script that processed the form and returned a dynamically generated response to the user. This was a little unwieldy, because form.html led to form.cgi and if you changed one page, you needed to also remember to edit the other, or your script might break.

Forms are easier to maintain when all parts live in the same file and context dictates which sections to display. Use a hidden form field named stage to track your position in the flow of the form process; it acts as a trigger for the steps that return the proper HTML to the user. Sometimes, however, it's not possible to design your code to do this; for example, when your form is processed by a script on someone else's server.

When writing the HTML for your form, however, don't hardcode the path to your page directly into the action. This makes it impossible to rename or relocate your page without also editing it. Instead, PHP supplies a helpful variable:

$_SERVER['PHP_SELF'] 

This variable is an alias to the URL of the current page. So, set the value of the action attribute to that value, and your form always resubmits, even if you've moved the file to a new place on the server.

So, the example in the introduction of this chapter is now:

if (isset($_POST['stage']) && ('process' == $_POST['stage'])) {
    process_form();
} else {
    print_form();
}

function print_form() {
    echo <<<END
        <form action="$_SERVER[PHP_SELF]" method="post">
        What is your first name?
        <input type="text" name="first_name">
        <input type="hidden" name="stage" value="process">
        <input type="submit" value="Say Hello">
        </form>
END;
}

function process_form() {
    echo 'Hello ' . $_POST['first_name'] . '!';
}

If your form has more than one step, just set stage to a new value for each step.
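
As a rough sketch of that idea, the dispatcher below maps each value of stage to the step that should run next. The function name pc_pick_step( ), the stage value confirm, and the step name print_confirmation are assumptions made up for this illustration.

```php
<?php
// Hypothetical dispatcher (stage and step names are assumptions):
// maps the hidden "stage" field to the name of the step to run next.
function pc_pick_step(array $post) {
    $stage = isset($post['stage']) ? $post['stage'] : '';
    switch ($stage) {
        case 'confirm':
            return 'print_confirmation';  // second screen: review the answers
        case 'process':
            return 'process_form';        // final step: act on the data
        default:
            return 'print_form';          // first visit or unknown value
    }
}

echo pc_pick_step(array('stage' => 'confirm')), "\n";   // print_confirmation
```

Falling through to print_form( ) for any unrecognized value means a tampered stage field simply restarts the form.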

See Also

Recipe 9.4 for handling multipage forms.

Validating Form Input

Problem

You want to ensure data entered from a form passes certain criteria.

Solution

Create a function that takes a string to validate and returns true if the string passes a check and false if it doesn't. Inside the function, use regular expressions and comparisons to check the data. For example, Example 9-1 shows the pc_validate_zipcode( ) function, which validates a U.S. Zip Code.

Example 9-1. pc_validate_zipcode( )

function pc_validate_zipcode($zipcode) {
    // preg_match() returns 1 or 0, so cast to get a real true or false
    return (bool) preg_match('/^[0-9]{5}([- ]?[0-9]{4})?$/', $zipcode);
}

Here's how to use it:

if (pc_validate_zipcode($_REQUEST['zipcode'])) {
    // U.S. Zip Code is okay, can proceed
    process_data();
} else {
    // this is not an okay Zip Code, print an error message
    print "Your ZIP Code should be 5 digits (or 9 digits, if you're ";
    print "using ZIP+4).";
    print_form();
}

Discussion

Deciding what constitutes valid and invalid data is almost more of a philosophical task than a straightforward matter of following a series of fixed steps. In many cases, what may be perfectly fine in one situation won't be correct in another.

The easiest check is making sure the field isn't blank. The empty( ) function best handles this problem.
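
One caveat is that empty( ) also treats the string "0" as empty, which matters for fields where a lone zero is a legitimate answer. The sketch below shows one possible alternative; pc_is_blank( ) is a hypothetical name, not a built-in function.

```php
<?php
// Hypothetical helper: true when a submitted field is missing or holds
// only whitespace. Unlike empty(), it does not reject the string "0".
function pc_is_blank($value) {
    return $value === null || trim((string) $value) === '';
}

var_dump(empty('0'));          // bool(true): empty() counts "0" as empty
var_dump(pc_is_blank('0'));    // bool(false)
var_dump(pc_is_blank('   '));  // bool(true)
```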

Next come relatively easy checks, such as the case of a U.S. Zip Code. Usually, a regular expression or two can solve these problems. For example:

 /^[0-9]{5}([- ]?[0-9]{4})?$/ 

matches any string in valid U.S. Zip Code format: five digits, optionally followed by four more (ZIP+4).

Sometimes, however, coming up with the correct regular expression is difficult. If you want to verify that someone has entered only two names, such as "Alfred Aho," you can check against:

 /^[A-Za-z]+ +[A-Za-z]+$/

However, Tim O'Reilly can't pass this test. An alternative is /^\S+\s+\S+$/; but then Donald E. Knuth is rejected. So think carefully about the entire range of valid input before writing your regular expression.
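
To see the trade-off concretely, this short script runs both patterns from the text against the three names it mentions. The variable names are only for illustration.

```php
<?php
// The two patterns from the text, applied to the three names it mentions.
$letters_only = '/^[A-Za-z]+ +[A-Za-z]+$/';
$two_tokens   = '/^\S+\s+\S+$/';

$names = array('Alfred Aho', "Tim O'Reilly", 'Donald E. Knuth');

foreach ($names as $name) {
    // preg_match() returns 1 for a match and 0 otherwise
    printf("%-16s letters-only: %d  two-tokens: %d\n",
           $name,
           preg_match($letters_only, $name),
           preg_match($two_tokens, $name));
}
```

Only "Alfred Aho" passes both patterns; the apostrophe defeats the first pattern and the middle initial defeats both.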

In some instances, even with regular expressions, it becomes difficult to check if the field is legal. One particularly popular and tricky task is validating an email address, as discussed in Recipe 13.7. Another is how to make sure a user has correctly entered the name of her U.S. state. You can check against a listing of names, but what if she enters her postal service abbreviation? Will MA instead of Massachusetts work? What about Mass.?

One way to avoid this issue is to present the user with a dropdown list of pregenerated choices. Using a select element, users are forced by the form's design to select a state in the format that always works, which can reduce errors. This, however, presents another series of difficulties. What if the user lives some place that isn't one of the choices? What if the range of choices is so large this isn't a feasible solution?

There are a number of ways to solve these types of problems. First, you can provide an "other" option in the list, so that a non-U.S. user can successfully complete the form. (Otherwise, she'll probably just pick a place at random, so she can continue using your site.) Next, you can divide the registration process into a two-part sequence. For a long list of options, a user begins by picking the letter of the alphabet his choice begins with; then, a new page provides him with a list containing only the choices beginning with that letter.

Finally, there are even trickier problems. What do you do when you want to make sure the user has correctly entered information, but you don't want to tell her you did so? A situation where this is important is a sweepstakes; in a sweepstakes, there's often a special code box on the entry form in which a user enters a string — AD78DQ — from an email or flier she's received. You want to make sure there are no typos, or your program won't count her as a valid entrant. You also don't want to allow her to just guess codes, because then she could try out those codes and crack the system.

The solution is to have two input boxes. A user enters her code twice; if the two fields match, you accept the data as legal and then (silently) validate the data. If the fields don't match, you reject the entry and have the user fix it. This procedure eliminates typos and doesn't reveal how the code validation algorithm works; it can also prevent misspelled email addresses.
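
The matching step might look something like the sketch below. The name pc_entries_match( ) is hypothetical, and the sketch assumes codes are case-insensitive and that surrounding whitespace should be ignored; adjust both if your codes work differently.

```php
<?php
// Hypothetical helper: compares the two entry boxes before any silent
// validation runs. Assumes codes are case-insensitive.
function pc_entries_match($first, $second) {
    $first  = strtoupper(trim((string) $first));
    $second = strtoupper(trim((string) $second));
    // Two blank boxes do not count as a match
    return $first !== '' && $first === $second;
}

var_dump(pc_entries_match('AD78DQ', ' ad78dq '));   // bool(true)
var_dump(pc_entries_match('AD78DQ', 'AD78DO'));     // bool(false)
```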

Finally, PHP performs server-side validation. Server-side validation requires that a request be made to the server, and a page returned in response; as a result, it can be slow. It's also possible to do client-side validation using JavaScript. While client-side validation is faster, it exposes your code to the user and may not work if the client doesn't support JavaScript or has disabled it. Therefore, you should always duplicate all client-side validation code on the server.

See Also

Recipe 13.7 for a regular expression for validating email addresses; Chapter 7, "Validation on the Server and Client," of Web Database Applications with PHP and MySQL (Hugh Williams and David Lane, O'Reilly).

Working with Multipage Forms

Problem

You want to use a form that displays more than one page and preserve data from one page to the next.

Solution

Use session tracking:

session_start();
$_SESSION['username'] = $_GET['username'];

You can also include variables from a form's earlier pages as hidden input fields in its later pages:

<input type="hidden" name="username" 
       value="<?php echo htmlentities($_GET['username']); ?>">

Discussion

Whenever possible, use session tracking. It's more secure because users can't modify session variables. To begin a session, call session_start( ); this creates a new session or resumes an existing one. Note that this step is unnecessary if you've enabled session.auto_start in your php.ini file. Variables assigned to $_SESSION are automatically propagated. In the Solution example, the form's username variable is preserved by assigning $_GET['username'] to $_SESSION['username'].

To access this value on a subsequent request, call session_start( ) and then check $_SESSION['username']:

session_start( );
$username = htmlentities($_SESSION['username']);
print "Hello $username.";

In this case, if you don't call session_start( ), $_SESSION isn't set.

Be sure to secure the server and location where your session files are located (the filesystem, database, etc.); otherwise your system will be vulnerable to identity spoofing.

If session tracking isn't enabled for your PHP installation, you can use hidden form variables as a replacement. However, passing data using hidden form elements isn't secure because anyone can edit these fields and fake a request; with a little work, you can increase the security to a reliable level.

The most basic way to use hidden fields is to include them inside your form.

<form action="<?php echo $_SERVER['PHP_SELF']; ?>"
      method="get">

<input type="hidden" name="username" 
       value="<?php echo htmlentities($_GET['username']); ?>">

When this form is resubmitted, $_GET['username'] holds its previous value unless someone has modified it.

A more complex but secure solution is to convert your variables to a string using serialize( ) , compute a secret hash of the data, and place both pieces of information in the form. Then, on the next request, validate the data and unserialize it. If it fails the validation test, you'll know someone has tried to modify the information.

The pc_encode( ) encoding function shown in Example 9-2 takes the data to encode in the form of an array.

Example 9-2. pc_encode( )

$secret = 'Foo25bAr52baZ';

function pc_encode($data) {
  $data = serialize($data);
  $hash = md5($GLOBALS['secret'] . $data);
  return array($data, $hash);
}

In function pc_encode( ), the data is serialized into a string, a validation hash is computed, and those variables are returned.

The pc_decode( ) function shown in Example 9-3 undoes the work of its counterpart.

Example 9-3. pc_decode( )

function pc_decode($data, $hash) {
  if (!empty($data) && !empty($hash)) {
    // Use === so two hashes that both look numeric are never compared as numbers
    if (md5($GLOBALS['secret'] . $data) === $hash) {
      return unserialize($data);
    } else {
      error_log("Validation Error: Data has been modified");
      return false;
    }
  }
  return false;
}

The pc_decode( ) function recreates the hash of the secret word and compares it to the hash value from the form. If they're equal, $data is valid, so it's unserialized. If it flunks the test, the function writes a message to the error log and returns false.
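
On newer PHP versions, the same idea can be expressed with a keyed hash (HMAC) and a constant-time comparison. The following is only a sketch under stated assumptions: it requires PHP 5.6 or later for hash_equals( ), it swaps serialize( ) for JSON so that no unserialize( ) call touches browser-supplied data, and the names pc_sign( ) and pc_verify( ) are hypothetical rather than part of the recipe.

```php
<?php
// Sketch only: the function names and the JSON encoding are assumptions,
// not the book's pc_encode()/pc_decode().
function pc_sign(array $data, $secret) {
    $payload = json_encode($data);
    $mac = hash_hmac('sha256', $payload, $secret);  // keyed hash of the payload
    return array($payload, $mac);
}

function pc_verify($payload, $mac, $secret) {
    if (!is_string($payload) || !is_string($mac)) {
        return false;
    }
    $expected = hash_hmac('sha256', $payload, $secret);
    // hash_equals() compares in constant time and never treats the
    // strings as numbers
    if (!hash_equals($expected, $mac)) {
        return false;
    }
    $data = json_decode($payload, true);
    return is_array($data) ? $data : false;
}

$secret = 'Foo25bAr52baZ';
list($payload, $mac) = pc_sign(array('username' => 'joe', 'stage' => 2), $secret);
var_dump(pc_verify($payload, $mac, $secret) !== false);   // bool(true)
var_dump(pc_verify($payload . ' ', $mac, $secret));       // bool(false)
```

As with the original functions, both values travel in hidden fields and should be passed through htmlentities( ) when printed.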

These functions go together like this:

<?php
$secret = 'Foo25bAr52baZ';

// Load in and validate old data
if (! $data = pc_decode($_GET['data'], $_GET['hash'])) {
  // crack attempt
}

// Process form (new form data is in $_GET)

// Update $data
$data['username'] = $_GET['username'];
$data['stage']++;
unset($data['password']);

// Encode results
list ($data, $hash) = pc_encode($data);

// Store data and hash inside the form
?>
<form action="<?php echo $_SERVER['PHP_SELF']; ?>" method="get">
...

<input type="hidden" name="data" 
       value="<?php echo htmlentities($data); ?>">
<input type="hidden" name="hash" 
       value="<?php echo htmlentities($hash); ?>">
</form>

At the top of the script, we pass pc_decode( ) the variables from the form for decoding. Once the information is loaded into $data, form processing can proceed by checking in $_GET for new variables and in $data for old ones. Once that's complete, update $data to hold the new values and then encode it, calculating a new hash in the process. Finally, print out the new form and include $data and $hash as hidden variables.

See Also

Recipe 8.6 and Recipe 8.7 for information on using the session module; Recipe 9.9 for details on using htmlentities( ) to escape control characters in HTML output; Recipe 14.4 for information on verifying data with hashes; documentation on session tracking at http://www.php.net/session and in Recipe 8.5; documentation on serialize( ) at http://www.php.net/serialize and unserialize( ) at http://www.php.net/unserialize.

Redisplaying Forms with Preserved Information and Error Messages

Problem

When there's a problem with data entered in a form, you want to print out error messages alongside the problem fields, instead of a generic error message at the top of the form. You also want to preserve the values the user typed into the form the first time.

Solution

Use an array, $errors, and store your messages in the array indexed by the name of the field.

if (! pc_validate_zipcode($_REQUEST['zipcode'])) {
    $errors['zipcode'] = "This is a bad ZIP Code. ZIP Codes must "
                       . "have 5 numbers and no letters.";
}

When you redisplay the form, you can display each error by its field and include the original value in the field:

// print the message only if this field actually has an error
if (isset($errors['zipcode'])) { echo $errors['zipcode']; }
$value = isset($_REQUEST['zipcode']) ?
               htmlentities($_REQUEST['zipcode']) : '';
echo "<input type=\"text\" name=\"zipcode\" value=\"$value\">";

Discussion

If your users encounter errors when filling out a long form, you can increase the overall usability of your form if you highlight exactly where the errors need to be fixed.

Consolidating all errors in a single array has many advantages. First, you can easily check if your validation process has located any items that need correction; just use count($errors). This method is easier than trying to keep track of this fact in a separate variable, especially if the flow is complex or spread out over multiple functions. Example 9-4 shows the pc_validate_form( ) validation function, which uses an $errors array.

Example 9-4. pc_validate_form( )

function pc_validate_form( ) {
  // start with an empty array so count($errors) works when nothing fails
  $errors = array();

  if (! pc_validate_zipcode($_POST['zipcode'])) {
     $errors['zipcode'] = "ZIP Codes are 5 numbers";
  }

  if (! pc_validate_email($_POST['email'])) {
     $errors['email'] = "Email addresses look like user@example.com";
  }

  return $errors;
}

This is clean code because all errors are stored in one variable. You can easily pass around the variable if you don't want it to live in the global scope.

Using the variable name as the key preserves the links between the field that caused the error and the actual error message itself. These links also make it easy to loop through items when displaying errors.
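
For instance, a summary list of every error can be produced with a single loop. This is a minimal sketch; pc_error_list( ) is a hypothetical helper, not one of the book's functions.

```php
<?php
// Hypothetical helper: turns an $errors array keyed by field name into an
// HTML list, escaping both the field name and the message.
function pc_error_list(array $errors) {
    if (count($errors) == 0) {
        return '';
    }
    $html = "<ul>\n";
    foreach ($errors as $field => $message) {
        $html .= '<li>' . htmlspecialchars($field, ENT_QUOTES, 'UTF-8') . ': '
               . htmlspecialchars($message, ENT_QUOTES, 'UTF-8') . "</li>\n";
    }
    return $html . "</ul>\n";
}

echo pc_error_list(array('zipcode' => 'ZIP Codes are 5 numbers'));
```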

You can automate the repetitive task of printing the form; the pc_print_form() function in Example 9-5 shows how.

Example 9-5. pc_print_form( )

function pc_print_form($errors) {
    $fields = array('name'   => 'Name',
                    'rank'   => 'Rank', 
                    'serial' => 'Serial');

    if (count($errors)) { 
        echo 'Please correct the errors in the form below.';
    }

    echo '<table>';

    // print out the errors and form variables
    foreach ($fields as $field => $field_name) {
        // open row
        echo '<tr><td>';

        // print error
        if (!empty($errors[$field])) {
            echo $errors[$field];
        } else {
            echo '&nbsp;'; // to prevent odd looking tables
        }

        echo "</td><td>";

        // print name and input
        $value = isset($_REQUEST[$field]) ? 
                       htmlentities($_REQUEST[$field]) : '';

        echo "$field_name: ";
        echo "<input type=\"text\" name=\"$field\" value=\"$value\">";
        echo '</td></tr>';
    }

    echo '</table>';
}

The complex part of pc_print_form( ) comes from the $fields array. The key is the variable name; the value is the pretty display name. By defining them at the top of the function, you can create a loop and use foreach to iterate through the values; otherwise, you need three separate lines of identical code. This integrates with the variable name as a key in $errors, because you can find the error message inside the loop just by checking $errors[$field].

If you want to extend this example beyond input fields of type text, modify $fields to include more meta-information about your form fields:

$fields = array('name' => array('name' => 'Name', 'type' => 'text'),
                'rank' => array('name' => 'Rank', 'type' => 'password'),
                'serial' => array('name' => 'Serial', 'type' => 'hidden')
               );
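
One way the print loop might consume that richer array is sketched below. The helper name pc_input_tag( ) is an assumption, as is the decision to print no visible label for hidden fields.

```php
<?php
// Hypothetical helper: renders one entry of the extended $fields array.
// $meta is array('name' => display name, 'type' => input type).
function pc_input_tag($field, array $meta, $value = '') {
    $tag = sprintf('<input type="%s" name="%s" value="%s">',
                   htmlspecialchars($meta['type'], ENT_QUOTES, 'UTF-8'),
                   htmlspecialchars($field, ENT_QUOTES, 'UTF-8'),
                   htmlspecialchars($value, ENT_QUOTES, 'UTF-8'));
    // Hidden fields get no visible label
    if ($meta['type'] == 'hidden') {
        return $tag;
    }
    return htmlspecialchars($meta['name'], ENT_QUOTES, 'UTF-8') . ': ' . $tag;
}

$fields = array('name'   => array('name' => 'Name',   'type' => 'text'),
                'serial' => array('name' => 'Serial', 'type' => 'hidden'));

foreach ($fields as $field => $meta) {
    echo pc_input_tag($field, $meta), "\n";
}
```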

See Also

Recipe 9.3 for simple form validation.

Guarding Against Multiple Submission of the Same Form

Problem

You want to prevent people from submitting the same form multiple times.

Solution

Generate a unique identifier and store the token as a hidden field in the form. Before processing the form, check to see if that token has already been submitted. If it hasn't, you can proceed; if it has, you should generate an error.

When creating the form, use uniqid( ) to get a unique identifier:

<?php
$unique_id = uniqid(microtime(),1);
...
?>
<input type="hidden" name="unique_id" value="<?php echo $unique_id; ?>">
</form>

Then, when processing, look for this ID:

$unique_id  = $dbh->quote($_GET['unique_id']);
$sth = $dbh->query("SELECT * FROM members WHERE unique_id = $unique_id");

if ($sth->numRows( )) {
    // already submitted, throw an error
} else {
   // act upon the data
}
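
On PHP 7 or later, random_bytes( ) is another way to produce a token that is hard to guess. The sketch below is an illustration under that assumption, not the recipe's method: the function names are hypothetical, and a plain array stands in for the database lookup.

```php
<?php
// Sketch, assuming PHP 7 or later for random_bytes(). The $seen array
// stands in for the database lookup in the Solution.
function pc_new_token() {
    return bin2hex(random_bytes(16));   // 32 hexadecimal characters
}

function pc_first_submission($token, array &$seen) {
    if (isset($seen[$token])) {
        return false;           // already submitted
    }
    $seen[$token] = true;       // remember it for next time
    return true;
}

$seen  = array();
$token = pc_new_token();
var_dump(pc_first_submission($token, $seen));   // bool(true)
var_dump(pc_first_submission($token, $seen));   // bool(false)
```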

Discussion

For a variety of reasons, users often resubmit a form. Usually it's a slip-of-the-mouse: double-clicking the Submit button. They may hit their web browser's Back button to edit or recheck information, but then they re-hit Submit instead of Forward. It can be intentional: they're trying to stuff the ballot box for an online survey or sweepstakes. Our Solution prevents the nonmalicious attack and can slow down the malicious user. It won't, however, eliminate all fraudulent use: more complicated work is required for that.

The Solution does prevent your database from being cluttered with too many copies of the same record. By generating a token that's placed in the form, you can uniquely identify that specific instance of the form, even when cookies are disabled. When you then save the form's data, you store the token alongside it. That allows you to easily check if you've already seen this form and which database record it belongs to.

Start by adding an extra column to your database table — unique_id — to hold the identifier. When you insert data for a record, add the ID also. For example:

$username  = $dbh->quote($_GET['username']);
$unique_id = $dbh->quote($_GET['unique_id']);

$sth = $dbh->query("INSERT INTO members ( username,  unique_id)
                                 VALUES ($username, $unique_id)");

By associating the exact row in the database with the form, you can more easily handle a resubmission. There's no correct answer here; it depends on your situation. In some cases, you'll want to ignore the second posting altogether. In others, you'll want to check if the record has changed, and, if so, present the user with a dialog box asking if they want to update the record with the new information or keep the old data. Finally, to reflect the second form submission, you could update the record silently, and the user never learns of a problem.

All these possibilities should be considered given the specifics of the interaction. Our opinion is there's no reason to allow the deficits of HTTP to dictate the user experience. So, while the third choice, silently updating the record, isn't what normally happens, in many ways this is the most natural option. Applications we've developed with this method are more user friendly; the other two methods confuse or frustrate most users.

It's tempting to avoid generating a random token and instead use a number one greater than the number of records already in the database. The token and the primary key will thus be the same, and you don't need to use an extra column. There are (at least) two problems with this method. First, it creates a race condition. What happens when a second person starts the form before the first person has completed it? The second form will then have the same token as the first, and conflicts will occur. This can be worked around by creating a new blank record in the database when the form is requested, so the second person will get a number one higher than the first. However, this can lead to empty rows in the database if users opt not to complete the form.

The other reason not to do this is that it makes it trivial to edit another record in the database by manually adjusting the ID to a different number. Depending on your security settings, a fake GET or POST submission allows the data to be altered without difficulty. A long random token, however, can't be guessed merely by moving to a different integer.

See Also

Recipe 14.4 for more details on verifying data with hashes; documentation on uniqid( ) at http://www.php.net/uniqid.

Processing Uploaded Files

Problem

You want to process a file uploaded by a user.

Solution

Use the $_FILES array:

// from <input name="event" type="file">
if (is_uploaded_file($_FILES['event']['tmp_name'])) {
    readfile($_FILES['event']['tmp_name']); // print file on screen
}

Discussion

Starting in PHP 4.1, all uploaded files appear in the $_FILES superglobal array. For each file, there are four pieces of information:

name
The original name of the file on the user's machine
type
The MIME type of the file, as reported by the browser
size
The size of the file in bytes
tmp_name
The location in which the file is temporarily stored on the server

If you're using an earlier version of PHP, you need to use $HTTP_POST_FILES instead.

After you've selected a file from that array, use is_uploaded_file( ) to confirm that the file you're about to process is a legitimate file resulting from a user upload, then process it as you would other files on the system. Always do this. If you blindly trust the filename supplied by the user, someone can alter the request and add names such as /etc/passwd to the list for processing.

You can also move the file to a permanent location; use move_uploaded_file( ) to safely transfer the file:

// move the file: move_uploaded_file() also does a check of the file's
// legitimacy, so there's no need to also call is_uploaded_file()
move_uploaded_file($_FILES['event']['tmp_name'], '/path/to/file.txt');

Note that the value stored in tmp_name is the complete path to the file, not just the base name. Use basename( ) to chop off the leading directories if needed.

Be sure to check that PHP has permission to read and write to both the directory in which temporary files are saved (see the upload_tmp_dir configuration directive to check where this is) and the location in which you're trying to copy the file. PHP often runs as the web server's user, such as nobody or apache, instead of your personal username. Because of this, if you're running under safe_mode, copying a file to a new location will probably not allow you to access it again.

Processing files can often be a subtle task because not all browsers submit the same information. It's important to do it correctly, however, or you open yourself up to a possible security hole. You are, after all, allowing strangers to upload any file they choose to your machine; malicious people may see this as an opportunity to crack into or crash the computer.

As a result, PHP has a number of features that allow you to place restrictions on uploaded files, including the ability to turn off file uploads altogether. So, if you're experiencing difficulty processing uploaded files, check that your file isn't being rejected because it seems to pose a security risk.

To do such a check, first make sure file_uploads is set to On inside your configuration file. Next, make sure your file size isn't larger than upload_max_filesize; this defaults to 2 MB, which stops someone trying to crash the machine by filling up the hard drive with a giant file. Additionally, there's a post_max_size directive, which controls the maximum size of all the POST data allowed in a single request; its initial setting is 8 MB.

From the perspective of browser differences and user error, if you can't get $_FILES to populate with information, make sure you add enctype="multipart/form-data" to the form's opening tag; PHP needs this to trigger processing. If you can't do so, you need to manually parse $HTTP_RAW_POST_DATA. (See RFCs 1521 and 1522 for the MIME specification at http://www.faqs.org/rfcs/rfc1521.html and http://www.faqs.org/rfcs/rfc1522.html.)

Also, if no file is selected for uploading, versions of PHP prior to 4.1 set tmp_name to none; newer versions set it to the empty string. PHP 4.2.1 allows files of length 0. To be sure a file was uploaded and isn't empty (although blank files may be what you want, depending on the circumstances), you need to make sure tmp_name is set and size is greater than 0. Last, not all browsers necessarily send the same MIME type for a file; what they send depends on their knowledge of different file types.
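
Current PHP versions also add an error element to each $_FILES entry, along with UPLOAD_ERR_* constants for its values, which makes these checks easier to write. The function below is a sketch of how the checks described above might be combined; pc_upload_problem( ) and its messages are hypothetical, and it complements rather than replaces is_uploaded_file( ) or move_uploaded_file( ).

```php
<?php
// Hypothetical helper: inspects one entry of $_FILES and returns an error
// message, or null when the upload looks usable.
function pc_upload_problem(array $file) {
    $error = isset($file['error']) ? $file['error'] : UPLOAD_ERR_NO_FILE;
    switch ($error) {
        case UPLOAD_ERR_OK:
            break;                                // keep checking below
        case UPLOAD_ERR_NO_FILE:
            return 'No file was selected.';
        case UPLOAD_ERR_INI_SIZE:                 // over upload_max_filesize
        case UPLOAD_ERR_FORM_SIZE:                // over the form's MAX_FILE_SIZE
            return 'The file is too large.';
        default:
            return 'The upload did not complete.';
    }
    // tmp_name must be set and size must be greater than 0
    if (empty($file['tmp_name']) || empty($file['size'])) {
        return 'The uploaded file is empty.';
    }
    return null;
}

var_dump(pc_upload_problem(array('error' => UPLOAD_ERR_NO_FILE)));
```

A typical call would be pc_upload_problem($_FILES['event']), with the file processed only when the result is null.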

See Also

Documentation on handling file uploads at http://www.php.net/features.file-upload and on basename() at http://www.php.net/basename.

Securing PHP's Form Processing

Problem

You want to securely process form input variables and not allow someone to maliciously alter variables in your code.

Solution

Disable the register_globals configuration directive and access variables only from the $_REQUEST array. To be even more secure, use $_GET, $_POST, and $_COOKIE to make sure you know exactly where your variables are coming from.

To do this, make sure this line appears in your php.ini file:

register_globals = Off

As of PHP 4.2, this is the default configuration.

Discussion

When register_globals is set on, external variables, including those from forms and cookies, are imported directly into the global namespace. This is a great convenience, but it can also open up some security holes if you're not very diligent about checking your variables and where they're defined. Why? Because there may be a variable you use internally that isn't supposed to be accessible from the outside but has its value rewritten without your knowledge.

Here is a simple example. You have a page in which a user enters a username and password. If they are validated, you return her user identification number and use that numerical identifier to look up and print out her personal information:

// assume magic_quotes_gpc is set to Off
$username = $dbh->quote($_GET['username']);
$password = $dbh->quote($_GET['password']);

$sth = $dbh->query("SELECT id FROM users WHERE username = $username AND
                    password = $password");

if (1 == $sth->numRows( )) { 
    $row = $sth->fetchRow(DB_FETCHMODE_OBJECT);
    $id = $row->id;
} else {
    print "Bad username and password";
}

if (!empty($id)) {
    $sth = $dbh->query("SELECT * FROM profile WHERE id = $id");
}

Normally, $id is set only by your program and is a result of a verified database lookup. However, if someone alters the GET string, and passes in a value for $id, with register_globals enabled, even after a bad username and password lookup, your script still executes the second database query and returns results. Without register_globals, $id remains unset because only $_REQUEST['id'] (and $_GET['id']) are set.

Of course, there are other ways to solve this problem, even when using register_globals. You can restructure your code not to allow such a loophole.

$sth = $dbh->query("SELECT id FROM users WHERE username = $username AND
                    password = $password");
 
if (1 == $sth->numRows( )) { 
    $row = $sth->fetchRow(DB_FETCHMODE_OBJECT);
    $id = $row->id;
    if (!empty($id)) {
        $sth = $dbh->query("SELECT * FROM profile WHERE id = $id");
    }
} else {
    print "Bad username and password";
}

Now you use $id only when it's been explicitly set from a database call. Sometimes, however, it is difficult to do this because of how your program is laid out. Another solution is to manually unset( ) or initialize all variables at the top of your script:

unset($id);

This removes the bad $id value before it gets a chance to affect your code. However, because PHP doesn't require variable initialization, it's possible to forget to do this in one place; a bug can then slip in without a warning from PHP.
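
As a rough illustration of the initialization approach, the sketch below assumes PHP 7 or later. The in-memory $users array and the function name profile_id_for( ) are hypothetical stand-ins for the database lookup shown earlier; they are not part of the original recipe.

```php
<?php
// Hypothetical stand-in for the users table. A real application would
// query the database and store hashed passwords.
$users = array(
    'alice' => array('id' => 7, 'password' => 'secret'),
);

// Returns the verified user id, or null when the login fails.
function profile_id_for(array $get, array $users): ?int
{
    $id = null; // always start from a known state

    // Read only the fields we expect, from the source we expect.
    $username = $get['username'] ?? '';
    $password = $get['password'] ?? '';

    if (is_string($username) && is_string($password)
        && isset($users[$username])
        && hash_equals($users[$username]['password'], $password)) {
        $id = $users[$username]['id'];
    }

    return $id; // an injected "id" field never reaches this variable
}

// A forged request that supplies its own id along with a bad password.
$forged = array('username' => 'alice', 'password' => 'wrong', 'id' => '99');
var_dump(profile_id_for($forged, $users));
```

Because $id is a local variable that is initialized before use, an extra id field in the request has no effect on it.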

See Also

Documentation on register_globals at http://www.php.net/security.registerglobals.php.

Escaping Control Characters from User Data

Problem

You want to securely display user-entered data on an HTML page.

Solution

To display HTML as plain text, so that embedded links and other tags appear literally instead of being rendered, use htmlentities( ):

// ENT_COMPAT escapes double quotes only
echo htmlentities("<p>O'Reilly & Associates</p>", ENT_COMPAT);
&lt;p&gt;O'Reilly &amp; Associates&lt;/p&gt;

Discussion

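PHP has a pair of functions to escape characters in HTML. The most basic is htmlspecialchars( ), which escapes four characters: < > " and &. Depending on the optional second parameter, it can also translate ' in addition to " (ENT_QUOTES) or leave both kinds of quotes alone (ENT_NOQUOTES). Before PHP 8.1 the default was to escape only double quotes (ENT_COMPAT); since PHP 8.1 the default also escapes single quotes. For more complex encoding, use htmlentities( ); it expands on htmlspecialchars( ) to encode any character that has an HTML entity.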

$html = "<a href=\"fletch.html\">Stew's favorite movie.</a>\n";
print htmlspecialchars($html, ENT_COMPAT);    // double-quotes only
print htmlspecialchars($html, ENT_QUOTES);    // single- and double-quotes
print htmlspecialchars($html, ENT_NOQUOTES);  // neither
&lt;a href=&quot;fletch.html&quot;&gt;Stew's favorite movie.&lt;/a&gt;
&lt;a href=&quot;fletch.html&quot;&gt;Stew&#039;s favorite movie.&lt;/a&gt;
&lt;a href="fletch.html"&gt;Stew's favorite movie.&lt;/a&gt;

Both functions accept an optional character set argument that tells PHP which encoding the input string uses. To retrieve the translation table used by either function, use get_html_translation_table( ) and pass in HTML_ENTITIES or HTML_SPECIALCHARS. This returns an array that maps characters to entities; you can use it as the basis for your own table.

$copyright = "Copyright © 2003 O'Reilly & Associates\n";
// get the table for <, >, ", and &
$table = get_html_translation_table(HTML_SPECIALCHARS, ENT_COMPAT);
$table['©'] = '&copy;';                // add ©
print strtr($copyright, $table);
Copyright &copy; 2003 O'Reilly &amp; Associates
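
For everyday output escaping, a small wrapper keeps the flags and encoding explicit. This is only a sketch: the helper name h( ) is hypothetical, and the choice of ENT_QUOTES with UTF-8 is an assumption that suits pages served as UTF-8.

```php
<?php
// Hypothetical helper (not a PHP built-in): escape text for safe display
// in HTML element content or in a quoted attribute value.
function h(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// Pretend this string came from a form field.
$comment = "<script>alert('hi')</script>";
print '<p>' . h($comment) . "</p>\n";
```

Escaping both kinds of quotes matters when the value ends up inside an attribute, whichever quote character delimits it.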
            

See Also

Recipe 13.9, Recipe 18.21, and Recipe 10.8; documentation on htmlentities( ) at http://www.php.net/htmlentities and htmlspecialchars( ) at http://www.php.net/htmlspecialchars.

Handling Remote Variables with Periods in Their Names

Problem

You want to process a variable with a period in its name, but when a form is submitted, you can't find the variable.

Solution

Replace the period in the variable's name with an underscore. For example, if you have a form input element named foo.bar, you access it inside PHP as the variable $_REQUEST['foo_bar'].

Discussion

Because PHP uses the period as a string concatenation operator, a form variable called animal.height is automatically converted to animal_height, which avoids creating an ambiguity for the parser. While $_REQUEST['animal.height'] lacks these ambiguities, for legacy and consistency reasons, this happens regardless of your register_globals settings.
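
The conversion is easy to observe without a form, because parse_str( ) applies the same renaming to a query string. A minimal sketch; the field names are made up for illustration:

```php
<?php
// parse_str() fills $vars the way PHP fills $_GET for this query string.
parse_str('animal.height=12&animal.name=cat', $vars);

// The periods in the field names have become underscores.
print_r($vars);
```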

You usually deal with automatic variable name conversion when you process an image used to submit a form. For instance, suppose you have a street map showing the location of your stores, and you want people to click on one for additional information. Here's an example:

<input type="image" name="locations" src="locations.gif">

When a user clicks on the image, the x and y coordinates are submitted as locations.x and locations.y. So, in PHP, to find where a user clicked, you need to check $_REQUEST['locations_x'] and $_REQUEST['locations_y'].
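
Reading the click position might look like the sketch below, assuming PHP 7 or later. The helper name image_click( ) is hypothetical; it takes the request array and the name attribute of the image input.

```php
<?php
// Hypothetical helper: returns array(x, y) as integers, or null when the
// coordinates are missing or not numeric.
function image_click(array $request, string $name): ?array
{
    $x = $request[$name . '_x'] ?? null;
    $y = $request[$name . '_y'] ?? null;

    if (!is_numeric($x) || !is_numeric($y)) {
        return null;
    }

    return array((int) $x, (int) $y);
}

// Simulated request data for <input type="image" name="locations">.
$click = image_click(array('locations_x' => '120', 'locations_y' => '45'), 'locations');
if ($click !== null) {
    print "Clicked at {$click[0]}, {$click[1]}\n";
}
```

In a real script you would pass $_REQUEST (or $_GET or $_POST) as the first argument.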

It's possible, through a series of manipulations, to create a variable inside PHP with a period:

${"a.b"} = 123; // forced coercion using {}

$var = "c.d";   // indirect variable naming
$$var = 456;       

print ${"a.b"} . "\n";
print $$var . "\n";
123
456

This is generally frowned on because of the awkward syntax.

See Also

Documentation on variables from outside PHP at http://www.php.net/language.variables.external.php.

Using Form Elements with Multiple Options

Problem

You have a form element with multiple values, such as a checkbox or select element, but PHP sees only one value.

Solution

Place brackets ([ ]) after the variable name:

<input type="checkbox" name="boroughs[]" value="bronx"> The Bronx
<input type="checkbox" name="boroughs[]" value="brooklyn"> Brooklyn
<input type="checkbox" name="boroughs[]" value="manhattan"> Manhattan
<input type="checkbox" name="boroughs[]" value="queens"> Queens
<input type="checkbox" name="boroughs[]" value="statenisland"> Staten Island

Inside your program, treat the variable as an array:

// with register_globals off, read the array from $_GET (or $_POST)
print 'I love ' . join(' and ', $_GET['boroughs']) . '!';

Discussion

By placing [ ] after the variable name, you tell PHP to treat it as an array instead of a scalar. When it sees another value assigned to that variable, PHP auto-expands the size of the array and places the new value at the end. If the first three boxes in the Solution were checked and the form was submitted with GET, it's as if you'd written this code at the top of the script:

$_GET['boroughs'][ ] = "bronx";
$_GET['boroughs'][ ] = "brooklyn";
$_GET['boroughs'][ ] = "manhattan";

You can use this to return information from a database that matches multiple records:

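$boroughs = array(); // start from a known, empty list
foreach ($_GET['boroughs'] as $b) {
  // quote( ) escapes and quotes each value; LIKE wildcards such as _ and %
  // have no special meaning inside IN ( ), so they need no extra escaping
  $boroughs[ ] = $dbh->quote($b);
}
$locations = join(',', $boroughs);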

$dbh->query("SELECT address FROM locations WHERE borough IN ($locations)");
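
Because checkbox values come from the browser, it can also help to compare them with the list you actually offered before building the query. A minimal sketch, assuming PHP 7 or later; the function name known_boroughs( ) is hypothetical and the allowed list simply mirrors the form in the Solution.

```php
<?php
// Hypothetical helper: keep only submitted values that appear on the form.
function known_boroughs(array $submitted): array
{
    $allowed = array('bronx', 'brooklyn', 'manhattan', 'queens', 'statenisland');

    // Discard anything that is not a plain string (for example, nested arrays).
    $strings = array_filter($submitted, 'is_string');

    // array_intersect() keeps the allowed values that were submitted;
    // array_values() renumbers the result from zero.
    return array_values(array_intersect($allowed, $strings));
}

print join(', ', known_boroughs(array('queens', 'atlantis', 'bronx'))) . "\n";
```

The result follows the order of the allowed list, not the order in which the values were submitted.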

This syntax also works with multidimensional arrays:

<input type="checkbox" name="population[NY][NYC]" value="8008278">New York...

If checked, this form element sets $population['NY']['NYC'] to 8008278.
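
To see the resulting structure without submitting a form, you can feed an equivalent query string to parse_str( ), which builds arrays from bracketed names the same way. A small sketch; the query string is written by hand here:

```php
<?php
// Two checked boroughs plus the multidimensional population field.
$query = 'boroughs[]=bronx&boroughs[]=brooklyn&population[NY][NYC]=8008278';
parse_str($query, $vars);

print join(' and ', $vars['boroughs']) . "\n"; // the list of checked boxes
print $vars['population']['NY']['NYC'] . "\n"; // the nested value
```

Note that the values arrive as strings, so the population is the string "8008278" until you convert it.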

Placing a [ ] after a variable's name can cause problems in JavaScript when you try to address your elements. Instead of addressing the element by its name, use its numerical index. You can also place the element name inside single quotes. Another way is to assign the element an ID, perhaps the name without the [ ], and use that ID instead. Given:

<form>
<input type="checkbox" name="myName[]" value="myValue" id="myName">
</form>

the following three refer to the same form element:

document.forms[0].elements[0];            // using numerical indexes
document.forms[0].elements['myName[]'];   // using the name with quotes
document.forms[0].elements['myName'];     // using ID you assigned

See Also

The introduction to Chapter 4 for more on arrays.

Creating Dropdown Menus Based on the Current Date

Problem

You want to create a series of dropdown menus that are based automatically on the current date.

Solution

Use date( ) to find the current time in the web server's time zone and loop through the days with mktime( ).

The following code generates option values for today and the six days that follow. In this case, "today" is noon on January 1, 2002, in the U.S. Eastern time zone.

// 'H' is the 24-hour clock that mktime( ) expects
list($hour, $minute, $second, $month, $day, $year) = 
                                  explode(':', date('H:i:s:m:d:Y'));

// print out one week's worth of days
for ($i = 0; $i < 7; ++$i) {
    $timestamp = mktime($hour, $minute, $second, $month, $day + $i, $year); 
    $date = date("D, F j, Y", $timestamp);
                
    print "<option value=\"$timestamp\">$date</option>\n";
}
<option value="1009904400">Tue, January 1, 2002</option>
<option value="1009990800">Wed, January 2, 2002</option>
<option value="1010077200">Thu, January 3, 2002</option>
<option value="1010163600">Fri, January 4, 2002</option>
<option value="1010250000">Sat, January 5, 2002</option>
<option value="1010336400">Sun, January 6, 2002</option>
<option value="1010422800">Mon, January 7, 2002</option>

Discussion

In the Solution, we set the value for each date as its Unix timestamp representation because we find this easier to handle inside our programs. Of course, you can use any format you find most useful and appropriate.

Don't be tempted to eliminate the calls to mktime( ); dates and times aren't as consistent as you'd hope. Depending on what you're doing, you might not get the results you want. For example:

$timestamp = mktime(0, 0, 0, 10, 24, 2002); // October 24, 2002
$one_day = 60 * 60 * 24; // number of seconds in a day

// print out one week's worth of days
for ($i = 0; $i < 7; ++$i) {
    $date = date("D, F j, Y", $timestamp);
                
    print "<option value=\"$timestamp\">$date</option>\n";

    $timestamp += $one_day;
}
<option value="1035432000">Thu, October 24, 2002</option>
<option value="1035518400">Fri, October 25, 2002</option>
<option value="1035604800">Sat, October 26, 2002</option>
<option value="1035691200">Sun, October 27, 2002</option>
<option value="1035777600">Sun, October 27, 2002</option>
<option value="1035864000">Mon, October 28, 2002</option>
<option value="1035950400">Tue, October 29, 2002</option>

This script should print out the month, day, and year for a seven-day period starting October 24, 2002. However, it doesn't work as expected.

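Why are there two "Sun, October 27, 2002"s? The answer: daylight saving time. It's not true that the number of seconds in a day stays constant; in time zones that observe daylight saving time, one day each year has 23 hours and another has 25. Here, October 27 is 25 hours long, so adding 24 hours to its midnight lands at 11 P.M. on the same day. Worst of all, if you're not near either of the change-over dates, you're liable to miss this bug during testing.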
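
The two approaches can be compared side by side. The sketch below assumes PHP 7 or later and pins the time zone to America/New_York so the result does not depend on the server's settings; the function names days_by_seconds( ) and days_by_mktime( ) are hypothetical.

```php
<?php
// Assumption: a U.S. zone where daylight saving time ended on October 27, 2002.
date_default_timezone_set('America/New_York');

// Fragile: step forward by a fixed 86,400 seconds per day.
function days_by_seconds(int $start, int $count): array
{
    $days = array();
    for ($i = 0; $i < $count; ++$i) {
        $days[] = date('D, F j, Y', $start + $i * 86400);
    }
    return $days;
}

// Robust: let mktime() work out what "day + $i" means on the calendar.
function days_by_mktime(int $month, int $day, int $year, int $count): array
{
    $days = array();
    for ($i = 0; $i < $count; ++$i) {
        $days[] = date('D, F j, Y', mktime(0, 0, 0, $month, $day + $i, $year));
    }
    return $days;
}

$naive = days_by_seconds(mktime(0, 0, 0, 10, 24, 2002), 7);
$safe  = days_by_mktime(10, 24, 2002, 7);

print count(array_unique($naive)) . " distinct days when adding seconds\n";
print count(array_unique($safe)) . " distinct days when using mktime()\n";
```

The first list repeats a date across the change-over, while the second contains seven different days because mktime( ) normalizes out-of-range day numbers in local time.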

See Also

Chapter 3, particularly Recipe 3.13, but also Recipe 3.2, Recipe 3.3, Recipe 3.5, Recipe 3.11, and Recipe 3.14; documentation on date( ) at http://www.php.net/date and mktime( ) at http://www.php.net/mktime.
