PHP Cookbook/Directories

Introduction

A filesystem stores a lot of additional information about files aside from their actual contents. This information includes such particulars as the file's size, what directory it's in, and access permissions for the file. If you're working with files, you may also need to manipulate this metadata. PHP gives you a variety of functions to read and manipulate directories, directory entries, and file attributes. Like other file-related parts of PHP, the functions are similar to the C functions that accomplish the same tasks, with some simplifications.

Files are organized with inodes. Each file (and other parts of the filesystem, such as directories, devices, and links) has its own inode. That inode contains a pointer to where the file's data blocks are as well as all the metadata about the file. The data blocks for a directory hold the names of the files in that directory and the inode of each file.

PHP provides two ways to look in a directory to see what files it holds. The first way is to use opendir( ) to get a directory handle, readdir( ) to iterate through the files, and closedir( ) to close the directory handle:

$d = opendir('/usr/local/images') or die($php_errormsg);
while (false !== ($f = readdir($d))) {
    // process file
}
closedir($d);

The second method is to use the directory class. Instantiate the class with dir( ), read each filename with the read( ) method, and close the directory with close( ):

$d = dir('/usr/local/images') or die($php_errormsg);
while (false !== ($f = $d->read())) {
    // process file
}
$d->close();

Recipe 19.8 shows how to use opendir( ) or dir( ) to process each file in a directory. Making new directories is covered in Recipe 19.11 and removing directories in Recipe 19.12.

The filesystem holds more than just files and directories. On Unix, it can also hold symbolic links. These are special files whose contents are a pointer to another file. You can delete the link without affecting the file it points to. To create a symbolic link, use symlink( ):

symlink('/usr/local/images','/www/docroot/images') or die($php_errormsg);

This creates a symbolic link called images in /www/docroot that points to /usr/local/images.

To find information about a file, directory, or link you must examine its inode. The function stat( ) retrieves the metadata in an inode for you. Recipe 19.3 discusses stat( ). PHP also has many functions that use stat( ) internally to give you a specific piece of information about a file. These are listed in Table 19-1.

Table 19-1. File information functions

Function name      What file information does the function provide?
file_exists( )     Does the file exist?
fileatime( )       Last access time
filectime( )       Last metadata change time
filegroup( )       Group (numeric)
fileinode( )       Inode number
filemtime( )       Last change time of contents
fileowner( )       Owner (numeric)
fileperms( )       Permissions (decimal, numeric)
filesize( )        Size
filetype( )        Type (fifo, char, dir, block, link, file, unknown)
is_dir( )          Is it a directory?
is_executable( )   Is it executable?
is_file( )         Is it a regular file?
is_link( )         Is it a symbolic link?
is_readable( )     Is it readable?
is_writable( )     Is it writeable?


On Unix, the file permissions indicate what operations the file's owner, users in the file's group, and all users can perform on the file. The operations are reading, writing, and executing. For programs, executing means the ability to run the program; for directories, it's the ability to search through the directory and access the files in it (listing the names in a directory requires read permission).

Unix permissions can also contain a setuid bit, a setgid bit, and a sticky bit. The setuid bit means that when a program is run, it runs with the user ID of its owner. The setgid bit means that a program runs with the group ID of its group. For a directory, the setgid bit means that new files in the directory are created by default in the same group as the directory. The sticky bit is useful for directories in which people share files because it prevents nonsuperusers with write permission in a directory from deleting files in that directory unless they own the file or the directory.

When setting permissions with chmod( ) (see Recipe 19.4), they must be expressed as an octal number. This number has four digits. The first digit is any special setting for the file (such as setuid or setgid). The second digit is the user permissions — what the file's owner can do. The third digit is the group permissions — what users in the file's group can do. The fourth digit is the world permissions — what all other users can do. To compute the appropriate value for each digit, add together the permissions you want for that digit using the values in Table 19-2. For example, a permission value of 0644 means that there are no special settings (the 0), the file's owner can read and write the file (the 6, which is 4 (read) + 2 (write)), users in the file's group can read the file (the first 4), and all other users can also read the file (the second 4). A permission value of 4644 is the same, except that the file is also setuid.

Table 19-2. File permission values

Value   Permission meaning   Special setting meaning
4       Read                 setuid
2       Write                setgid
1       Execute              sticky
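
The digit arithmetic can be sketched in a few lines of code. The helper name build_mode( ) is hypothetical, not a PHP built-in; it assumes each argument is a single digit from 0 to 7 and packs the four digits into one integer, three bits per digit:

```php
<?php
// build_mode() is a hypothetical helper for illustration, not part of PHP.
// Each argument is one digit (0-7) made by adding values from Table 19-2.
function build_mode($special, $user, $group, $other) {
    return ($special << 9) | ($user << 6) | ($group << 3) | $other;
}

// owner: read (4) + write (2); group: read (4); others: read (4)
$mode = build_mode(0, 4 + 2, 4, 4);
printf("%04o\n", $mode);                       // prints 0644

// the same permissions, plus the setuid special setting (4)
printf("%04o\n", build_mode(4, 4 + 2, 4, 4));  // prints 4644
```

Writing the octal literal directly (0644) is usually simpler; the helper is only meant to show where each digit comes from.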


The permissions of newly created files and directories are affected by a setting called the umask, which is a permission value that is removed, or masked out, from the initial permissions of a file (0666) or directory (0777). For example, if the umask is 0022, the default permissions for a new file created with touch( ) or fopen( ) are 0644 and the default permissions for a new directory created with mkdir( ) are 0755. You can get and set the umask with the function umask( ). It returns the current umask and, if an argument is supplied to it, changes the umask to the value of that argument. For example, here's how to make the permissions on newly created files prevent anyone but the file's owner (and the superuser) from accessing the file:

$old_umask = umask(0077);
touch('secret-file.txt');
umask($old_umask);

The first call to umask( ) masks out all permissions for group and world. After the file is created, the second call to umask( ) restores the umask to the previous setting. When PHP is run as a server module, it restores the umask to its default value at the end of each request. Like other permissions-related functions, umask( ) doesn't work on Windows.
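
The masking itself is a bitwise AND with the complement of the umask. The following sketch only does the arithmetic and never touches the process's real umask; masked_mode( ) is a hypothetical helper name, not a PHP function:

```php
<?php
// masked_mode() is a hypothetical helper: it shows which permission bits
// survive after a given umask is removed from the requested permissions.
function masked_mode($requested, $umask) {
    return $requested & ~$umask;
}

printf("%04o\n", masked_mode(0666, 0022)); // new file, umask 0022: prints 0644
printf("%04o\n", masked_mode(0777, 0022)); // new directory, umask 0022: prints 0755
printf("%04o\n", masked_mode(0666, 0077)); // new file, umask 0077: prints 0600
```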

Getting and Setting File Timestamps

Problem

You want to know when a file was last accessed or changed, or you want to update a file's access or change time; for example, you want each page on your web site to display when it was last modified.

Solution

The fileatime( ), filemtime( ), and filectime( ) functions return the time of last access, modification, and metadata change of a file:

$last_access = fileatime('larry.php');
$last_modification = filemtime('moe.php');
$last_change = filectime('curly.php');

The touch( ) function changes a file's modification time:

touch('shemp.php');          // set modification time to now
touch('joe.php',$timestamp); // set modification time to $timestamp

Discussion

The fileatime( ) function returns the last time a file was opened for reading or writing. The filemtime( ) function returns the last time a file's contents were changed. The filectime( ) function returns the last time a file's contents or metadata (such as owner or permissions) were changed. Each function returns the time as an epoch timestamp.

A file's modification time can be updated with touch( ). Without a second argument, touch( ) sets the modification time to the current date and time. To set a file's modification time to a specific value, pass that value as an epoch timestamp to touch( ) as a second argument.

This code prints the time a page on your web site was last updated:

print "Last Modified: ".strftime('%c',filemtime($_SERVER['SCRIPT_FILENAME']));
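
As a rough, self-contained check of how touch( ) and filemtime( ) fit together, the sketch below sets a known timestamp on a scratch file and reads it back. The scratch file, the chosen date, and the use of date( ) for formatting are assumptions made for the example:

```php
<?php
// Work on a scratch file so the example doesn't depend on any real page.
$file = tempnam(sys_get_temp_dir(), 'ts');

// Set the modification time to noon on January 1, 2020 (local time)
$when = mktime(12, 0, 0, 1, 1, 2020);
touch($file, $when);

// Flush PHP's cached file metadata before reading the time back
clearstatcache();
$mtime = filemtime($file);

print 'Last Modified: ' . date('Y-m-d H:i:s', $mtime) . "\n";

unlink($file);
```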

See Also

Documentation on fileatime( ) at http://www.php.net/fileatime, filemtime( ) at http://www.php.net/filemtime, and filectime( ) at http://www.php.net/filectime.

Getting File Information

Problem

You want to read a file's metadata; for example, permissions and ownership.

Solution

Use stat( ), which returns an array of information about a file:

$info = stat('harpo.php');

Discussion

The function stat( ) returns an array with both numeric and string indexes with information about a file. The elements of this array are in Table 19-3.

Table 19-3. Information returned by stat( )

Numeric index   String index   Value
0               dev            Device
1               ino            Inode
2               mode           Permissions
3               nlink          Link count
4               uid            Owner's user ID
5               gid            Group's group ID
6               rdev           Device type for inode devices (-1 on Windows)
7               size           Size (in bytes)
8               atime          Last access time (epoch timestamp)
9               mtime          Last change time of contents (epoch timestamp)
10              ctime          Last change time of contents or metadata (epoch timestamp)
11              blksize        Block size for I/O (-1 on Windows)
12              blocks         Number of blocks allocated to this file


The mode element of the returned array contains the permissions expressed as a base 10 integer. This is confusing since permissions are usually either expressed symbolically (e.g., ls's -rw-r--r-- output) or as an octal integer (e.g., 0644). To convert the permissions to a more understandable format, use decoct( ) to change the permissions to octal:

$file_info = stat('/tmp/session.txt');
$permissions = decoct($file_info['mode']);

This results in a six-digit octal number. For example, if ls displays the following about /tmp/session.txt:

-rw-rw-r--    1 sklar    sklar          12 Oct 23 17:55 /tmp/session.txt

Then $file_info['mode'] is 33204 and $permissions is 100664. The last three digits (664) are the user (read and write), group (read and write), and other (read) permissions for the file. The third digit, 0, means that the file is not setuid or setgid. The leftmost 10 means that the file is a regular file (and not a socket, symbolic link, or other special file).
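
The same breakdown can be done with bit masks instead of reading digits by eye. This sketch starts from the literal value 33204 rather than calling stat( ), so it runs anywhere; the mask values are the standard POSIX ones, and the variable names are only illustrative:

```php
<?php
$mode = 33204;                                 // the value of $file_info['mode'] in the text

$octal = decoct($mode);                        // "100664"
$perms = sprintf('%04o', $mode & 07777);       // "0664": special, user, group, other
$is_regular = (($mode & 0170000) == 0100000);  // true: the file type bits say "regular file"

print "$octal $perms " . ($is_regular ? 'regular file' : 'other') . "\n";
// prints 100664 0664 regular file
```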

Because stat( ) returns an array with both numeric and string indexes, using foreach to iterate through the returned array produces two copies of each value. Instead, use a for loop from element 0 to element 12 of the returned array.

Calling stat( ) on a symbolic link returns information about the file the symbolic link points to. To get information about the symbolic link itself, use lstat( ).

Similar to stat( ) is fstat( ), which takes a file handle (returned from fopen( ) or popen( )) as an argument. You can use fstat( ) only on local files, however, not URLs passed to fopen( ).

PHP's stat( ) function uses the underlying stat(2) system call, which is expensive. To minimize overhead, PHP caches the result of calling stat(2). So, if you call stat( ) on a file, the file's permissions change, and you call stat( ) on the same file again, you can get the same, stale results. To force PHP to reload the file's metadata, call clearstatcache( ), which flushes PHP's cached information. PHP also uses this cache for the other functions that return file metadata: file_exists( ), fileatime( ), filectime( ), filegroup( ), fileinode( ), filemtime( ), fileowner( ), fileperms( ), filesize( ), filetype( ), is_dir( ), is_executable( ), is_file( ), is_link( ), is_readable( ), is_writable( ), and lstat( ).
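
Here is a small sketch of the pattern, using filesize( ) on a scratch file. Whether the second size would actually be stale without the flush depends on the PHP version and on what else has touched the cache, so the example simply calls clearstatcache( ) before trusting the value:

```php
<?php
$file = tempnam(sys_get_temp_dir(), 'sc');

file_put_contents($file, 'abc');
$before = filesize($file);           // this result may now be cached

file_put_contents($file, 'abcdef');  // the file grows behind the cache's back

clearstatcache();                    // discard cached metadata
$after = filesize($file);            // read fresh from the filesystem

print "before: $before bytes, after: $after bytes\n";

unlink($file);
```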

See Also

Documentation on stat( ) at http://www.php.net/stat, lstat( ) at http://www.php.net/lstat, fstat( ) at http://www.php.net/fstat, and clearstatcache( ) at http://www.php.net/clearstatcache.

Changing File Permissions or Ownership

Problem

You want to change a file's permissions or ownership; for example, you want to prevent other users from being able to look at a file of sensitive data.

Solution

Use chmod( ) to change the permissions of a file:

chmod('/home/user/secrets.txt',0400);

Use chown( ) to change a file's owner and chgrp( ) to change a file's group:

chown('/tmp/myfile.txt','sklar');           // specify user by name
chgrp('/home/sklar/schedule.txt','soccer'); // specify group by name

chown('/tmp/myfile.txt',5001);              // specify user by uid
chgrp('/home/sklar/schedule.txt',102);      // specify group by gid

Discussion

The permissions passed to chmod( ) must be specified as an octal number.
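
A common slip is to leave off the leading zero: chmod($file,400) passes decimal 400, which is octal 0620, not 0400. The sketch below, which assumes a Unix-like system and works on a scratch file, shows an octal literal and, for a mode that arrives as a string, a conversion with octdec( ):

```php
<?php
// Assumes a Unix-like system; permission bits behave differently on Windows.
$file = tempnam(sys_get_temp_dir(), 'pm');

chmod($file, 0400);                    // octal literal: owner may read, nothing else
clearstatcache();
$perms = fileperms($file) & 0777;      // keep only the user/group/other bits
printf("%04o\n", $perms);              // prints 0400

// A mode held in a string (for example, read from a config file)
// has to be converted from octal notation first.
chmod($file, octdec('0600'));
clearstatcache();
$perms_from_string = fileperms($file) & 0777;
printf("%04o\n", $perms_from_string);  // prints 0600

unlink($file);
```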

The superuser can change the permissions, owner, and group of any file. Other users are restricted: they can change the permissions and group only of files that they own, they can change a file's group only to a group they belong to, and they can't change the owner at all.

The functions chmod( ), chgrp( ), and chown( ) don't work on Windows.

See Also

Documentation on chmod( ) at http://www.php.net/chmod, chown( ) at http://www.php.net/chown, and chgrp( ) at http://www.php.net/chgrp.

Splitting a Filename into Its Component Parts

Problem

You want to find a file's path and filename; for example, you want to create a file in the same directory as an existing file.

Solution

Use basename( ) to get the filename and dirname( ) to get the path:

$full_name = '/usr/local/php/php.ini';
$base = basename($full_name);  // $base is php.ini
$dir  = dirname($full_name);   // $dir is /usr/local/php

Use pathinfo( ) to get the directory name, base name, and extension in an associative array:

$info = pathinfo('/usr/local/php/php.ini');

Discussion

To create a temporary file in the same directory as an existing file, use dirname( ) to find the directory, and pass that directory to tempnam( ):

$dir = dirname($existing_file);
$temp = tempnam($dir,'temp');
$temp_fh = fopen($temp,'w');

The elements in the associative array returned by pathinfo( ) are dirname, basename, extension, and (as of PHP 5.2) filename:

$info = pathinfo('/usr/local/php/php.ini');
print_r($info);
Array
(
    [dirname] => /usr/local/php
    [basename] => php.ini
    [extension] => ini
    [filename] => php
)

You can also pass basename( ) an optional suffix to remove it from the filename. This sets $base to php:

$base = basename('/usr/local/php/php.ini','.ini');

Using functions such as basename( ), dirname( ), and pathinfo( ) is more portable than just separating a full filename on / because they use an operating-system appropriate separator. On Windows, these functions treat both / and \ as file and directory separators. On other platforms, only / is used.

There's no built-in PHP function to combine the parts produced by basename( ), dirname( ), and pathinfo( ) back into a full filename. To do this you have to combine the parts with . and /:

$dirname = '/usr/local/php';
$basename = 'php';
$extension = 'ini';

$full_name = $dirname . '/' . $basename . '.' . $extension;

You can pass a full filename produced like this to other PHP file functions on Windows, because PHP accepts / as a directory separator on Windows.
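
If you do this often, the joining step can be wrapped in a small function. join_path_parts( ) below is a hypothetical helper, not a PHP built-in; it assumes PHP 5.2 or later, where pathinfo( ) also returns a filename element, and it leaves out the dot when there is no extension:

```php
<?php
// join_path_parts() is a hypothetical helper that reverses pathinfo().
function join_path_parts(array $info) {
    $full_name = $info['dirname'] . '/' . $info['filename'];
    // pathinfo() omits the extension element for names without one
    if (isset($info['extension'])) {
        $full_name .= '.' . $info['extension'];
    }
    return $full_name;
}

print join_path_parts(pathinfo('/usr/local/php/php.ini')) . "\n";
// prints /usr/local/php/php.ini
print join_path_parts(pathinfo('/tmp/README')) . "\n";
// prints /tmp/README
```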

See Also

Documentation on basename( ) at http://www.php.net/basename, dirname( ) at http://www.php.net/dirname, and pathinfo( ) at http://www.php.net/pathinfo.

Deleting a File

Problem

You want to delete a file.

Solution

Use unlink( ):

unlink($file) or die ("can't delete $file: $php_errormsg");

Discussion

The function unlink( ) is only able to delete files that the user of the PHP process is able to delete. If you're having trouble getting unlink( ) to work, check the permissions on the file and how you're running PHP.

See Also

Documentation on unlink( ) at http://www.php.net/unlink.

Copying or Moving a File

Problem

You want to copy or move a file.

Solution

Use copy( ) to copy a file:

copy($old,$new) or die("couldn't copy $old to $new: $php_errormsg");

Use rename( ) to move a file:

rename($old,$new) or die("couldn't move $old to $new: $php_errormsg");

Discussion

On Unix, rename( ) can't move files across filesystems. To do so, copy the file to the new location and then delete the old file:

if (copy("/tmp/code.c","/usr/local/src/code.c")) {
  unlink("/tmp/code.c");
}
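
The copy-then-delete fallback can be folded into one function that tries rename( ) first. move_file( ) is a hypothetical name, and the demonstration moves a scratch file within the temporary directory, so on most systems it exercises only the rename( ) path:

```php
<?php
// move_file() is a hypothetical helper, not a PHP built-in.
function move_file($old, $new) {
    // Try the cheap path first; @ hides the warning if rename() fails
    if (@rename($old, $new)) {
        return true;
    }
    // rename() failed (for example, across filesystems): copy, then delete.
    // The original is removed only if the copy succeeded.
    if (copy($old, $new)) {
        return unlink($old);
    }
    return false;
}

$old = tempnam(sys_get_temp_dir(), 'mv');
file_put_contents($old, "int main() { return 0; }\n");
$new = $old . '.moved';

$moved = move_file($old, $new);
$contents = file_get_contents($new);

unlink($new);
```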

If you have multiple files to copy or move, call copy( ) or rename( ) in a loop. You can operate only on one file each time you call these functions.

See Also

Documentation on copy( ) at http://www.php.net/copy and rename( ) at http://www.php.net/rename.

Processing All Files in a Directory

Problem

You want to iterate over all files in a directory. For example, you want to create a select box in a form that lists all the files in a directory.

Solution

Get a directory handle with opendir( ) and then retrieve each filename with readdir( ):

$d = opendir('/tmp') or die($php_errormsg);
while (false !== ($f = readdir($d))) {
    print "$f\n";
}
closedir($d);

Discussion

The code in the solution tests the return value of readdir( ) with the nonidentity operator (!==) so that the code works properly with filenames that evaluate to false, such as a file named 0.

The function readdir( ) returns each entry in a directory, whether it is a file, directory, or something else (such as a link or a socket). This includes the metaentries "." (current directory) and ".." (parent directory). To just return files, use the is_file( ) function as well:

print '<select name="files">';
$d = opendir('/usr/local/upload') or die($php_errormsg);
while (false !== ($f = readdir($d))) {
    if (is_file("/usr/local/upload/$f")) {
        print '<option> ' . $f . '</option>';
    }
}
closedir($d);
print '</select>';

Because readdir( ) returns only the filename of each directory entry, not a full pathname, you have to prepend the directory name to $f before you pass it to is_file( ).

PHP also has an object-oriented interface to directory information. The dir( ) function returns an object on which you can call read( ), rewind( ), and close( ) methods, which act like the readdir( ), rewinddir( ), and closedir( ) functions. There's also a $path property that contains the full path of the opened directory.

Here's how to iterate through files with the object-oriented interface:

print '<select name="files">';
$d = dir('/usr/local/upload') or die($php_errormsg);
while (false !== ($f = $d->read())) {
    if (is_file($d->path.'/'.$f)) {
        print '<option> ' . $f . '</option>';
    }
}
$d->close();
print '</select>';

In this example, $d->path is /usr/local/upload.
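
On PHP 5 and later, scandir( ) returns all of a directory's entries as a sorted array, which can shorten this kind of loop. The sketch below is one possible arrangement, not the only one: list_plain_files( ) is a hypothetical helper, the scratch directory stands in for /usr/local/upload, and htmlspecialchars( ) is added so that odd filenames can't break the HTML:

```php
<?php
// list_plain_files() is a hypothetical helper built on scandir().
function list_plain_files($dir) {
    $files = array();
    $entries = scandir($dir);
    if (false === $entries) {
        return $files;           // unreadable directory: report no files
    }
    foreach ($entries as $f) {
        // scandir() returns bare names, so prepend the directory
        if (is_file("$dir/$f")) {
            $files[] = $f;
        }
    }
    return $files;
}

// Build a scratch directory holding two files and a subdirectory
$dir = sys_get_temp_dir() . '/upload_demo_' . uniqid();
mkdir($dir);
mkdir("$dir/subdir");
touch("$dir/b.txt");
touch("$dir/a.txt");

$names = list_plain_files($dir);

print '<select name="files">';
foreach ($names as $name) {
    print '<option>' . htmlspecialchars($name) . '</option>';
}
print '</select>';

// clean up the scratch directory
unlink("$dir/a.txt");
unlink("$dir/b.txt");
rmdir("$dir/subdir");
rmdir($dir);
```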

See Also

Documentation on opendir( ) at http://www.php.net/opendir, readdir( ) at http://www.php.net/readdir, and the directory class at http://www.php.net/class.dir.

Getting a List of Filenames Matching a Pattern

Problem

You want to find all filenames that match a pattern.

Solution

If your pattern is a regular expression, read each file from the directory and test the name with preg_match( ):

$d = dir('/tmp') or die($php_errormsg);
while (false !== ($f = $d->read())) {
    // only match alphabetic names
    if (preg_match('/^[a-zA-Z]+$/',$f)) {
        print "$f\n";
    }
}
$d->close();

Discussion

If your pattern is a shell glob (e.g., *.*), use the backtick operator with ls (Unix) or dir (Windows) to get the matching filenames. For Unix:

$files = explode("\n",`ls -1 *.gif`);
foreach ($files as $file) {
  print "$file\n";
}

For Windows:

$files = explode("\n",`dir /b *.gif`);
foreach ($files as $file) {
  print "$file\n";
}
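
On PHP 4.3 and later, the built-in glob( ) function matches shell-style patterns without starting an external program, which also avoids the differences between ls and dir. A minimal sketch, using a scratch directory created just for the example:

```php
<?php
// Build a scratch directory with two matching files and one non-matching file
$dir = sys_get_temp_dir() . '/glob_demo_' . uniqid();
mkdir($dir);
touch("$dir/a.gif");
touch("$dir/b.gif");
touch("$dir/c.txt");

// glob() returns full paths, sorted alphabetically by default
$files = glob("$dir/*.gif");
foreach ($files as $file) {
    print basename($file) . "\n";
}

// clean up the scratch directory
unlink("$dir/a.gif");
unlink("$dir/b.gif");
unlink("$dir/c.txt");
rmdir($dir);
```

Unlike the backtick versions, this doesn't leave an empty element at the end of the array from the command's trailing newline.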

See Also

Recipe 19.8 for details on iterating through each file in a directory; information about shell pattern matching is available at http://www.gnu.org/manual/bash/html_node/bashref_35.html.

Processing All Files in a Directory Recursively

Problem

You want to do something to all the files in a directory and in any subdirectories.

Solution

Use the pc_process_dir( ) function, shown in Example 19-1, which returns a list of all files in and beneath a given directory.

Example 19-1. pc_process_dir( )

function pc_process_dir($dir_name,$max_depth = 10,$depth = 0) {
    if ($depth >= $max_depth) {
        error_log("Reached max depth $max_depth in $dir_name.");
        return false;
    }
    $subdirectories = array();
    $files = array();
    if (is_dir($dir_name) && is_readable($dir_name)) {
        $d = dir($dir_name);
        while (false !== ($f = $d->read())) {
            // skip . and .. 
            if (('.' == $f) || ('..' == $f)) {
                continue;
            }
            if (is_dir("$dir_name/$f")) {
                array_push($subdirectories,"$dir_name/$f");
            } else {
                array_push($files,"$dir_name/$f");
            }
        }
        $d->close();
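        foreach ($subdirectories as $subdirectory) {
            // pc_process_dir() returns false once $max_depth is reached,
            // so merge only when an array of files came back
            $sub_files = pc_process_dir($subdirectory,$max_depth,$depth+1);
            if (is_array($sub_files)) {
                $files = array_merge($files,$sub_files);
            }
        }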
    } 
    return $files;
}

Discussion

Here's an example: if /tmp contains the files a and b, as well as the directory c, and /tmp/c contains files d and e, pc_process_dir('/tmp') returns an array with elements /tmp/a, /tmp/b, /tmp/c/d, and /tmp/c/e. To perform an operation on each file, iterate through the array:

$files = pc_process_dir('/tmp');
foreach ($files as $file) {
  print "$file was last accessed at ".strftime('%c',fileatime($file))."\n";
}

Instead of returning an array of files, you can also write a function that processes them as it finds them. The pc_process_dir2( ) function, shown in Example 19-2, does this by taking an additional argument, the name of the function to call on each file found.

Example 19-2. pc_process_dir2( )

function pc_process_dir2($dir_name,$func_name,$max_depth = 10,$depth = 0) {
    if ($depth >= $max_depth) {
        error_log("Reached max depth $max_depth in $dir_name.");
        return false;
    }
    $subdirectories = array();
    $files = array();
    if (is_dir($dir_name) && is_readable($dir_name)) {
        $d = dir($dir_name);
        while (false !== ($f = $d->read())) {
            // skip . and ..
            if (('.' == $f) || ('..' == $f)) {
                continue;
            }
            if (is_dir("$dir_name/$f")) {
                array_push($subdirectories,"$dir_name/$f");
            } else {
                $func_name("$dir_name/$f");
            }
        }
        $d->close();
        foreach ($subdirectories as $subdirectory) {
            pc_process_dir2($subdirectory,$func_name,$max_depth,$depth+1);
        }
    } 
}

The pc_process_dir2( ) function doesn't return a list of files; instead, the function named by $func_name is called with each file's path as its argument. Here's how to print out the last access times:

function printatime($file) {
    print "$file was last accessed at ".strftime('%c',fileatime($file))."\n";
}

pc_process_dir2('/tmp','printatime');

Although the two functions produce the same results, the second version uses less memory because potentially large arrays of files aren't passed around.

The pc_process_dir( ) and pc_process_dir2( ) functions handle all the files in the current directory first; then they recurse into each subdirectory in turn. The alternative is to recurse into a subdirectory as soon as it is found, whether or not there are files remaining in the current directory. (Either way the overall traversal is depth-first, because each subdirectory is processed completely before the next one is started; the difference is whether the current directory is read to the end before descending.) Reading the whole directory first is lighter on resources: each directory handle is closed (with $d->close( )) before the function recurses into subdirectories, so there's only one directory handle open at a time.

Because is_dir( ) returns true when passed a symbolic link that points to a directory, both versions of the function follow symbolic links as they traverse down the directory tree. If you don't want to follow links, change the line:

if (is_dir("$dir_name/$f")) {

to:

if (is_dir("$dir_name/$f") && (! is_link("$dir_name/$f"))) {
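
On PHP 5.3 and later, the SPL iterator classes offer another way to walk a directory tree. This is a sketch of that alternative, not a drop-in replacement for pc_process_dir( ): list_files_recursive( ) is a hypothetical name, there is no depth limit, and the results are sorted only to make the output predictable. The iterator does not descend into symbolic links to directories unless you ask it to:

```php
<?php
// list_files_recursive() is a hypothetical helper built on SPL iterators.
function list_files_recursive($dir) {
    $files = array();
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );
    // keys are pathnames; values are SplFileInfo objects
    foreach ($iterator as $path => $info) {
        if ($info->isFile()) {
            $files[] = $path;
        }
    }
    sort($files);
    return $files;
}

// Recreate the layout from the discussion in a scratch directory:
// files a and b, directory c, and files d and e inside c
$base = sys_get_temp_dir() . '/tree_demo_' . uniqid();
mkdir("$base/c", 0777, true);
foreach (array('a', 'b', 'c/d', 'c/e') as $name) {
    touch("$base/$name");
}

$found = list_files_recursive($base);
foreach ($found as $file) {
    print "$file\n";
}

// clean up the scratch directory
foreach (array('a', 'b', 'c/d', 'c/e') as $name) {
    unlink("$base/$name");
}
rmdir("$base/c");
rmdir($base);
```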

See Also

Recipe 6.10 for a discussion of variable functions; documentation on is_dir( ) at http://www.php.net/is-dir and is_link( ) at http://www.php.net/is-link.

Making New Directories

Problem

You want to create a directory.

Solution

Use mkdir( ):

mkdir('/tmp/apples',0777) or die($php_errormsg);

Discussion

The second argument to mkdir( ) is the permission mode for the new directory, which must be an octal number. The current umask is taken away from this permission value to create the permissions for the new directory. So, if the current umask is 0002, calling mkdir('/tmp/apples',0777) sets the permissions on the resulting directory to 0775 (user and group can read, write, and execute; others can only read and execute).

PHP's built-in mkdir( ) can make a directory only if its parent exists. For example, if /tmp/a doesn't exist, you can't create /tmp/a/b until /tmp/a is created. To create a directory and its parents, you have two choices: you can call your system's mkdir program, or you can use the pc_mkdir_parents( ) function, shown in Example 19-3. To use your system's mkdir program, on Unix, use this:

system('/bin/mkdir -p '.escapeshellarg($directory));

On Windows do:

system('mkdir '.escapeshellarg($directory));

You can also use the pc_mkdir_parents( ) function shown in Example 19-3.

Example 19-3. pc_mkdir_parents( )

// $mode is the permission mode passed to mkdir() for each new directory;
// the current umask is still removed from it
function pc_mkdir_parents($d,$mode = 0777) {
    $dirs = array($d);
    $d = dirname($d);
    $last_dirname = '';
    // collect every ancestor, outermost first
    while($last_dirname != $d) { 
        array_unshift($dirs,$d);
        $last_dirname = $d;
        $d = dirname($d);
    }

    foreach ($dirs as $dir) {
        if (! file_exists($dir)) {
            if (! mkdir($dir,$mode)) {
                error_log("Can't make directory: $dir");
                return false;
            }
        } elseif (! is_dir($dir)) {
            error_log("$dir is not a directory");
            return false;
        }
    }
    return true;
}

For example:

pc_mkdir_parents('/usr/local/upload/test',0777);
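
On PHP 5.0 and later, mkdir( ) also accepts an optional third argument that asks it to create any missing parent directories itself. A short sketch, using a scratch path under the temporary directory as a stand-in for a real location:

```php
<?php
$base = sys_get_temp_dir() . '/mkdir_demo_' . uniqid();
$directory = "$base/a/b";

// The third argument (true) creates $base and $base/a along the way
if (! is_dir($directory) && ! mkdir($directory, 0777, true)) {
    error_log("Can't make directory: $directory");
}

$created = is_dir($directory);
print $created ? "created $directory\n" : "failed\n";

// clean up, innermost directory first
rmdir("$base/a/b");
rmdir("$base/a");
rmdir($base);
```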

See Also

Documentation on mkdir( ) at http://www.php.net/mkdir; your system's mkdir documentation, such as the Unix mkdir(1) man page or the Windows mkdir /? help text.

Removing a Directory and Its Contents

Problem

You want to remove a directory and all of its contents, including subdirectories and their contents.

Solution

On Unix, use rm:

$directory = escapeshellarg($directory);
exec("rm -rf $directory");

On Windows, use rmdir:

$directory = escapeshellarg($directory);
exec("rmdir /s /q $directory");

Discussion

Removing files, obviously, can be dangerous. Be sure to escape $directory with escapeshellarg( ) so that you don't delete unintended files.

Because PHP's built-in directory removal function, rmdir( ) , works only on empty directories, and unlink( ) can't accept shell wildcards, calling a system program is much easier than recursively looping through all files in a directory, removing them, and then removing each directory. If an external utility isn't available, however, you can modify the pc_process_dir( ) function from Recipe 19.10 to remove each subdirectory.
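
If you can't call an external utility, one possible pure-PHP approach is sketched below. It is written from scratch rather than adapted from pc_process_dir( ), the name remove_dir_recursive( ) is hypothetical, and it assumes PHP 5 for scandir( ). Symbolic links are deleted as links and never followed. Try it on a scratch directory before pointing it at anything you care about:

```php
<?php
// remove_dir_recursive() is a hypothetical helper, not a PHP built-in.
function remove_dir_recursive($path) {
    // plain files and symbolic links are removed directly, never followed
    if (is_link($path) || ! is_dir($path)) {
        return unlink($path);
    }
    $entries = scandir($path);
    if (false === $entries) {
        return false;
    }
    foreach ($entries as $f) {
        // skip . and ..
        if (('.' == $f) || ('..' == $f)) {
            continue;
        }
        if (! remove_dir_recursive("$path/$f")) {
            return false;
        }
    }
    // the directory is empty now, so rmdir() can remove it
    return rmdir($path);
}

// Build a small scratch tree, then remove it
$base = sys_get_temp_dir() . '/rm_demo_' . uniqid();
mkdir("$base/c", 0777, true);
touch("$base/a");
touch("$base/c/d");

$removed = remove_dir_recursive($base);
clearstatcache();
$still_there = file_exists($base);
```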

See Also

Documentation on rmdir( ) at http://www.php.net/rmdir; your system's rm or rmdir documentation, such as the Unix rm(1) manpage or the Windows rmdir /? help text.

Program: Web Server Directory Listing

The web-ls.php program shown in Example 19-4 provides a view of the files inside your web server's document root, formatted like the output of the Unix command ls. Filenames are linked so that you can download each file, and directory names are linked so that you can browse in each directory, as shown in Figure 19-1.

Figure 19-1. Web listing

Most lines in Example 19-4 are devoted to building an easy-to-read representation of the file's permissions, but the guts of the program are in the while loop at the end. The $d->read( ) method gets the name of each file in the directory. Then, lstat( ) retrieves information about that file, and printf( ) prints out the formatted information about that file.

The mode_string( ) function and the constants it uses turn the integer representation of a file's mode (e.g., 35316, which is 0104764 in octal) into an easier-to-read string (e.g., -rwsrw-r--).

Example 19-4. web-ls.php

/* Bit masks for determining file permissions and type. The names and values
 * listed below are POSIX-compliant; individual systems may have their own 
 * extensions.
 */

define('S_IFMT',0170000);   // mask for all types 
define('S_IFSOCK',0140000); // type: socket 
define('S_IFLNK',0120000);  // type: symbolic link 
define('S_IFREG',0100000);  // type: regular file 
define('S_IFBLK',0060000);  // type: block device 
define('S_IFDIR',0040000);  // type: directory 
define('S_IFCHR',0020000);  // type: character device 
define('S_IFIFO',0010000);  // type: fifo 
define('S_ISUID',0004000);  // set-uid bit 
define('S_ISGID',0002000);  // set-gid bit 
define('S_ISVTX',0001000);  // sticky bit 
define('S_IRWXU',00700);    // mask for owner permissions 
define('S_IRUSR',00400);    // owner: read permission 
define('S_IWUSR',00200);    // owner: write permission 
define('S_IXUSR',00100);    // owner: execute permission 
define('S_IRWXG',00070);    // mask for group permissions 
define('S_IRGRP',00040);    // group: read permission 
define('S_IWGRP',00020);    // group: write permission 
define('S_IXGRP',00010);    // group: execute permission 
define('S_IRWXO',00007);    // mask for others permissions 
define('S_IROTH',00004);    // others: read permission 
define('S_IWOTH',00002);    // others: write permission 
define('S_IXOTH',00001);    // others: execute permission 

/* mode_string() is a helper function that takes an octal mode and returns
 * a ten character string representing the file type and permissions that
 * correspond to the octal mode. This is a PHP version of the mode_string()
 * function in the GNU fileutils package.
 */
function mode_string($mode) {
  $s = array();

  // set type letter 
  if (($mode & S_IFMT) == S_IFBLK) {
    $s[0] = 'b';
  } elseif (($mode & S_IFMT) == S_IFCHR) {
    $s[0] = 'c';
  } elseif (($mode & S_IFMT) == S_IFDIR) {
    $s[0] = 'd';
  } elseif (($mode & S_IFMT) ==  S_IFREG) {
    $s[0] = '-';
  } elseif (($mode & S_IFMT) ==  S_IFIFO) {
    $s[0] = 'p';
  } elseif (($mode & S_IFMT) == S_IFLNK) {
    $s[0] = 'l';
  } elseif (($mode & S_IFMT) == S_IFSOCK) {
    $s[0] = 's';
  }

  // set user permissions 
  $s[1] = $mode & S_IRUSR ? 'r' : '-';
  $s[2] = $mode & S_IWUSR ? 'w' : '-';
  $s[3] = $mode & S_IXUSR ? 'x' : '-';

  // set group permissions 
  $s[4] = $mode & S_IRGRP ? 'r' : '-';
  $s[5] = $mode & S_IWGRP ? 'w' : '-';
  $s[6] = $mode & S_IXGRP ? 'x' : '-';

  // set other permissions 
  $s[7] = $mode & S_IROTH ? 'r' : '-';
  $s[8] = $mode & S_IWOTH ? 'w' : '-';
  $s[9] = $mode & S_IXOTH ? 'x' : '-';

  // adjust execute letters for set-uid, set-gid, and sticky 
  if ($mode & S_ISUID) {
    if ($s[3] != 'x') {
      // set-uid but not executable by owner 
      $s[3] = 'S';
    } else {
      $s[3] = 's';
    }
  }

  if ($mode & S_ISGID) {
    if ($s[6] != 'x') {
      // set-gid but not executable by group 
      $s[6] = 'S';
    } else {
      $s[6] = 's';
    }
  }

  if ($mode & S_ISVTX) {
    if ($s[9] != 'x') {
      // sticky but not executable by others 
      $s[9] = 'T';
    } else {
      $s[9] = 't';
    }
  }

  // return formatted string 
  return join('',$s);

}

// Start at the document root if not specified
if (isset($_REQUEST['dir'])) {
    $dir = $_REQUEST['dir'];
} else {
    $dir = '';
}

// locate $dir in the filesystem
$real_dir = realpath($_SERVER['DOCUMENT_ROOT'].$dir);

// make sure $real_dir is inside document root
if (! preg_match('/^'.preg_quote($_SERVER['DOCUMENT_ROOT'],'/').'/',
                 $real_dir)) {
    die("$dir is not inside the document root");
}

// canonicalize $dir by removing the document root from its beginning 
$dir = substr_replace($real_dir,'',0,strlen($_SERVER['DOCUMENT_ROOT']));

// are we opening a directory?
if (! is_dir($real_dir)) {
    die("$real_dir is not a directory");
}

// open the specified directory 
$d = dir($real_dir) or die("can't open $real_dir: $php_errormsg");

print '<table>';

// read each entry in the directory 
while (false !== ($f = $d->read())) {

    // get information about this file 
    $s = lstat($d->path.'/'.$f);
    
    // translate uid into user name 
    $user_info = posix_getpwuid($s['uid']);

    // translate gid into group name 
    $group_info = posix_getgrgid($s['gid']);

    // format the date for readability 
    $date = date('M j H:i',$s['mtime']);

    // translate the octal mode into a readable string 
    $mode = mode_string($s['mode']);

    $mode_type = substr($mode,0,1);
    if (($mode_type == 'c') || ($mode_type == 'b')) {
        /* if it's a block or character device, print out the major and
         * minor device type instead of the file size */
        $major = ($s['rdev'] >> 8) & 0xff;
        $minor = $s['rdev'] & 0xff;
        $size = sprintf('%3u, %3u',$major,$minor);
    } else {
        $size = $s['size'];
    }

    // format the <a href=""> around the filename
    // no link for the current directory
    if ('.' == $f) {
        $href = $f;
    } else {
        // don't include the ".." in the parent directory link
        if ('..' == $f) {
            $href = urlencode(dirname($dir));
        } else {
            $href = urlencode($dir) . '/' . urlencode($f);
        }
        
        /* everything but "/" should be urlencoded */
        $href = str_replace('%2F','/',$href);

        // browse other directories with web-ls
        if (is_dir(realpath($d->path . '/' . $f))) {
            $href = sprintf('<a href="%s?dir=%s">%s</a>',
                            $_SERVER['PHP_SELF'],$href,htmlspecialchars($f));
        } else {
            // link to files to download them
            $href = sprintf('<a href="%s">%s</a>',$href,htmlspecialchars($f));
        }

        // if it's a link, show the link target, too
        if ('l' == $mode_type) {
            $href .= ' -&gt; ' . readlink($d->path.'/'.$f);
        }
    }

    // print out the appropriate info for this file 
    printf('<tr><td>%s</td><td>%3u</td><td align="right">%s</td>
            <td align="right">%s</td><td align="right">%s</td>
            <td align="right">%s</td><td>%s</td></tr>',
           $mode,                // formatted mode string 
           $s['nlink'],          // number of links to this file 
           $user_info['name'],   // owner's user name 
           $group_info['name'],  // group name 
           $size,                // file size (or device numbers) 
           $date,                // last modified date and time 
           $href);               // link to browse or download 
}

print '</table>';
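
The permission tests in mode_string( ) can be tried in isolation. The following is a minimal sketch, not part of web-ls.php: perm_string( ) is a hypothetical name, it uses bit positions instead of the S_I* constants, and it handles only the nine read/write/execute bits (no set-uid, set-gid, or sticky adjustment).

```php
<?php
// Hypothetical helper: format the nine rwx bits of a file mode.
function perm_string(int $mode): string
{
    $letters = 'rwxrwxrwx';
    $out = '';
    for ($i = 0; $i < 9; $i++) {
        // bit 8 (0400) is owner-read; bit 0 (0001) is other-execute
        $bit = 1 << (8 - $i);
        $out .= ($mode & $bit) ? $letters[$i] : '-';
    }
    return $out;
}

echo perm_string(0754), "\n"; // prints rwxr-xr--
```

The value returned in the 'mode' element of lstat( ) also carries file-type bits above these nine; the sketch ignores them because it only tests the low bits.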

Program: Site Search

You can use site-search.php, shown in Example 19-5, as a search engine for a small to medium-sized, file-based site.

The program looks for a search term (in $_REQUEST['term']) in all files within a specified set of directories under the document root. Those directories are set in $search_dirs. It also recurses into subdirectories and follows symbolic links but keeps track of which files and directories it has seen so that it doesn't get caught in an endless loop.
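
The loop protection described above can be sketched on its own. This is an illustration under stated assumptions rather than the code from Example 19-5: list_files_once( ) is a hypothetical name, and it uses scandir( ) instead of dir( ) to keep the example short.

```php
<?php
// Hypothetical helper: collect regular files under $dir, following
// symbolic links but visiting each real path only once.
function list_files_once(string $dir, array &$seen): array
{
    $real = realpath($dir);
    if ($real === false || isset($seen[$real])) {
        return [];               // unresolvable, or already visited
    }
    $seen[$real] = true;

    $files = [];
    foreach (scandir($real) ?: [] as $entry) {
        if ($entry === '.' || $entry === '..') {
            continue;
        }
        $path = $real . '/' . $entry;
        if (is_dir($path)) {
            // realpath() resolves links, so a link that points back to
            // a parent directory maps to a key that is already in $seen
            $files = array_merge($files, list_files_once($path, $seen));
        } elseif (is_file($path)) {
            $target = realpath($path);
            if ($target !== false && !isset($seen[$target])) {
                $seen[$target] = true;
                $files[] = $target;
            }
        }
    }
    return $files;
}
```

Call it with an empty array passed by reference, for example $seen = array(); $files = list_files_once($start_dir, $seen);. Keying $seen by the result of realpath( ) is what makes two different link paths to the same directory count as one visit.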

If any pages are found that contain the search term, it prints a list of links to those pages, ordered alphabetically by each page's title. If a page doesn't have a title (between the <title> and </title> tags), the page's URI relative to the document root is used instead.
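
The title fallback and the ordering can be sketched separately from the directory scan. The function names page_title( ) and compare_pages( ) and the sample pages below are hypothetical, chosen only for illustration.

```php
<?php
// Hypothetical helper: the <title> text, or the URI when there is none.
function page_title(string $html, string $uri): string
{
    if (preg_match('#<title>(.*?)</title>#is', $html, $match)) {
        return $match[1];
    }
    return $uri;
}

// Hypothetical comparator: order array(uri, title) pairs by title,
// then by URI when the titles are equal.
function compare_pages(array $a, array $b): int
{
    return strcmp($a[1], $b[1]) ?: strcmp($a[0], $b[0]);
}

$pages = [
    ['/sports/b.html', page_title('<title>Baseball</title>', '/sports/b.html')],
    ['/food/x.html',   page_title('<p>No title here</p>', '/food/x.html')],
    ['/movies/a.html', page_title('<title>Aliens</title>', '/movies/a.html')],
];
usort($pages, 'compare_pages');

foreach ($pages as $page) {
    echo $page[1], "\n";
}
// prints /food/x.html, then Aliens, then Baseball
```

Because strcmp( ) compares byte values, an untitled page sorts ahead of titled ones here: its URI starts with "/", which precedes every letter.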

The program looks for the search term between the <body> and </body> tags in each file. If your pages have a lot of text inside the <body> tags that you want to exclude from the search, surround the text that should be searched with specific HTML comments and then modify $body_regex to look for those comments instead. For example, suppose your page looks like this:

<body>

// Some HTML for menus, headers, etc.

<!-- search-start -->

<h1>Aliens Invade Earth</h1>

<h3>by H.G. Wells</h3>

<p>Aliens invaded earth today. Uh Oh.</p>

// More of the story

<!-- search-end -->

// Some HTML for footers, etc.

</body>

To match the search term against just the title, author, and story inside the HTML comments, change $body_regex to:

$body_regex = '#<!-- search-start -->(.*' . preg_quote($_REQUEST['term'],'#'). 
              '.*)<!-- search-end -->#Sis';
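
To see what this pattern accepts and rejects, it can be run against a small page. The sketch below wraps the pattern in a hypothetical helper, search_region_regex( ), and uses made-up menu and footer markup.

```php
<?php
// Hypothetical helper: build the comment-delimited pattern for a term.
function search_region_regex(string $term): string
{
    return '#<!-- search-start -->(.*' . preg_quote($term, '#') .
           '.*)<!-- search-end -->#Sis';
}

$page = <<<HTML
<body>
<div>Site menu: Sports, Movies</div>
<!-- search-start -->
<h1>Aliens Invade Earth</h1>
<p>Aliens invaded earth today. Uh Oh.</p>
<!-- search-end -->
<div>Footer: contact us</div>
</body>
HTML;

// the term is inside the comments (matching is case-insensitive)
var_dump(preg_match(search_region_regex('aliens'), $page)); // int(1)
// the term appears only before <!-- search-start -->
var_dump(preg_match(search_region_regex('menu'), $page));   // int(0)
// the term appears only after <!-- search-end -->
var_dump(preg_match(search_region_regex('footer'), $page)); // int(0)
```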

If you don't want the search term to match text that's inside HTML or PHP tags in your pages, add a call to strip_tags( ) to the code that loads the contents of the file for searching:

// load the contents of the file into $file
$file = strip_tags(join('',file($path)));
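
One caveat: strip_tags( ) removes the <body> and <title> tags (and HTML comments) along with everything else, so patterns that rely on those delimiters no longer match once the whole file has been stripped. A minimal sketch of one way around this, assuming a hypothetical helper named body_text_contains( ), is to isolate the body first and strip tags from that part only:

```php
<?php
// Hypothetical helper: does $term occur in the visible text of the body?
function body_text_contains(string $html, string $term): bool
{
    // isolate the body while its delimiters are still present
    // (assumption: the opening tag may carry attributes)
    if (!preg_match('#<body[^>]*>(.*)</body>#is', $html, $match)) {
        return false;
    }
    // then drop the markup and search the remaining text
    $text = strip_tags($match[1]);
    return stripos($text, $term) !== false;
}

$html = '<html><head><title>News</title></head>' .
        '<body><p class="story">Aliens invaded earth today.</p></body></html>';

var_dump(body_text_contains($html, 'aliens')); // bool(true)
var_dump(body_text_contains($html, 'story'));  // bool(false)
```

The second call returns false because "story" occurs only inside a tag attribute. The page title can be extracted from the unstripped contents in the same way.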

Example 19-5. site-search.php

function pc_search_dir($dir) { 
    global $body_regex,$title_regex,$seen;

    // array to hold pages that match
    $pages = array();

    // array to hold directories to recurse into
    $dirs = array();

    // mark this directory as seen so we don't look in it again
    $seen[realpath($dir)] = true;
    
    // if we can get a directory handle for this directory
    if (is_readable($dir) && ($d = dir($dir))) {
        // get each file name in the directory
        while (false !== ($f = $d->read())) {
            // build the full path of the file
            $path = $d->path.'/'.$f;
            // if it's a regular file and we can read it
            if (is_file($path) && is_readable($path)) {
                
                $realpath = realpath($path);
                // if we've seen this file already,
                if (isset($seen[$realpath])) {
                    // then skip it
                    continue;
                } else {
                    // otherwise, mark it as seen so we skip it
                    // if we come to it again
                    $seen[$realpath] = true;
                }

                // load the contents of the file into $file
                $file = join('',file($path));

                // if the search term is inside the body delimiters
                if (preg_match($body_regex,$file)) {

                    // construct the relative URI of the file by removing
                    // the document root from the full path
                    $uri = substr_replace($path,'',0,strlen($_SERVER['DOCUMENT_ROOT']));

                    // If the page has a title, find it
                    if (preg_match('#<title>(.*?)</title>#Sis',$file,$match)) {
                        // and add the title and URI to $pages
                        array_push($pages,array($uri,$match[1]));
                    } else {
                        // otherwise use the URI as the title
                        array_push($pages,array($uri,$uri));
                    }
                }
            } else {
                // if the directory entry is a valid subdirectory
                if (is_dir($path) && ('.' != $f) && ('..' != $f)) {
                    // add it to the list of directories to recurse into
                    array_push($dirs,$path);
                }
            }
        }
        $d->close();
    }

    /* look through each file in each subdirectory of this one, and add
       the matching pages in those directories to $pages. only look in
       a subdirectory if we haven't seen it yet.
    */
    foreach ($dirs as $subdir) {
        $realdir = realpath($subdir);
        if (! isset($seen[$realdir])) {
            $seen[$realdir] = true;
            $pages = array_merge($pages,pc_search_dir($subdir));
        }
    }

    return $pages;
}

// helper function to sort matched pages alphabetically by title
function pc_page_sort($a,$b) {
    if ($a[1] == $b[1]) {
        return strcmp($a[0],$b[0]);
    } else {
        // usort() expects a negative, zero, or positive integer
        return strcmp($a[1],$b[1]);
    }
}

// array to hold the pages that match the search term
$matching_pages = array();
// array to hold pages seen while scanning for the search term
$seen = array();
// directories underneath the document root to search
$search_dirs = array('sports','movies','food');
// regular expression to use in searching files. The "S" pattern
// modifier tells the PCRE engine to "study" the regex for greater
// efficiency.
$body_regex = '#<body>(.*' . preg_quote($_REQUEST['term'],'#'). 
              '.*)</body>#Sis';

// add the files that match in each directory to $matching_pages
foreach ($search_dirs as $dir) {
    $matching_pages = array_merge($matching_pages,
                                  pc_search_dir($_SERVER['DOCUMENT_ROOT'].'/'.$dir));
}

if (count($matching_pages)) {
    // sort the matching pages by title
    usort($matching_pages,'pc_page_sort');
    print '<ul>';
    // print out each title with a link to the page
    foreach ($matching_pages as $k => $v) {
        printf('<li> <a href="%s">%s</a></li>',$v[0],$v[1]);
    }
    print '</ul>';
} else {
    print 'No pages found.';
}

