PHP Cookbook/Strings

Revision as of 22:28, 6 March 2008 by Docbook2Wiki (Talk)

Introduction

Strings in PHP are sequences of characters, such as "We hold these truths to be self evident," or "Once upon a time," or even "111211211." When you read data from a file or output it to a web browser, your data is represented as strings.

Individual characters in strings can be referenced with array subscript style notation, as in C. The first character in the string is at index 0. For example:

$neighbor = 'Hilda';
print $neighbor[3];
d
         

However, PHP strings differ from C strings in that they are binary-safe (i.e., they can contain null bytes) and can grow and shrink on demand. Their size is limited only by the amount of memory that is available.
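As a small illustration of binary safety and growth on demand, the sketch below builds a string with an embedded null byte and then appends to it. The variable names are made up for this example:

```php
<?php
// "\0" in a double-quoted string is the null byte
$packet = "abc\0def";
// The null byte counts as an ordinary character
$length_with_null = strlen($packet);

// Strings grow on demand when you append to them
$packet .= str_repeat('x', 3);
$length_after_append = strlen($packet);

print "$length_with_null $length_after_append\n";
```

A C string function would stop at the null byte and report a length of 3; PHP reports the full length.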

You can initialize strings three ways, similar in form and behavior to Perl and the Unix shell: with single quotes, with double quotes, and with the "here document" (heredoc) format. With single-quoted strings, the only special characters you need to escape inside a string are backslash and the single quote itself:

print 'I have gone to the store.';
print 'I\'ve gone to the store.';
print 'Would you pay $1.75 for 8 ounces of tap water?';
print 'In double-quoted strings, newline is represented by \n';
I have gone to the store.
I've gone to the store.
Would you pay $1.75 for 8 ounces of tap water?
In double-quoted strings, newline is represented by \n

Because PHP doesn't check for variable interpolation or almost any escape sequences in single-quoted strings, defining strings this way is straightforward and fast.

Double-quoted strings don't recognize escaped single quotes, but they do recognize interpolated variables and the escape sequences shown in Table 1-1.

Table 1-1. Double-quoted string escape sequences

Escape sequence Character
\n Newline (ASCII 10)
\r Carriage return (ASCII 13)
\t Tab (ASCII 9)
\\ Backslash
\$ Dollar sign
\" Double quotes
\{ Left brace
\} Right brace
\[ Left bracket
\] Right bracket
\0 through \777 Octal value
\x0 through \xFF Hex value


For example:

print "I've gone to the store.";
print "The sauce cost \$10.25.";
$cost = '$10.25';
print "The sauce cost $cost.";
print "The sauce cost \$\061\060.\x32\x35.";
I've gone to the store.
The sauce cost $10.25.
The sauce cost $10.25.
The sauce cost $10.25.

The last line of code prints the price of sauce correctly because the character 1 is ASCII code 49 decimal and 061 octal. Character 0 is ASCII 48 decimal and 060 octal; 2 is ASCII 50 decimal and 32 hex; and 5 is ASCII 53 decimal and 35 hex.
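If you want to check such codes yourself, one possible approach (a sketch, not part of the original recipe) is to combine ord( ) with decoct( ) and dechex( ). The $codes and $same variables are names chosen for this example:

```php
<?php
// ord() gives a character's ASCII code; decoct() and dechex()
// convert that code to octal and hexadecimal strings
$codes = array();
foreach (array('1', '0', '2', '5') as $char) {
    $code = ord($char);
    $codes[$char] = array($code, decoct($code), dechex($code));
    printf("%s: decimal %d, octal %s, hex %s\n",
           $char, $code, decoct($code), dechex($code));
}

// The escaped form and the plain single-quoted form are the same string
$same = ("\$\061\060.\x32\x35" === '$10.25');
```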

Heredoc-specified strings recognize all the interpolations and escapes of double-quoted strings, but they don't require double quotes to be escaped. Heredocs start with <<< and a token. That token (with no leading or trailing whitespace), followed by a semicolon to end the statement (if necessary), ends the heredoc. For example:

print <<< END
It's funny when signs say things like:
   Original "Root" Beer
   "Free" Gift
   Shoes cleaned while "you" wait
or have other misquoted words.
END;
It's funny when signs say things like:
   Original "Root" Beer
   "Free" Gift
   Shoes cleaned while "you" wait
or have other misquoted words.

With heredocs, newlines, spacing, and quotes are all preserved. The end-of-string identifier is usually all caps, by convention, and it is case sensitive. Thus, this is okay:

print <<< PARSLEY
It's easy to grow fresh:
Parsley
Chives
on your windowsill
PARSLEY;

So is this:

print <<< DOGS
If you like pets, yell out:
DOGS AND CATS ARE GREAT!
DOGS;

Heredocs are useful for printing out HTML with interpolated variables:

if ($remaining_cards > 0) {
    $url = '/deal.php';
    $text = 'Deal More Cards';
} else {
    $url = '/new-game.php';
    $text = 'Start a New Game';
}
print <<< HTML
There are <b>$remaining_cards</b> left.
<p>
<a href="$url">$text</a>
HTML;

In this case, the semicolon needs to go after the end-of-string delimiter, to tell PHP the statement is ended. In some cases, however, you shouldn't use the semicolon:

$a = <<< END
Once upon a time, there was a
END
. ' boy!';
print $a;
Once upon a time, there was a boy!
         

In this case, the expression needs to continue on the next line, so you don't use a semicolon. Note also that in order for PHP to recognize the end-of-string delimiter, the . string concatenation operator needs to go on a separate line from the end-of-string delimiter.

Accessing Substrings

Problem

You want to extract part of a string, starting at a particular place in the string. For example, you want the first eight characters of a username entered into a form.

Solution

Use substr( ) to select your substrings:

$substring = substr($string,$start,$length);
$username = substr($_REQUEST['username'],0,8);

Discussion

If $start and $length are positive, substr( ) returns $length characters in the string, starting at $start. The first character in the string is at position 0:

print substr('watch out for that tree',6,5);
out f
            

If you leave out $length, substr( ) returns the string from $start to the end of the original string:

print substr('watch out for that tree',17);
t tree
            

If $start plus $length goes past the end of the string, substr( ) returns all of the string from $start forward:

print substr('watch out for that tree',20,5);
ree
            

If $start is negative, substr( ) counts back from the end of the string to determine where your substring starts:

print substr('watch out for that tree',-6);
print substr('watch out for that tree',-17,5);
t tree
out f

If $length is negative, substr( ) counts back from the end of the string to determine where your substring ends:

print substr('watch out for that tree',15,-2);
print substr('watch out for that tree',-4,-1);
hat tr
tre

See Also

Documentation on substr( ) at http://www.php.net/substr.

Replacing Substrings

Problem

You want to replace a substring with a different string. For example, you want to obscure all but the last four digits of a credit card number before printing it.

Solution

Use substr_replace( ) :

// Everything from position $start to the end of $old_string
// becomes $new_substring
$new_string = substr_replace($old_string,$new_substring,$start);

// $length characters, starting at position $start, become $new_substring
$new_string = substr_replace($old_string,$new_substring,$start,$length);

Discussion

Without the $length argument, substr_replace( ) replaces everything from $start to the end of the string. If $length is specified, only that many characters are replaced:

print substr_replace('My pet is a blue dog.','fish.',12);
print substr_replace('My pet is a blue dog.','green',12,4);
$credit_card = '4111 1111 1111 1111';
print substr_replace($credit_card,'xxxx ',0,strlen($credit_card)-4);
My pet is a fish.
My pet is a green dog.
xxxx 1111

If $start is negative, the new substring is placed at $start characters counting from the end of $old_string, not from the beginning:

print substr_replace('My pet is a blue dog.','fish.',-9);
print substr_replace('My pet is a blue dog.','green',-9,4);
My pet is a fish.
My pet is a green dog.

If $start and $length are 0, the new substring is inserted at the start of $old_string:

print substr_replace('My pet is a blue dog.','Title: ',0,0);
Title: My pet is a blue dog.

The function substr_replace( ) is useful when you've got text that's too big to display all at once, and you want to display some of the text with a link to the rest. For example, this displays the first 25 characters of a message with an ellipsis after it as a link to a page that displays more text:

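// $db is assumed to be an open PDO connection; the mysql_* functions
// were removed in PHP 7
$st = $db->prepare('SELECT id,message FROM messages WHERE id = ?');
$st->execute(array($id));
$ob = $st->fetch(PDO::FETCH_OBJ) or die( );
printf('<a href="more-text.php?id=%d">%s</a>',
       $ob->id, substr_replace($ob->message,' ...',25));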

The more-text.php page can use the message ID passed in the query string to retrieve the full message and display it.
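One wrinkle worth knowing: when $start is past the end of the string, substr_replace( ) appends the replacement, so a message shorter than 25 characters still gets an ellipsis. A minimal sketch of a guard against that follows; truncate_message( ) is a hypothetical helper name, not a built-in function:

```php
<?php
// Hypothetical helper: shorten $text to $max characters and add an
// ellipsis, but leave strings that already fit untouched
function truncate_message($text, $max = 25, $ellipsis = ' ...') {
    if (strlen($text) <= $max) {
        return $text;
    }
    return substr_replace($text, $ellipsis, $max);
}

print truncate_message('Short message') . "\n";
print truncate_message('This message is longer than twenty-five characters') . "\n";
```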

See Also

Documentation on substr_replace( ) at http://www.php.net/substr-replace.

Processing a String One Character at a Time

Problem

You need to process each character in a string individually.

Solution

Loop through each character in the string with for. This example counts the vowels in a string:

$string = "This weekend, I'm going shopping for a pet chicken.";
$vowels = 0;
for ($i = 0, $j = strlen($string); $i < $j; $i++) {
    if (strstr('aeiouAEIOU',$string[$i])) {
        $vowels++;
    }
}
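The same count can also be written with str_split( ) (available since PHP 5) and foreach, which avoids the index bookkeeping. This is an alternative sketch rather than the recipe's own method:

```php
<?php
$string = "This weekend, I'm going shopping for a pet chicken.";
$vowels = 0;
// With no length argument, str_split() returns one array element per byte
foreach (str_split($string) as $char) {
    // strpos() returns false when $char is not in the list of vowels
    if (strpos('aeiouAEIOU', $char) !== false) {
        $vowels++;
    }
}
print $vowels;
```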

Discussion

Processing a string a character at a time is an easy way to calculate the "Look and Say" sequence:

function lookandsay($s) {
    // initialize the return value to the empty string
    $r = '';
    // $m holds the character we're counting, initialize to the first
    // character in the string
    $m = $s[0];
    // $n is the number of $m's we've seen, initialize to 1
    $n = 1;
    for ($i = 1, $j = strlen($s); $i < $j; $i++) {
        // if this character is the same as the last one
        if ($s[$i] == $m) {
            // increment the count of this character
            $n++;
        } else {
            // otherwise, add the count and character to the return value 
            $r .= $n.$m;
            // set the character we're looking for to the current one 
            $m = $s[$i];
            // and reset the count to 1
            $n = 1;
        }
    }
    // return the built up string as well as the last count and character
    return $r.$n.$m;
}

// start from the string '1' so lookandsay( ) can index into it
$s = '1';
print "$s\n";
for ($i = 0; $i < 9; $i++) {
    $s = lookandsay($s);
    print "$s\n";
}
1
11
21
1211
111221
312211
13112221
1113213211
31131211131221
13211311123113112211

It's called the "Look and Say" sequence because each element is what you get by looking at the previous element and saying what's in it. For example, looking at the first element, 1, you say "one one." So the second element is "11." That's two ones, so the third element is "21." Similarly, that's one two and one one, so the fourth element is "1211," and so on.

See Also

Documentation on for at http://www.php.net/for; more about the "Look and Say" sequence at http://mathworld.wolfram.com/LookandSaySequence.html.

Reversing a String by Word or Character

Problem

You want to reverse the words or the characters in a string.

Solution

Use strrev( ) to reverse by character:

print strrev('This is not a palindrome.');
.emordnilap a ton si sihT
            

To reverse by words, explode the string by word boundary, reverse the words, then rejoin:

$s = "Once upon a time there was a turtle.";
// break the string up into words
$words = explode(' ',$s);
// reverse the array of words
$words = array_reverse($words);
// rebuild the string
$s = join(' ',$words);
print $s;
turtle. a was there time a upon Once
            

Discussion

Reversing a string by words can also be done all in one line:

$reversed_s = join(' ',array_reverse(explode(' ',$s)));
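Splitting on a single space leaves empty elements when words are separated by several spaces, tabs, or newlines. If that matters for your input, one option is to split on runs of whitespace with preg_split( ). In this sketch, reverse_words( ) is a name invented for the example:

```php
<?php
// Split on runs of whitespace so repeated spaces, tabs, and newlines
// don't produce empty "words"; the words are rejoined with single spaces
function reverse_words($s) {
    $words = preg_split('/\s+/', $s, -1, PREG_SPLIT_NO_EMPTY);
    return join(' ', array_reverse($words));
}

print reverse_words("Once  upon\ta time");
```

Note that this normalizes the separators: the original spacing is not preserved in the result.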

See Also

Recipe 18.8 discusses the implications of using something other than a space character as your word boundary; documentation on strrev( ) at http://www.php.net/strrev and array_reverse( ) at http://www.php.net/array-reverse.

Expanding and Compressing Tabs

Problem

You want to change spaces to tabs (or tabs to spaces) in a string while keeping text aligned with tab stops. For example, you want to display formatted text to users in a standardized way.

Solution

Use str_replace( ) to switch spaces to tabs or tabs to spaces:

$r = mysql_query("SELECT message FROM messages WHERE id = 1") or die();
$ob = mysql_fetch_object($r);
$tabbed = str_replace(' ',"\t",$ob->message);
$spaced = str_replace("\t",' ',$ob->message);

print "With Tabs: <pre>$tabbed</pre>";
print "With Spaces: <pre>$spaced</pre>";

Using str_replace( ) for conversion, however, doesn't respect tab stops. If you want tab stops every eight characters, a line beginning with a five-letter word and a tab should have that tab replaced with three spaces, not one. Use the pc_tab_expand( ) function shown in Example 1-1 to turn tabs to spaces in a way that respects tab stops.

Example 1-1. pc_tab_expand( )

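function pc_tab_expand($a) {
  $tab_stop = 8;
  while (strstr($a,"\t")) {
    // The /e modifier was removed in PHP 7, so use a callback instead.
    // [^\t\n] and /m make each match start at the beginning of a line.
    $a = preg_replace_callback('/^([^\t\n]*)(\t+)/m',
      function ($m) use ($tab_stop) {
        return $m[1].str_repeat(' ',strlen($m[2]) *
                                $tab_stop - strlen($m[1]) % $tab_stop);
      },$a);
  }
  return $a;
}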

$spaced = pc_tab_expand($ob->message);

You can use the pc_tab_unexpand() function shown in Example 1-2 to turn spaces back to tabs.

Example 1-2. pc_tab_unexpand( )

function pc_tab_unexpand($x) {
  $tab_stop = 8;
  
  $lines = explode("\n",$x);
  for ($i = 0, $j = count($lines); $i < $j; $i++) {
    $lines[$i] = pc_tab_expand($lines[$i]);
    $e = preg_split('/(.{'.$tab_stop.'})/',$lines[$i],-1,PREG_SPLIT_DELIM_CAPTURE);
    $lastbit = array_pop($e);
    if (!isset($lastbit)) { $lastbit = ''; }
    if ($lastbit == str_repeat(' ',$tab_stop)) { $lastbit = "\t"; }
    for ($m = 0, $n = count($e); $m < $n; $m++) {
      $e[$m] = preg_replace('/  +$/',"\t",$e[$m]);
    }
    $lines[$i] = join('',$e).$lastbit;
  }
  $x = join("\n", $lines);
  return $x;
}

$tabbed = pc_tab_unexpand($ob->message);

Both functions take a string as an argument and return the string appropriately modified.

Discussion

Each function assumes tab stops are every eight spaces, but that can be modified by changing the setting of the $tab_stop variable.

The regular expression in pc_tab_expand( ) matches both a group of tabs and all the text in a line before that group of tabs. It needs to match the text before the tabs because the length of that text determines how many spaces the tabs should be replaced with so that subsequent text is aligned with the next tab stop. The function doesn't just replace each tab with eight spaces; it adjusts text after tabs to line up with tab stops.
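The arithmetic behind that replacement can be isolated in a few lines. The function below is a hypothetical helper written for illustration; it computes the same expression that pc_tab_expand( ) uses for the number of spaces:

```php
<?php
// Number of spaces that replace $tabs consecutive tabs when
// $prefix_length characters precede them on the line
function spaces_for_tabs($prefix_length, $tabs = 1, $tab_stop = 8) {
    return $tabs * $tab_stop - ($prefix_length % $tab_stop);
}

print spaces_for_tabs(5) . "\n";    // five-letter word, one tab
print spaces_for_tabs(5, 2) . "\n"; // five-letter word, two tabs
print spaces_for_tabs(8) . "\n";    // text ends exactly on a tab stop
```

A five-letter word followed by one tab needs three spaces to reach column 8; followed by two tabs it needs eleven to reach column 16.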

Similarly, pc_tab_unexpand( ) doesn't just look for eight consecutive spaces and then replace them with one tab character. It divides up each line into eight-character chunks and then substitutes ending whitespace in those chunks (at least two spaces) with tabs. This not only preserves text alignment with tab stops; it also saves space in the string.

See Also

Documentation on str_replace( ) at http://www.php.net/str-replace.

Controlling Case

Problem

You need to capitalize, lowercase, or otherwise modify the case of letters in a string. For example, you want to capitalize the initial letters of names but lowercase the rest.

Solution

Use ucfirst( ) or ucwords( ) to capitalize the first letter of one or more words:

print ucfirst("how do you do today?");
print ucwords("the prince of wales");
How do you do today?
The Prince Of Wales

Use strtolower( ) or strtoupper( ) to modify the case of entire strings:

print strtoupper("i'm not yelling!");
// Tags must be lowercase to be XHTML compliant
print strtolower('<A HREF="one.php">one</A>');
I'M NOT YELLING!
<a href="one.php">one</a>

Discussion

Use ucfirst( ) to capitalize the first character in a string:

print ucfirst('monkey face');
print ucfirst('1 monkey face');
Monkey face
1 monkey face

Note that the second line of output is not "1 Monkey face".

Use ucwords( ) to capitalize the first character of each word in a string:

print ucwords('1 monkey face');
print ucwords("don't play zone defense against the philadelphia 76-ers");
1 Monkey Face
Don't Play Zone Defense Against The Philadelphia 76-ers

As expected, ucwords( ) doesn't capitalize the "t" in "don't." But it also doesn't capitalize the "e" in "76-ers." For ucwords( ), a word is any sequence of nonwhitespace characters at the start of the string or following one or more whitespace characters. Since neither ' nor - is a whitespace character, ucwords( ) doesn't consider the "t" in "don't" or the "e" in "76-ers" to be word-starting characters.
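Newer PHP releases (5.4.32, 5.5.16, and later) accept an optional second argument to ucwords( ) listing the characters that separate words. If you do want the letter after a hyphen capitalized, a sketch along these lines should work; the $team variable is just an example name:

```php
<?php
// The second argument replaces the default set of word separators;
// here it is space and hyphen (the apostrophe is deliberately left out)
$team = ucwords("don't play zone defense against the philadelphia 76-ers", ' -');
print $team;
```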

Neither ucfirst( ) nor ucwords( ) changes the case of letters other than the first:

print ucfirst('macWorld says I should get an iBook');
print ucwords('eTunaFish.com might buy itunaFish.Com!');
MacWorld says I should get an iBook
ETunaFish.com Might Buy ItunaFish.Com!

The functions strtolower( ) and strtoupper( ) work on entire strings, not just individual characters. strtolower( ) changes all alphabetic characters to lowercase, and strtoupper( ) changes all alphabetic characters to uppercase:

print strtolower("I programmed the WOPR and the TRS-80.");
print strtoupper('"since feeling is first" is a poem by e. e. cummings.');
i programmed the wopr and the trs-80.
"SINCE FEELING IS FIRST" IS A POEM BY E. E. CUMMINGS.

When determining upper- and lowercase, these functions respect your locale settings in PHP versions before 8.2; from PHP 8.2 on, they convert only ASCII letters regardless of locale.
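For the task named in the Problem (capitalizing the initial letters of names and lowercasing the rest), the two kinds of functions can be combined. The sketch below uses made-up sample names:

```php
<?php
// Sample input in assorted cases
$names = array('JOHN SMITH', 'mary jones', 'aLiCe cooper');
$fixed = array();
foreach ($names as $name) {
    // Lowercase everything first, then capitalize each word
    $fixed[] = ucwords(strtolower($name));
}
print join(', ', $fixed);
```

This simple approach mishandles names such as "McDonald" or "van Gogh", which need special-case rules.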

See Also

For more information about locale settings, see Chapter 16; documentation on ucfirst( ) at http://www.php.net/ucfirst, ucwords( ) at http://www.php.net/ucwords, strtolower( ) at http://www.php.net/strtolower, and strtoupper( ) at http://www.php.net/strtoupper.

Interpolating Functions and Expressions Within Strings

Problem

You want to include the results of executing a function or expression within a string.

Solution

Use the string concatenation operator (.) when the value you want to include can't be inside the string:

print 'You have '.($_REQUEST['boys'] + $_REQUEST['girls']).' children.';
print "The word '$word' is ".strlen($word).' characters long.';
print 'You owe '.$amounts['payment'].' immediately';
print "My circle's diameter is ".$circle->getDiameter().' inches.';

Discussion

You can put variables, object properties, and array elements (if the subscript is unquoted) directly in double-quoted strings:

print "I have $children children.";
print "You owe $amounts[payment] immediately.";
print "My circle's diameter is $circle->diameter inches.";
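Double-quoted strings also accept expressions wrapped in curly braces, as long as the expression starts with a variable. That covers quoted array keys and method calls, though plain function calls such as strlen($word) still need concatenation. In the sketch below, the Circle class and the sample data are hypothetical and exist only to make the example runnable:

```php
<?php
// Hypothetical data and class, defined only for this example
$amounts = array('payment' => 25);
class Circle {
    public $diameter = 10;
    public function getDiameter() { return $this->diameter; }
}
$circle = new Circle;

// Braces allow quoted array keys and method calls inside the string
$owe = "You owe {$amounts['payment']} immediately.";
$size = "My circle's diameter is {$circle->getDiameter()} inches.";
print "$owe\n$size\n";
```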

Direct interpolation or using string concatenation also works with heredocs. Interpolating with string concatenation in heredocs can look a little strange because the heredoc delimiter and the string concatenation operator have to be on separate lines:

print <<< END
Right now, the time is 
END
. strftime('%c') . <<< END
 but tomorrow it will be 
END
. strftime('%c',time() + 86400);

Also, if you're interpolating with heredocs, make sure to include appropriate spacing for the whole string to appear properly. In the previous example, "Right now, the time is" has to include a trailing space, and "but tomorrow it will be" has to include leading and trailing spaces.

See Also

For the syntax to interpolate variable variables (like ${"amount_$i"}), see Recipe 5.5; documentation on the string concatenation operator at http://www.php.net/language.operators.string.

Trimming Blanks from a String

Problem

You want to remove whitespace from the beginning or end of a string. For example, you want to clean up user input before validating it.

Solution

Use ltrim( ) , rtrim( ), or trim( ). ltrim( ) removes whitespace from the beginning of a string, rtrim( ) from the end of a string, and trim( ) from both the beginning and end of a string:

$zipcode = trim($_REQUEST['zipcode']);
$no_linefeed = rtrim($_REQUEST['text']);
$name = ltrim($_REQUEST['name']);

Discussion

For these functions, whitespace is defined as the following characters: newline, carriage return, space, horizontal and vertical tab, and null.

Trimming whitespace off of strings saves storage space and can make for more precise display of formatted data or text within <pre> tags, for example. If you are doing comparisons with user input, you should trim the data first, so that someone who mistakenly enters "98052 " as their Zip Code isn't forced to fix something that isn't really an error. Trimming before exact text comparisons also ensures that, for example, "salami\n" equals "salami." It's also a good idea to normalize string data by trimming it before storing it in a database.
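The effect on comparisons is easy to see in a short sketch. The $input value here stands in for form data and is invented for the example:

```php
<?php
// Simulated form input with a stray space and newline
$input = "98052 \n";

// Exact comparison fails on the raw value but succeeds once trimmed
$matches_raw = ($input === '98052');
$matches_trimmed = (trim($input) === '98052');

var_dump($matches_raw, $matches_trimmed);
```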

The trim( ) functions can also remove user-specified characters from strings. Pass the characters you want to remove as a second argument. You can indicate a range of characters with two dots between the first and last characters in the range.

// Remove numerals and space from the beginning of the line
print ltrim('10 PRINT A$',' 0..9');
// Remove semicolon from the end of the line
print rtrim('SELECT * FROM turtles;',';');
PRINT A$
SELECT * FROM turtles

PHP also provides chop( ) as an alias for rtrim( ). However, you're best off using rtrim( ) instead, because PHP's chop( ) behaves differently than Perl's chop( ) (which is deprecated in favor of chomp( ), anyway) and using it can confuse others when they read your code.

See Also

Documentation on trim( ) at http://www.php.net/trim, ltrim( ) at http://www.php.net/ltrim, and rtrim( ) at http://www.php.net/rtrim.

Parsing Comma-Separated Data

Problem

You have data in comma-separated values (CSV) format, for example a file exported from Excel or a database, and you want to extract the records and fields into a format you can manipulate in PHP.

Solution

If the CSV data is in a file (or available via a URL), open the file with fopen( ) and read in the data with fgetcsv( ). This prints out the data in an HTML table:

$fp = fopen('sample2.csv','r') or die("can't open file");
print "<table>\n";
while($csv_line = fgetcsv($fp,1024)) {
    print '<tr>';
    for ($i = 0, $j = count($csv_line); $i < $j; $i++) {
        print '<td>'.$csv_line[$i].'</td>';
    }
    print "</tr>\n";
}
print "</table>\n";
fclose($fp) or die("can't close file");

Discussion

The second argument to fgetcsv( ) must be longer than the maximum length of a line in your CSV file. (Don't forget to count the end-of-line whitespace.) If you read in CSV lines longer than 1K, change the 1024 used in this recipe to something that accommodates your line length.

You can pass fgetcsv( ) an optional third argument, a delimiter to use instead of a comma (,). Using a different delimiter, however, somewhat defeats the purpose of CSV as an easy way to exchange tabular data.

Don't be tempted to bypass fgetcsv( ) and just read a line in and explode( ) on the commas. CSV is more complicated than that, in order to deal with embedded commas and double quotes. Using fgetcsv( ) protects you and your code from subtle errors.
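To see the kind of error involved, compare explode( ) with a real CSV parser on a record that contains a quoted comma. This sketch uses str_getcsv( ) (PHP 5.3 and later), which parses a string rather than a file; the sample record is made up:

```php
<?php
// One CSV record whose second field holds a comma and doubled quotes
$line = 'Fred,"Flintstone, ""The Rock""",35';

// Naive approach: also splits inside the quoted field
$naive = explode(',', $line);

// str_getcsv() honors the quoting; the delimiter, enclosure, and
// escape characters are passed explicitly here
$parsed = str_getcsv($line, ',', '"', '\\');

print count($naive) . ' pieces versus ' . count($parsed) . " fields\n";
print $parsed[1] . "\n";
```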

See Also

Documentation on fgetcsv( ) at http://www.php.net/fgetcsv.

Parsing Fixed-Width Delimited Data

Problem

You need to break apart fixed-width records in strings.

Solution

Use substr( ) :

$fp = fopen('fixed-width-records.txt','r') or die ("can't open file");
while ($s = fgets($fp,1024)) {
    $fields[1] = substr($s,0,10);  // first field:  first 10 characters of the line
    $fields[2] = substr($s,10,5);  // second field: next 5 characters of the line
    $fields[3] = substr($s,15,12); // third field:  next 12 characters of the line
    // a function to do something with the fields
    process_fields($fields);
}
fclose($fp) or die("can't close file");

Or unpack( ) :

$fp = fopen('fixed-width-records.txt','r') or die ("can't open file");
while ($s = fgets($fp,1024)) {
    // an associative array with keys "title", "author", and "publication_year"
    $fields = unpack('A25title/A14author/A4publication_year',$s);
    // a function to do something with the fields
    process_fields($fields);
}
fclose($fp) or die("can't close file");

Discussion

Data in which each field is allotted a fixed number of characters per line may look like this list of book titles, authors, and publication years:

$booklist=<<<END
Elmer Gantry             Sinclair Lewis1927
The Scarlatti InheritanceRobert Ludlum 1971
The Parsifal Mosaic      Robert Ludlum 1982
Sophie's Choice          William Styron1979
END;

In each line, the title occupies the first 25 characters, the author's name the next 14 characters, and the publication year the next 4 characters. Knowing those field widths, it's straightforward to use substr( ) to parse the fields into an array:

$books = explode("\n",$booklist);

for($i = 0, $j = count($books); $i < $j; $i++) {
  $book_array[$i]['title'] = substr($books[$i],0,25);
  $book_array[$i]['author'] = substr($books[$i],25,14);
  $book_array[$i]['publication_year'] = substr($books[$i],39,4);
}

Exploding $booklist into an array of lines makes the looping code the same whether it's operating over a string or a series of lines read in from a file.

The loop can be made more flexible by specifying the field names and widths in a separate array that can be passed to a parsing function, as shown in the pc_fixed_width_substr( ) function in Example 1-3.

Example 1-3. pc_fixed_width_substr( )

function pc_fixed_width_substr($fields,$data) {
  $r = array();
  for ($i = 0, $j = count($data); $i < $j; $i++) {
    $line_pos = 0;
    foreach($fields as $field_name => $field_length) {
      $r[$i][$field_name] = rtrim(substr($data[$i],$line_pos,$field_length));
      $line_pos += $field_length;
    }
  }
  return $r;
}

$book_fields = array('title' => 25,
                     'author' => 14,
                     'publication_year' => 4);

$book_array = pc_fixed_width_substr($book_fields,$books);

The variable $line_pos keeps track of the start of each field, and is advanced by the previous field's width as the code moves through each line. Use rtrim( ) to remove trailing whitespace from each field.

You can use unpack( ) as a substitute for substr( ) to extract fields. Instead of specifying the field names and widths as an associative array, create a format string for unpack( ). A fixed-width field extractor using unpack( ) looks like the pc_fixed_width_unpack( ) function shown in Example 1-4.

Example 1-4. pc_fixed_width_unpack( )

function pc_fixed_width_unpack($format_string,$data) {
  $r = array();
  for ($i = 0, $j = count($data); $i < $j; $i++) {
    $r[$i] = unpack($format_string,$data[$i]);
  }
  return $r;
}

$book_array = pc_fixed_width_unpack('A25title/A14author/A4publication_year',
                                    $books);

Because the A format to unpack( ) means "space padded string," there's no need to rtrim( ) off the trailing spaces.
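The difference shows up on a single record. The sketch below builds one line with str_pad( ) so the field widths are explicit, then compares what unpack( ) and substr( ) return for the title:

```php
<?php
// Build a 43-character record: 25-char title, 14-char author, 4-char year
$line = str_pad('Elmer Gantry', 25) . str_pad('Sinclair Lewis', 14) . '1927';

// The A format strips the trailing padding from each field
$fields = unpack('A25title/A14author/A4publication_year', $line);

// substr() returns the field with its padding intact
$raw_title = substr($line, 0, 25);

print_r($fields);
print strlen($raw_title) . "\n";
```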

Once the fields have been parsed into $book_array by either function, the data can be printed as an HTML table, for example:

$book_array = pc_fixed_width_unpack('A25title/A14author/A4publication_year',
                                    $books);
print "<table>\n";
// print a header row
print '<tr><td>';
print join('</td><td>',array_keys($book_array[0]));
print "</td></tr>\n";
// print each data row
foreach ($book_array as $row) {
    print '<tr><td>';
    print join('</td><td>',array_values($row));
    print "</td></tr>\n";
}
print "</table>\n";

Joining data on </td><td> produces a table row that is missing its first <td> and last </td>. We produce a complete table row by printing out <tr><td> before the joined data and </td></tr> after the joined data.

substr( ) and unpack( ) have equivalent capabilities when the fixed-width fields are strings, but unpack( ) is the better solution when the elements of the fields aren't just strings.

See Also

For more information about unpack( ), see Recipe 1.14 and http://www.php.net/unpack; Recipe 4.9 discusses join( ).

Taking Strings Apart

Problem

You need to break a string into pieces. For example, you want to access each line that a user enters in a <textarea> form field.

Solution

Use explode( ) if what separates the pieces is a constant string:

$words = explode(' ','My sentence is not very complicated');

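Use split( ) or preg_split( ) if you need a POSIX or Perl regular expression to describe the separator. The POSIX functions split( ) and spliti( ) were deprecated in PHP 5.3 and removed in PHP 7, so on current versions only preg_split( ) is available: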

$words = split(' +','This sentence  has  some extra whitespace  in it.');
$words = preg_split('/\d\. /','my day: 1. get up 2. get dressed 3. eat toast');
$lines = preg_split('/[\n\r]+/',$_REQUEST['textarea']);

Use spliti( ) or the /i flag to preg_split( ) for case-insensitive separator matching:

$words = spliti(' x ','31 inches x 22 inches X 9 inches');
$words = preg_split('/ x /i','31 inches x 22 inches X 9 inches');

Discussion

The simplest solution of the bunch is explode( ). Pass it your separator string, the string to be separated, and an optional limit on how many elements should be returned:

$dwarves = 'dopey,sleepy,happy,grumpy,sneezy,bashful,doc';
$dwarf_array = explode(',',$dwarves);

Now $dwarf_array is a seven-element array:

print_r($dwarf_array);
Array
(
    [0] => dopey
    [1] => sleepy
    [2] => happy
    [3] => grumpy
    [4] => sneezy
    [5] => bashful
    [6] => doc
)

If the specified limit is less than the number of possible chunks, the last chunk contains the remainder:

$dwarf_array = explode(',',$dwarves,5);
print_r($dwarf_array);
Array
(
    [0] => dopey
    [1] => sleepy
    [2] => happy
    [3] => grumpy
    [4] => sneezy,bashful,doc
)

The separator is treated literally by explode( ). If you specify a comma and a space as a separator, it breaks the string only on a comma followed by a space — not on a comma or a space.
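A short sketch makes the literal matching concrete. The $list string is an invented example that mixes the two-character separator with a lone comma and a lone space:

```php
<?php
$list = 'dopey, sleepy,happy grumpy';

// Only the exact two-character sequence ", " splits the string;
// the lone comma and the lone space are left alone
$pieces = explode(', ', $list);

print_r($pieces);
```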

With split( ), you have more flexibility. Instead of a string literal as a separator, it uses a POSIX regular expression:

$more_dwarves = 'cheeky,fatso, wonder boy, chunky,growly, groggy, winky';
$more_dwarf_array = split(', ?',$more_dwarves);

This regular expression splits on a comma followed by an optional space, which treats all the new dwarves properly. Those with a space in their name aren't broken up, but everyone is broken apart whether they are separated by "," or ", ":

print_r($more_dwarf_array);
Array
(
    [0] => cheeky
    [1] => fatso
    [2] => wonder boy
    [3] => chunky
    [4] => growly
    [5] => groggy
    [6] => winky
)

Similar to split( ) is preg_split( ) , which uses a Perl-compatible regular-expression engine instead of a POSIX regular-expression engine. With preg_split( ), you can take advantage of various Perlish regular-expression extensions, as well as tricks such as including the separator text in the returned array of strings:

$math = "3 + 2 / 7 - 9";
$stack = preg_split('/ *([+\-\/*]) */',$math,-1,PREG_SPLIT_DELIM_CAPTURE);
print_r($stack);
Array
(
    [0] => 3
    [1] => +
    [2] => 2
    [3] => /
    [4] => 7
    [5] => -
    [6] => 9
)

The separator regular expression looks for the four mathematical operators (+, -, /, *), surrounded by optional leading or trailing spaces. The PREG_SPLIT_DELIM_CAPTURE flag tells preg_split( ) to include in the returned array the text matched by the parenthesized parts of the separator regular expression. Only the mathematical operator character class is in parentheses, so the returned array doesn't have any spaces in it.

See Also

Regular expressions are discussed in more detail in Chapter 13; documentation on explode( ) at http://www.php.net/explode, split( ) at http://www.php.net/split, and preg_split( ) at http://www.php.net/preg-split.

Wrapping Text at a Certain Line Length

Problem

You need to wrap lines in a string. For example, you want to display text in <pre>/</pre> tags but have it stay within a regularly sized browser window.

Solution

Use wordwrap( ) :

$s = "Four score and seven years ago our fathers brought forth on this continent a new nation, conceived in liberty and dedicated to the proposition that all men are created equal.";

print "<pre>\n".wordwrap($s)."\n</pre>";
<pre>
Four score and seven years ago our fathers brought forth on this continent
a new nation, conceived in liberty and dedicated to the proposition that
all men are created equal.
</pre>
            

Discussion

By default, wordwrap( ) wraps text at 75 characters per line. An optional second argument specifies a different line length:

print wordwrap($s,50);
Four score and seven years ago our fathers brought
forth on this continent a new nation, conceived in
liberty and dedicated to the proposition that all
men are created equal.
            

Other characters besides "\n" can be used for linebreaks. For double spacing, use "\n\n":

print wordwrap($s,50,"\n\n");
Four score and seven years ago our fathers brought

forth on this continent a new nation, conceived in

liberty and dedicated to the proposition that all

men are created equal.
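
The break string can be longer than one character and is not counted toward the line length. As a minimal sketch, this wraps the same text with an HTML line break and then prints the length of each resulting line; the variable names $html and $lines are illustrative only:

```php
<?php
$s = "Four score and seven years ago our fathers brought forth on this continent a new nation, conceived in liberty and dedicated to the proposition that all men are created equal.";

// Use an HTML line break plus a newline as the break string.
$html = wordwrap($s, 50, "<br />\n");

// Split on the break string to inspect the individual lines.
$lines = explode("<br />\n", $html);
foreach ($lines as $line) {
    // Each length is measured without the break string.
    print strlen($line) . ": " . $line . "\n";
}
```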
            

There is an optional fourth argument to wordwrap( ) that controls the treatment of words that are longer than the specified line length. If this argument is 1, these words are broken so that no line exceeds the specified length. Otherwise, they extend past it:

print wordwrap('jabberwocky',5) . "\n\n";
print wordwrap('jabberwocky',5,"\n",1);
jabberwocky

jabbe
rwock
y
            

See Also

Documentation on wordwrap( ) at http://www.php.net/wordwrap.

Storing Binary Data in Strings

Problem

You want to parse a string that contains values encoded as a binary structure or encode values into a string. For example, you want to store numbers in their binary representation instead of as sequences of ASCII characters.

Solution

Use pack( ) to store binary data in a string:

$packed = pack('S4',1974,106,28225,32725);

Use unpack( ) to extract binary data from a string:

$nums = unpack('S4',$packed);

Discussion

The first argument to pack( ) is a format string that describes how to encode the data that's passed in the rest of the arguments. The format string S4 tells pack( ) to produce four unsigned 16-bit short numbers in machine byte order from its input data. Given 1974, 106, 28225, and 32725 as input on a little-endian machine, this returns eight bytes: 182, 7, 106, 0, 65, 110, 213, and 127. Each two-byte pair corresponds to one of the input numbers, low byte first: 7 * 256 + 182 = 1974; 0 * 256 + 106 = 106; 110 * 256 + 65 = 28225; 127 * 256 + 213 = 32725.
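
Because S uses machine byte order, the byte sequence it produces depends on the platform. As a rough sketch, the fixed-order formats v and n from Table 1-2 give the same result on every machine, and unpacking with C* exposes the individual bytes. The variable names here are illustrative only:

```php
<?php
// 'v' is always little-endian; 'n' is always big-endian.
$little = pack('v4', 1974, 106, 28225, 32725);
$big    = pack('n4', 1974, 106, 28225, 32725);

// unpack('C*') returns one unsigned byte per element, keyed from 1;
// array_values() renumbers the keys from 0.
$bytes_le = array_values(unpack('C*', $little));
$bytes_be = array_values(unpack('C*', $big));

print implode(', ', $bytes_le) . "\n";  // 182, 7, 106, 0, 65, 110, 213, 127
print implode(', ', $bytes_be) . "\n";  // 7, 182, 0, 106, 110, 65, 127, 213
```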

The first argument to unpack( ) is also a format string, and the second argument is the data to decode. Given a format string of S4 and the eight-byte sequence that pack( ) produced, unpack( ) returns a four-element array of the original numbers. Note that the array keys start at 1, not 0:

print_r($nums);
Array
(
    [1] => 1974
    [2] => 106
    [3] => 28225
    [4] => 32725
)
            

In unpack( ), format characters and their count can be followed by a string to be used as an array key. For example:

$nums = unpack('S4num',$packed);
print_r($nums);
Array
(
    [num1] => 1974
    [num2] => 106
    [num3] => 28225
    [num4] => 32725
)
            

Multiple format characters must be separated with / in unpack( ):

$nums = unpack('S1a/S1b/S1c/S1d',$packed);
print_r($nums);
Array
(
    [a] => 1974
    [b] => 106
    [c] => 28225
    [d] => 32725
)
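
The / separator also lets one format string mix different types. The sketch below packs a hypothetical fixed-width record (an eight-character name, a 16-bit year, and a one-byte count; the field names are invented for illustration) and unpacks it into named keys:

```php
<?php
// A8: space-padded string of 8 characters
// n:  unsigned 16-bit short, big-endian
// C:  unsigned char
$record = pack('A8nC', 'Hilda', 1974, 7);

// Name each field; trailing padding is stripped from the A8 field.
$fields = unpack('A8name/nyear/Ccount', $record);

print strlen($record) . "\n";   // 11
print_r($fields);
```

Note that pack( ) takes the format characters run together, while unpack( ) needs the slashes to tell the key names apart from the next format character.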
            

The format characters that can be used with pack( ) and unpack( ) are listed in Table 1-2.

Table 1-2. Format characters for pack( ) and unpack( )

Format character   Data type
a                  NUL-padded string
A                  Space-padded string
h                  Hex string, low nibble first
H                  Hex string, high nibble first
c                  signed char
C                  unsigned char
s                  signed short (16 bit, machine byte order)
S                  unsigned short (16 bit, machine byte order)
n                  unsigned short (16 bit, big endian byte order)
v                  unsigned short (16 bit, little endian byte order)
i                  signed int (machine-dependent size and byte order)
I                  unsigned int (machine-dependent size and byte order)
l                  signed long (32 bit, machine byte order)
L                  unsigned long (32 bit, machine byte order)
N                  unsigned long (32 bit, big endian byte order)
V                  unsigned long (32 bit, little endian byte order)
f                  float (machine dependent size and representation)
d                  double (machine dependent size and representation)
x                  NUL byte
X                  Back up one byte
@                  NUL-fill to absolute position


For a, A, h, and H, a number after the format character indicates how long the string is. For example, A25 means a 25-character space-padded string. For other format characters, a following number means how many of that type appear consecutively in a string. Use * to take the rest of the available data.
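
As a small illustration of these rules, the following sketch pads a three-character string to six bytes with a and A, then uses * to pack however many values are supplied. The variable names are illustrative only, and bin2hex( ) is used just to make the padding bytes visible:

```php
<?php
// a6: pad to 6 bytes with NUL bytes (00).
$nul_padded = pack('a6', 'emu');
// A6: pad to 6 bytes with spaces (20).
$space_padded = pack('A6', 'emu');

print bin2hex($nul_padded) . "\n";    // 656d75000000
print bin2hex($space_padded) . "\n";  // 656d75202020

// C*: one unsigned char for each remaining argument.
$word = pack('C*', 80, 72, 80);
print $word . "\n";                   // PHP
```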

You can convert between data types with unpack( ). This example fills the array $ascii with the ASCII value of each character in $s:

$s = 'platypus';
$ascii = unpack('c*',$s);
print_r($ascii);
Array
(
    [1] => 112
    [2] => 108
    [3] => 97
    [4] => 116
    [5] => 121
    [6] => 112
    [7] => 117
    [8] => 115
)
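
One caveat: c is a signed char, so bytes above 127 come back as negative numbers, while C returns values from 0 to 255. The sketch below shows the difference and reverses the conversion with pack( ). It assumes PHP 5.6 or later for the ... argument-unpacking syntax, and the variable names are illustrative only:

```php
<?php
$s = 'platypus';

// C* yields unsigned byte values, keyed from 1.
$ascii = unpack('C*', $s);

// Spread the values back into pack() to rebuild the string
// (the ... syntax is an assumption: it needs PHP 5.6+).
$copy = pack('C*', ...$ascii);
print $copy . "\n";          // platypus

// The same byte read as signed and as unsigned.
$signed   = unpack('c', "\xE9");
$unsigned = unpack('C', "\xE9");
print $signed[1] . "\n";     // -23
print $unsigned[1] . "\n";   // 233
```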
            

See Also

Documentation on pack( ) at http://www.php.net/pack and unpack( ) at http://www.php.net/unpack.
