What's That Noise?! [Ian Kallen's Weblog]


Saturday December 18, 2004

PHP's mbstring is a 2 bit thief

Normally, PHP's strlen function reports the number of bytes in a string. When you're dealing with western characters in a single-byte encoding, that equals the number of characters as well. However, strange things can happen when PHP's mbstring extension for multi-byte character support is enabled to alias that function.

If you have mbstring.func_overload configured to alias mb_strlen for strlen (i.e. when the 2 bit is set), then strlen starts counting characters, not bytes. If you need to count the number of bytes, it's not obvious how you're supposed to do it.

This is how I did it:
In places where I really needed to know the number of bytes, I used a homebrewed function, byte_count, instead of strlen. Here's the function definition for byte_count:

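     // Count bytes, not characters, even when mbstring.func_overload
     // has aliased strlen to mb_strlen.
     function byte_count($val) {
         // A single-byte encoding such as latin1 makes mb_strlen
         // count one character per byte.
         return function_exists('mb_strlen')
             ? mb_strlen($val, 'latin1')
             : strlen($val);
     }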

Perl is hokey about it too. The length function is supposed to count the number of characters, but if you want to force it to count bytes, you need to use the bytes pragma. From the bytes manpage:

           $x = chr(400);
           print "Length is ", length $x, "\n";     # "Length is 1"
           printf "Contents are %vd\n", $x;         # "Contents are 400"
           {
               use bytes;
               print "Length is ", length $x, "\n"; # "Length is 2"
               printf "Contents are %vd\n", $x;     # "Contents are 198.144"
           }

Java is not without its pickiness, but at least it has byte and char as distinct primitives.
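For comparison, here's a minimal Java sketch of the same byte-versus-character split, using the same code point as Perl's chr(400) above. The class name ByteCount and the helper byteCount are made up for illustration, and UTF-8 is an assumed target encoding; the byte count depends on whichever charset you encode to.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, named here only for illustration.
class ByteCount {
    // Number of bytes the string occupies once encoded as UTF-8
    // (UTF-8 is an assumption; another charset can give another count).
    static int byteCount(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String x = "\u0190"; // same code point as Perl's chr(400)
        System.out.println("Length in chars is " + x.length());     // 1
        System.out.println("Length in bytes is " + byteCount(x));   // 2
    }
}
```

One bit of that pickiness: String.length() counts UTF-16 code units (Java's char), so a character outside the Basic Multilingual Plane reports a length of 2 even though it's a single code point.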

( Dec 18 2004, 12:50:23 AM PST )

