
Unexpected behavior of PHP unpack()

Tags:

php

unpack

Tests

$x = sprintf( "foo\x00bar\x00baz" );
$y = unpack( 'afoo/abar/abaz' , $x );
print_r( $y );
$x = sprintf( "foo\x00bar\x00baz" );
$y = unpack( 'a*foo/a*bar/a*baz' , $x );
print_r( $y );

Results

Array
(
    [foo] => f
    [bar] => o
    [baz] => o
)
Array
(
    [foo] => foobarbaz
    [bar] => 
    [baz] => 
)

Note that the NUL byte is always there; you can check it with hexdump.

Expected result

Array
(
    [foo] => foo
    [bar] => bar
    [baz] => baz
)

Notes

I know I can use explode to achieve a similar result. I'm not asking for an alternative; I just want to understand the logic behind the a format character ("NUL-padded string", as the doc says).

Where does the "NULL" value get involved in all this?

cYrus asked Jul 24 '12 13:07


1 Answer

Original answer

"Where does the "NULL" value get involved in all this?"

Nowhere.

I'm pretty sure that the documentation for PHP pack()/unpack() needs updating. Wherever you see it referring to a NUL-padded or NUL-terminated string, the documentation has been taken from the Perl version of the code and doesn't reflect what's happening in PHP.

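C-style strings are NUL-terminated so that you know where the end of the string is. PHP strings carry their own length, so they don't need a terminator. Also note that the PHP constant NULL is not a NUL character: when concatenated it simply becomes an empty string, e.g.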

$test1 = "Test".NULL."ing";
$test2 = "Testing";

if(strcmp($test1, $test2) == 0){
    echo "The strings are the same";
}
else{
    echo "They are different.";
}

Will print 'The strings are the same'.

Incidentally this: "foo\x00bar\x00baz"

Is probably not doing what you think it's doing. It's not putting a PHP NULL between foo and bar, and between bar and baz. Instead it's putting a byte with the value 0 there, which just happens to not be printed out in most character sets but has no special meaning as a character to PHP.
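To make the distinction concrete, here is a minimal sketch that inspects the bytes directly. The variable names are made up for illustration; strlen(), ord() and bin2hex() are standard PHP functions.

```php
<?php
// Illustrative only: look at the raw bytes of the string from the question.
$s = "foo\x00bar\x00baz";

$length = strlen($s);   // strlen() counts bytes, including both zero bytes
$fourth = ord($s[3]);   // numeric value of the byte right after "foo"
$hex    = bin2hex($s);  // two hex digits per byte, zero-padded

echo $length, "\n";     // 11
echo $fourth, "\n";     // 0
echo $hex, "\n";        // 666f6f006261720062617a

// The NULL constant becomes an empty string when concatenated,
// whereas "\x00" is a real one-byte character inside the string.
$nullConstantIsEmpty = ("Test" . NULL . "ing" === "Testing");  // true
$zeroByteIsKept      = ("Test\x00ing" !== "Testing");          // true
```

So the zero byte is stored in the string like any other byte; it just isn't visible when printed.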

I know you mentioned using explode instead of unpack but if you know the string lengths then you can use:

unpack( 'a3foo/a3bar/a3baz' , $binarydata);
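If the zero bytes should be consumed as part of each field, one option is a sketch along these lines. It assumes PHP 5.5 or later, where unpack() has a Z format code that reads the given number of bytes and keeps only what comes before the first zero byte. The field widths (4, 4, 3) are an assumption based on the sample string, not something the question specifies.

```php
<?php
// Assumption: PHP >= 5.5 (the 'Z' format code does not exist in older versions).
// Assumption: each of the first two fields is 3 characters plus one zero byte.
$binarydata = "foo\x00bar\x00baz";

// Z4 reads 4 bytes and cuts the value at the first zero byte;
// Z3 reads the final 3 bytes, which contain no zero byte.
$fields = unpack('Z4foo/Z4bar/Z3baz', $binarydata);

print_r($fields);
```

This should give foo, bar and baz under the stated assumptions. Note that Z* would not help here, because a * repeater consumes all the remaining input for the first field.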

Adding for clarity

Cyrus wrote:

With "NULL byte" I mean the byte with the value 0:

I'm not sure where you got the string "foo\x00bar\x00baz" but:

i) It must be from something that uses a zero byte as a terminator or separator. PHP's pack() does not add one here: if you call pack("A*A*A*", "foo", "bar", "baz"); it does not generate a string with zeroes in it.

ii) The PHP version of unpack does not treat the zero byte as a separator when splitting the input. It treats the byte with hex value 0 as just another character, e.g.

function strToHex($string){
    $hex='';
    for ($i=0; $i < strlen($string); $i++)
    {
        $hex .= dechex(ord($string[$i]));
    }
    return $hex;
}

$binarydata = "foo\x00bar\x00baz";

echo "binarydata is ";

var_dump($binarydata);
$y = unpack( 'a3foo/a3bar/a3baz' , $binarydata);
var_dump( $y );

echo strToHex($y['foo'])."\r\n";
echo strToHex($y['bar'])."\r\n";
echo strToHex($y['baz'])."\r\n";

Will output:

binarydata is string(11) "foobarbaz"
array(3) {
  ["foo"]=>
  string(3) "foo"
  ["bar"]=>
  string(3) "ba"
  ["baz"]=>
  string(3) "rb"
}
666f6f
06261
72062

i.e. it extracts the first three bytes, which are 0x66, 0x6f, 0x6f. It then extracts the next three bytes, which are 0x00, 0x62, 0x61. Finally it extracts 0x72, 0x00, 0x62. (strToHex() does not zero-pad its output, so each 0x00 byte shows up as a single 0 in the hex lines above.)

Danack answered Oct 23 '22 18:10