 

In Perl, can I treat a string as a byte array?

In Perl, is it appropriate to use a string as a byte array containing 8-bit data? All the documentation I can find on this subject focuses on 7-bit strings.

For instance, if I read some data from a binary file into $data:

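my $data;

open(my $fh, '<', $filepath) or die "Cannot open $filepath: $!";
binmode $fh;                      # raw bytes: no newline or encoding translation
my $n = read($fh, $data, 1024);   # arguments must be separated by commas
defined $n or die "read failed: $!";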

and I want to get the first byte out, is substr($data,0,1) appropriate? (Again, assuming it is 8-bit data.)
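A minimal sketch of byte access, assuming the buffer was read through a binmode'd handle so that each string character holds one byte value (0..255). Perl string offsets start at 0, as in C, so the first byte is `substr($data, 0, 1)`; `ord` gives its numeric value, and `unpack('C*', ...)` turns the whole buffer into a list of unsigned byte values. The sample bytes (a PNG file signature) are hypothetical stand-ins for file data.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for bytes read from a binary file.
my $data = "\x89PNG\r\n\x1a\n";

# Offsets start at 0, so the first byte is at offset 0.
my $first_char = substr($data, 0, 1);   # a one-character string
my $first_byte = ord($first_char);      # its numeric value, 0..255

# "C*" unpacks every character as an unsigned 8-bit value, the closest
# analogue to indexing an unsigned char array in C.
my @bytes = unpack('C*', $data);

printf "first byte: 0x%02X\n", $first_byte;   # prints "first byte: 0x89"
printf "byte count: %d\n", scalar @bytes;     # prints "byte count: 8"
```

For walking the whole buffer, `unpack` is usually cheaper and clearer than calling `substr` in a loop.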

I come from a mostly C background, and I am used to passing a char pointer to a read() function. My problem might be that I don't understand what the underlying representation of a string is in Perl.
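For the C comparison: a Perl scalar string is a sequence of characters, and data read through a binmode'd handle has one character per byte, each in the range 0..255. The built-in `vec` with a width of 8 reads or assigns individual elements, so such a string can be used much like a `char` buffer. This is a sketch only; the buffer contents are made up for illustration.

```perl
use strict;
use warnings;

# A 4-byte buffer, comparable to `unsigned char buf[4] = {0}` in C.
my $buf = "\0" x 4;

# vec($str, $index, 8) is an lvalue addressing the 8-bit element at $index.
vec($buf, 0, 8) = 0xDE;
vec($buf, 1, 8) = 0xAD;
vec($buf, 2, 8) = 0xBE;
vec($buf, 3, 8) = 0xEF;

print length($buf), "\n";          # prints 4: one character per byte
print unpack('H*', $buf), "\n";    # prints "deadbeef"
print vec($buf, 2, 8), "\n";       # prints 190
```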

asked Jun 17 '10 at 21:06 by Mike



1 Answer

The documentation bundled with Perl for the read function (perldoc -f read), reproduced here, covers most of what is relevant to your question.

read FILEHANDLE,SCALAR,LENGTH,OFFSET

read FILEHANDLE,SCALAR,LENGTH

Attempts to read LENGTH characters of data into variable SCALAR from the specified FILEHANDLE. Returns the number of characters actually read, 0 at end of file, or undef if there was an error (in the latter case $! is also set). SCALAR will be grown or shrunk so that the last character actually read is the last character of the scalar after the read.
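The return values described above suggest the usual chunked-read loop. In this sketch an in-memory filehandle stands in for a real binary file (an assumption made so the example is self-contained); `read` reports counts the same way on a disk file. The 4-byte chunk size is arbitrary.

```perl
use strict;
use warnings;

# Ten bytes with values 0..9, read through an in-memory filehandle.
my $contents = pack('C*', 0 .. 9);
open(my $fh, '<', \$contents) or die "open: $!";
binmode $fh;

my $total = 0;
my @chunk_sizes;
while (1) {
    # read returns the count actually read, 0 at end of file, undef on error.
    my $n = read($fh, my $chunk, 4);
    die "read failed: $!" unless defined $n;
    last if $n == 0;
    push @chunk_sizes, $n;
    $total += $n;
}
close $fh;

print "@chunk_sizes\n";   # prints "4 4 2"
print "$total\n";         # prints 10
```

Checking `defined` separately from zero matters: `0` means end of file, while `undef` means an error with details in `$!`.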

An OFFSET may be specified to place the read data at some place in the string other than the beginning. A negative OFFSET specifies placement at that many characters counting backwards from the end of the string. A positive OFFSET greater than the length of SCALAR results in the string being padded to the required size with "\0" bytes before the result of the read is appended.
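The three OFFSET cases can be sketched as follows, again using a hypothetical in-memory input. Note that with a negative OFFSET the scalar is also shrunk, because the last character read becomes the last character of the scalar.

```perl
use strict;
use warnings;

my $input = "ABCDEF";   # hypothetical input data
open(my $fh, '<', \$input) or die "open: $!";
binmode $fh;

my $buf = "xy";

# OFFSET equal to the current length appends the read data.
read($fh, $buf, 2, length $buf);   # $buf is now "xyAB"

# A negative OFFSET counts back from the end; everything after the read
# data is discarded, so the old last character is overwritten.
read($fh, $buf, 2, -1);            # $buf is now "xyACD"

# An OFFSET past the end pads the gap with "\0" bytes first.
read($fh, $buf, 2, 7);             # $buf is now "xyACD\0\0EF"
close $fh;

print unpack('H*', $buf), "\n";    # prints 787941434400004546
```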

The call is implemented in terms of either Perl's or your system's native fread(3) library function. To get a true read(2) system call, see "sysread".
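A minimal `sysread` sketch, assuming a temporary file created with the core File::Temp module holds the (made-up) test bytes. `sysread` needs a real file descriptor, so it does not work on in-memory filehandles, and because it bypasses PerlIO buffering it should not be mixed with buffered reads on the same handle.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write four known bytes to a temporary file (deleted at program exit).
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
print {$out} pack('C*', 0xCA, 0xFE, 0xBA, 0xBE);
close $out or die "close: $!";

# sysread issues read(2) directly. Do not mix it with read, <$fh>,
# eof, seek, or tell on the same handle.
open(my $in, '<:raw', $path) or die "open $path: $!";
my $got = sysread($in, my $buf, 16);
die "sysread failed: $!" unless defined $got;
close $in;

printf "%d bytes: %s\n", $got, unpack('H*', $buf);   # prints "4 bytes: cafebabe"
```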

Note the characters: depending on the status of the filehandle, either (8-bit) bytes or characters are read. By default all filehandles operate on bytes, but for example if the filehandle has been opened with the ":utf8" I/O layer (see "open", and the "open" pragma, open), the I/O will operate on UTF-8 encoded Unicode characters, not bytes. Similarly for the ":encoding" pragma: in that case pretty much any characters can be read.
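To see the bytes-versus-characters distinction, this sketch reads the same UTF-8 encoded data twice through in-memory filehandles: once with the `:raw` layer and once through an `:encoding(UTF-8)` decoding layer. The sample text is hypothetical. With `:raw`, `length` counts bytes; with the decoding layer it counts Unicode characters.

```perl
use strict;
use warnings;

# "café" encoded as UTF-8: the final "é" (U+00E9) is the two bytes C3 A9.
my $encoded = "caf\xC3\xA9";

# No decoding layer: one string character per byte.
open(my $raw, '<:raw', \$encoded) or die "open: $!";
read($raw, my $as_bytes, 100);
close $raw;

# Decoding layer: one string character per Unicode code point.
open(my $dec, '<:encoding(UTF-8)', \$encoded) or die "open: $!";
read($dec, my $as_chars, 100);
close $dec;

print length($as_bytes), "\n";                     # prints 5
print length($as_chars), "\n";                     # prints 4
printf "U+%04X\n", ord(substr($as_chars, -1));     # prints U+00E9
```

For binary data, the `binmode` (or `:raw`) route keeps the one-byte-per-character property the question relies on.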

answered Sep 22 '22 at 22:09 by mob
