How does Perl store integers in-memory?

Tags:

perl

use feature 'say';
say pack "A*", "asdf";           # Prints "asdf"
say pack "s", 0x41 * 256 + 0x42; # Prints "BA" (0x41 = 'A', 0x42 = 'B')

The first line makes sense: you're taking an ASCII-encoded string and packing it into a string as ASCII. In the second line, the packed form is "\x42\x41" because short integers are little-endian on my machine.

However, I can't shake the feeling that somehow, I should be able to treat the packed string from the second line as a number, since that's how (I assume) Perl stores numbers, as a little-endian sequence of bytes. Is there a way to do so without unpacking it? I'm trying to get the correct mental model for the thing that pack() returns.

For instance, in C, I can do this:

#include <stdio.h>

int main(void) {
    char c[2];
    short *x = (short *)c; // explicit cast needed; assumes c is suitably aligned
    c[0] = 0x42;
    c[1] = 0x41;

    printf("%d\n", *x); // Prints 16706 == 0x41 * 256 + 0x42
    return 0;
}
user3243135 asked Oct 17 '13 06:10


3 Answers

If you're really interested in how Perl stores data internally, I'd recommend PerlGuts Illustrated. But usually, you don't have to care about stuff like that because Perl doesn't give you access to such low-level details. These internals are only important if you're writing XS extensions in C.

If you want to "cast" a two-byte string to a C short, you can use the unpack function like this:

$ perl -le 'print unpack("s", "BA")'
16706
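
The same round trip can be sketched with an explicit byte order, so the result does not depend on the machine. This is only an illustration, assuming Perl 5.10 or later for the `<` and `>` modifiers; the variable names are made up for the example.

```perl
use strict;
use warnings;

my $n = 0x41 * 256 + 0x42;          # 16706

# "s<" forces little-endian and "s>" big-endian, regardless of the host CPU.
my $le = pack("s<", $n);            # bytes 0x42 0x41, i.e. "BA"
my $be = pack("s>", $n);            # bytes 0x41 0x42, i.e. "AB"

my $from_le = unpack("s<", $le);    # reads the bytes back as 16706
my $as_be   = unpack("s>", $le);    # same bytes read big-endian: 16961

print "$le $be $from_le $as_be\n";
```

A plain "s" behaves like "s<" on a little-endian machine, which is why the one-liner above prints 16706 there.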
nwellnhof answered Nov 15 '22 23:11


However, I can't shake the feeling that somehow, I should be able to treat the packed string from the second line as a number,

You need to unpack it first.

  • To be able to use it as a number in C, you need

    char* packed = "\x42\x41";
    int16_t int16;
    memcpy(&int16, packed, sizeof(int16_t));
    
  • To be able to use it as a number in Perl, you need

    my $packed = "\x42\x41";
    my $num = unpack('s', $packed);
    

    which is basically

    use Inline C => <<'__EOI__';
    
       SV* unpack_s(SV* sv) {
          STRLEN len;
          char* buf;
          int16_t int16;
    
          SvGETMAGIC(sv);
          buf = SvPVbyte(sv, len);
          if (len != sizeof(int16_t))
             croak("usage");
    
          Copy(buf, &int16, 1, int16_t);
          return newSViv(int16);
       }
    
    __EOI__
    
    my $packed = "\x42\x41";
    my $num = unpack_s($packed);
    

since that's how (I assume) perl stores numbers, as little-endian sequence of bytes.

Perl stores numbers in one of the following three fields of a scalar:

  • IV, a signed integer of size perl -V:ivsize (in bytes).
  • UV, an unsigned integer of size perl -V:uvsize (in bytes). (ivsize=uvsize)
  • NV, a floating-point number of size perl -V:nvsize (in bytes).

In all cases, native endianness is used.
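
The sizes listed above can also be read from inside a script through the core Config module. The following is a rough sketch, not a statement about any particular build: the values printed depend on how your perl was compiled, and the variable names are illustrative.

```perl
use strict;
use warnings;
use Config;

# Same values that `perl -V:ivsize` etc. report on the command line.
my %size = map { $_ => $Config{$_} } qw(ivsize uvsize nvsize);
my $byteorder = $Config{byteorder};  # e.g. "12345678" on a 64-bit little-endian build

printf "ivsize=%d uvsize=%d nvsize=%d byteorder=%s\n",
    @size{qw(ivsize uvsize nvsize)}, $byteorder;

# The "j" format packs a native IV, so the result is ivsize bytes long
# and laid out in the machine's own byte order.
my $iv_bytes = pack("j", 16706);
printf "packed IV is %d bytes, first byte 0x%02x\n",
    length($iv_bytes), ord($iv_bytes);
```

On a little-endian machine the first byte should be 0x42, the low byte of 16706; on a big-endian machine it would be 0x00.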

I'm trying to get the correct mental model for the thing that pack() returns.

pack is used to construct "binary data" for interfacing with external APIs.
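
As an illustration of that use, here is a small sketch that builds and parses a made-up wire record. The record layout (a 2-byte type, a 4-byte length, then the payload, all big-endian) and the names `encode_record` and `decode_record` are assumptions invented for this example, not part of any real API.

```perl
use strict;
use warnings;

# Hypothetical layout: "n" = 16-bit big-endian type, "N" = 32-bit big-endian
# payload length, "a*" = raw payload bytes.
sub encode_record {
    my ($type, $payload) = @_;
    return pack("n N a*", $type, length($payload), $payload);
}

sub decode_record {
    my ($bytes) = @_;
    my ($type, $len) = unpack("n N", $bytes);
    my $payload = substr($bytes, 6, $len);   # header is 2 + 4 = 6 bytes
    return ($type, $payload);
}

my $wire = encode_record(7, "hello");
print join(" ", map { sprintf "%02x", ord } split //, $wire), "\n";

my ($type, $payload) = decode_record($wire);
print "type=$type payload=$payload\n";
```

The packed string is just bytes to Perl; it only becomes numbers again when something, here `unpack`, interprets it with a matching template.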

ikegami answered Nov 15 '22 22:11


I see pack as a serialization function. It takes Perl values as input and outputs a serialized form. The fact that the serialized output happens to be a Perl bytestring is more of an implementation detail than core functionality.

As such, all you're really expected to do with the resulting string is feed it to unpack, though the serialized form is also convenient for moving data between processes, hosts, planets.

If you want to read the bytes as a number directly instead, consider using vec. Note that vec treats 16-bit elements as big-endian regardless of the machine, so the result differs from unpack "s" on a little-endian host:

say vec "BA", 0, 16;  # prints 16961
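
To make the byte order explicit, this short sketch compares vec with the fixed-endian unpack formats "n" (big-endian) and "v" (little-endian) on the same two bytes. The variable names are illustrative.

```perl
use strict;
use warnings;

my $bytes = "BA";                    # 0x42 0x41

my $via_vec = vec($bytes, 0, 16);    # element 0, 16 bits wide, read big-endian
my $via_n   = unpack("n", $bytes);   # 16-bit big-endian: 0x4241
my $via_v   = unpack("v", $bytes);   # 16-bit little-endian: 0x4142

print "vec=$via_vec n=$via_n v=$via_v\n";
```

So vec agrees with "n" (16961), while the 16706 from the question corresponds to the little-endian reading.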

To inspect the string's internal representation, use Devel::Peek, though you're not going to see anything surprising with a pure ASCII string.

use Devel::Peek;
Dump "BA";

SV = PV(0xb42f80) at 0xb56300
  REFCNT = 1
  FLAGS = (POK,READONLY,pPOK)
  PV = 0xb60cc0 "BA"\0
  CUR = 2
  LEN = 16
JB. answered Nov 15 '22 21:11
