How are scalars stored 'under the hood' in perl?

Tags:

perl

The basic types in Perl are different from those in most languages: scalar, array, and hash (but apparently not subroutines, &, which I guess are really just scalar references with syntactic sugar). What is most odd about this is that the most common data types (int, boolean, char, string) all fall under the basic data type "scalar". It seems that Perl decides whether to treat a scalar as a string, boolean, or number based on the operator that acts on it, implying the scalar itself is not actually defined as "int" or "String" when saved.

This makes me curious as to how these scalars are stored "under the hood", particularly with regard to the effect on efficiency (yes, I know scripting languages sacrifice efficiency for flexibility, but they still need to be as optimized as possible when flexibility concerns are not affected). It's much cheaper to store the number 65535 (which takes two bytes) than the string "65535", which takes 6 bytes. Recognizing that $val = 65535 stores an int would therefore allow using 1/3 the memory; in large arrays this could mean fewer cache misses as well.

It's not just limited to saving memory, of course. There are times when I can offer more significant optimizations if I know what type of scalar to expect. For instance, if I have a hash using very large integers as keys, it would be far faster to look up a value if I recognized the keys as ints, allowing a simple modulo for creating my hash key, than if I had to run more complex hashing logic on a string that has 3 times the bytes.

So I'm wondering how Perl handles these scalars under the hood. Does it store every value as a string, paying the extra memory and the CPU cost of constantly converting string to int in the case where a scalar is always used as an int? Or does it have some logic for inferring the type of scalar used to determine how to save and manipulate it?

Edit:

TJD linked to perlguts, which answers half my question. A scalar is actually stored as a string, a number (signed integer, unsigned integer, or double), or a pointer. I'm not too surprised; I had mostly expected this behavior to occur under the hood, though it's interesting to see the exact types. I'm leaving this question open, though, because perlguts is actually too low level. Other than telling me that 5 data types exist, it doesn't specify how Perl alternates between them, i.e. how Perl decides which SV type to use when a scalar is saved and how it knows when/how to cast.

666

asked Jan 12 '16 19:01

dsollen

1 Answer

There are actually a number of types of scalars. A scalar of type SVt_IV can hold undef, a signed integer (IV) or an unsigned integer (UV). One of type SVt_PVIV can also hold a string^[1]. Scalars are silently upgraded from one type to another as needed^[2]. The TYPE field indicates the type of a scalar. In fact, arrays (SVt_AV) and hashes (SVt_HV) are really just types of scalars.
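The silent upgrade can also be observed from plain Perl, without XS. The following is a minimal sketch, assuming the core B and Scalar::Util modules are available; `body_type` is a hypothetical helper introduced only for this illustration. It records the B class that wraps the scalar (which reflects its internal type) and the scalar's address, before and after a string is stored in it.

```perl
use strict;
use warnings;
use B ();
use Scalar::Util qw( refaddr );

# Hypothetical helper: the B class name reflects the scalar's internal type
# (for example B::IV for SVt_IV, B::PVIV for SVt_PVIV).
sub body_type {
    my ($ref) = @_;
    return ref( B::svref_2object($ref) );
}

my $x = 123;
my $type_before = body_type(\$x);
my $addr_before = refaddr(\$x);

$x = "456";    # Storing a string forces an upgrade of the same scalar.
my $type_after = body_type(\$x);
my $addr_after = refaddr(\$x);

printf "before: %-8s at 0x%x\n", $type_before, $addr_before;
printf "after:  %-8s at 0x%x\n", $type_after,  $addr_after;
```

The reported type should change while the address stays the same, since an upgrade swaps the scalar's body rather than the scalar itself. The exact type names may vary between perl versions.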

While the type of a scalar indicates what the scalar can contain, flags are used to indicate what a scalar does contain. This is stored in the FLAGS field. SVf_IOK signals that a scalar contains a signed integer, while SVf_POK indicates it contains a string^[3].
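The flags can be read programmatically as well. This is a rough sketch rather than a recommended practice, assuming the core B module; `public_flags` is a hypothetical helper, and only the three public flags IOK, NOK and POK are checked.

```perl
use strict;
use warnings;
use B ();

# Hypothetical helper: list which of the public IOK/NOK/POK flags are set.
sub public_flags {
    my ($ref) = @_;
    my $flags = B::svref_2object($ref)->FLAGS;
    my @set;
    push @set, 'IOK' if $flags & B::SVf_IOK();
    push @set, 'NOK' if $flags & B::SVf_NOK();
    push @set, 'POK' if $flags & B::SVf_POK();
    return join ',', @set;
}

my $x = 123;
my $as_number = public_flags(\$x);    # Holds only an integer.

$x = "456";
my $as_string = public_flags(\$x);    # Holds only a string.

my $sum = $x + 0;                     # Numeric use caches an IV next to the string.
my $as_both = public_flags(\$x);

print "$as_number | $as_string | $as_both\n";
```

The three results are meant to line up with the IOK and POK entries that Devel::Peek reports for the same sequence of statements.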

Devel::Peek's Dump is a great tool for looking at the internals of scalars. (The constant prefixes SVt_ and SVf_ are omitted by Dump.)

$ perl -e'
   use Devel::Peek qw( Dump );
   my $x = 123;
   Dump($x);
   $x = "456";
   Dump($x);
   $x + 0;
   Dump($x);
'
SV = IV(0x25f0d20) at 0x25f0d30       <-- SvTYPE(sv) == SVt_IV, so it can contain an IV.
  REFCNT = 1
  FLAGS = (IOK,pIOK)                  <-- IOK: Contains an IV.
  IV = 123                            <-- The contained signed integer (IV).

SV = PVIV(0x25f5ce0) at 0x25f0d30     <-- The SV has been upgraded to SVt_PVIV
  REFCNT = 1                              so it can also contain a string now.
  FLAGS = (POK,IsCOW,pPOK)            <-- POK: Contains a string (but no IV since !IOK).
  IV = 123                            <-- Meaningless without IOK.
  PV = 0x25f9310 "456"\0              <-- The contained string.
  CUR = 3                             <-- Number of bytes used by PV (not incl \0).
  LEN = 10                            <-- Number of bytes allocated for PV.
  COW_REFCNT = 1

SV = PVIV(0x25f5ce0) at 0x25f0d30
  REFCNT = 1
  FLAGS = (IOK,POK,IsCOW,pIOK,pPOK)   <-- Now contains both a string (POK) and an IV (IOK).
  IV = 456                            <-- This will be used in numerical contexts.
  PV = 0x25f9310 "456"\0              <-- This will be used in string contexts.
  CUR = 3
  LEN = 10
  COW_REFCNT = 1

illguts documents the internal format of variables quite thoroughly, but perlguts might be a better place to start.

If you start writing XS code, keep in mind it's usually a bad idea to check what a scalar contains. Instead, you should request what should have been provided (e.g. using SvIV or SvPVutf8). Perl will automatically convert the value to the requested type (and warn if appropriate). API calls are documented in perlapi.
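The same principle shows up at the Perl level: code asks for a number by using the value numerically, and Perl converts and warns as needed. The following is a small illustrative sketch in Perl, not XS; the variable names are made up for the example, and a `__WARN__` handler is installed only to count the warnings.

```perl
use strict;
use warnings;

my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

my $clean = "42";
my $messy = "12abc";

my $n1 = $clean + 0;    # Converts silently.
my $n2 = $messy + 0;    # Converts the leading digits and emits a warning.

print "n1=$n1 n2=$n2 warnings=", scalar(@warnings), "\n";
# prints "n1=42 n2=12 warnings=1"
```

Neither statement inspects what the scalar currently holds; each simply requests a numeric value.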

[1] In fact, it can hold a string and either a signed integer or an unsigned integer at the same time.
[2] All scalars (including arrays and hashes, excluding one type of scalar that can only hold undef) have two memory blocks at their base. Pointers to the scalar point to its head, which contains the TYPE field and a pointer to the body. Upgrading a scalar replaces the body of the scalar. That way, pointers to the scalar aren't invalidated by an upgrade.
[3] An undef variable is one without any uppercase OK flags set.

102

answered Sep 21 '22 12:09

ikegami
