I have an app that reads a giant chunk of textual data into a scalar, sometimes several GB in size. I use substr on that scalar to move most of the data into another scalar, replacing the extracted data with an empty string because it is no longer needed in the first scalar. What I found recently is that Perl does not free the memory of the first scalar, even though it recognizes that its logical length has changed. So I need to extract the remaining data from the first scalar into a third one, undef the first scalar, and put the data back in place. Only then is the memory occupied by the first scalar really freed. Assigning undef to that scalar, or some other value smaller than the allocated block of memory, doesn't change the amount of allocated memory.
The following is what I do now:
$$extFileBufferRef = substr($$contentRef, $offset, $length, '');
$length = length($$contentRef);
my $content = substr($$contentRef, 0, $length);
$$contentRef = undef( $$contentRef) || $content;
$$contentRef might be, for example, 5 GB in size in the first line; I extract 4.9 GB of data and replace the extracted part with an empty string. The second line would then report about 100 MB as the length of the string, but Devel::Size::total_size would still report that 5 GB are allocated for that scalar. Assigning undef or some other value to $$contentRef doesn't seem to change that; I need to call undef as a function on that scalar.
I would have expected the memory behind $$contentRef to be at least partially freed after substr was applied. That doesn't seem to be the case...
So, is memory only freed when variables go out of scope? And if so, why is assigning undef different from calling undef as a function on the same scalar?
Your analysis is correct.
$ perl -MDevel::Peek -e'
my $x; $x .= "x" for 1..100;
Dump($x);
substr($x, 50, length($x), "");
Dump($x);
'
SV = PV(0x24208e0) at 0x243d550
...
CUR = 100 # length($x) == 100
LEN = 120 # 120 bytes are allocated for the string buffer.
SV = PV(0x24208e0) at 0x243d550
...
CUR = 50 # length($x) == 50
LEN = 120 # 120 bytes are allocated for the string buffer.
Not only does Perl overallocate strings, it doesn't even free variables that go out of scope, instead reusing them the next time the scope is entered.
$ perl -MDevel::Peek -e'
sub f {
my ($set) = @_;
my $x;
if ($set) { $x = "abc"; $x .= "def"; }
Dump($x);
}
f(1);
f(0);
'
SV = PV(0x3be74b0) at 0x3c04228 # PV: Scalar may contain a string
REFCNT = 1
FLAGS = (POK,pPOK) # POK: Scalar contains a string
PV = 0x3c0c6a0 "abcdef"\0 # The string buffer
CUR = 6
LEN = 10 # Allocated size of the string buffer
SV = PV(0x3be74b0) at 0x3c04228 # Could be a different scalar at the same address,
REFCNT = 1 # but it's truly the same scalar
FLAGS = () # No "OK" flags: undef
PV = 0x3c0c6a0 "abcdef"\0 # The same string buffer
CUR = 6
LEN = 10 # Allocated size of the string buffer
The logic is that if you needed the memory once, there's a strong chance you'll need it again.
For the same reason, assigning undef to a scalar doesn't free its string buffer. But Perl gives you a chance to free the buffers if you want, so passing a scalar to undef does force the freeing of the scalar's internal buffers.
$ perl -MDevel::Peek -e'
my $x = "abc"; $x .= "def"; Dump($x);
$x = undef; Dump($x);
undef $x; Dump($x);
'
SV = PV(0x37d1fb0) at 0x37eec98 # PV: Scalar may contain a string
REFCNT = 1
FLAGS = (POK,pPOK) # POK: Scalar contains a string
PV = 0x37e8290 "abcdef"\0 # The string buffer
CUR = 6
LEN = 10 # Allocated size of the string buffer
SV = PV(0x37d1fb0) at 0x37eec98 # PV: Scalar may contain a string
REFCNT = 1
FLAGS = () # No "OK" flags: undef
PV = 0x37e8290 "abcdef"\0 # The string buffer is still allocated
CUR = 6
LEN = 10 # Allocated size of the string buffer
SV = PV(0x37d1fb0) at 0x37eec98 # PV: Scalar may contain a string
REFCNT = 1
FLAGS = () # No "OK" flags: undef
PV = 0 # The string buffer has been freed.
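Putting this together, the workaround from the question can be wrapped in a small helper. The following is only a sketch: buffer_len and shrink_buffer are hypothetical names, not part of Perl or any module, and the core B module is used here merely to read the LEN field without printing a full Devel::Peek dump. The exact LEN values depend on the Perl version and the system's allocator.

```perl
use strict;
use warnings;
use B ();

# Hypothetical helper: allocated size (LEN) of a scalar's string buffer.
sub buffer_len {
    my ($ref) = @_;
    return B::svref_2object($ref)->LEN;
}

# Hypothetical helper: release an oversized buffer but keep the contents.
sub shrink_buffer {
    my ($ref) = @_;
    my $copy = substr($$ref, 0, length $$ref);  # new string sized for the current contents
    undef $$ref;                                # undef as a function frees the old buffer
    $$ref = $copy;                              # put the contents back
    return;
}

my $size    = 1_000_000;
my $content = 'x' x $size;
substr($content, 1_000, length($content), '');  # keep only the first 1000 characters

my $before = buffer_len(\$content);
shrink_buffer(\$content);
my $after = buffer_len(\$content);

printf "length=%d, LEN before=%d, LEN after=%d\n",
    length($content), $before, $after;
```

Note that the helper briefly holds two copies of the remaining data, so it is only worthwhile when what remains is small compared to the buffer being released, as in the 100 MB out of 5 GB case from the question.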