 

How to properly write to a file using File::Map?

I often use File::Map to map files, mostly small text files, into memory and run read-only regular expressions against them. I now have a use case in which I also need to replace some text in the file, and I thought I could still use File::Map, because its documentation says the following:

Files are mapped into a variable that can be read just like any other variable, and it can be written to using standard Perl techniques such as regexps and substr.

The text I want to replace is replaced correctly within the file, but I'm losing data: the new text is slightly longer than the old one, the file keeps its original size, and so the data at the end is truncated. Both effects are reported by the following documented warnings:

Writing directly to a memory mapped file is not recommended

Truncating new value to size of the memory map

The explanations of both warnings read as if one should never write anything using File::Map, although it might work when one can live with truncated files or when the overall file size doesn't change at all. Yet the first quote explicitly mentions writes as supported, without any exception to that rule.

So, is there some special way to write safely using File::Map, e.g. by having the underlying file grow? The first warning uses the word "directly", which gives me the feeling that there is some other, better supported way to write.

I'm simply using =~ s/// on the mapped view currently, which seems to be the wrong approach. I couldn't even find anyone trying to write using File::Map at all, only the official tests which do exactly what I do and expect the warnings I get. Additionally, looking at the code, there seems to be only one use case in which writing doesn't result in a warning at all, though I don't understand how I'm able to trigger that:

static int mmap_write(pTHX_ SV* var, MAGIC* magic) {
        struct mmap_info* info = (struct mmap_info*) magic->mg_ptr;
        if (!SvOK(var))
                mmap_fixup(aTHX_ var, info, NULL, 0);
        else if (!SvPOK(var)) {
                STRLEN len;
                const char* string = SvPV(var, len);
                mmap_fixup(aTHX_ var, info, string, len);
        }
        else if (SvPVX(var) != info->fake_address)
                mmap_fixup(aTHX_ var, info, SvPVX(var), SvCUR(var));
        else
                SvPOK_only_UTF8(var);
        return 0;
}

https://metacpan.org/source/LEONT/File-Map-0.55/lib/File/Map.xs#L240

If writing should be avoided altogether, why do the docs explicitly mention it as supported? It doesn't look supported to me if it produces a warning in every case but one.

Thorsten Schöning asked Dec 07 '18 14:12


2 Answers

An mmap is a fixed-sized mapping of a portion of a file to memory.

The various mapping functions set the string buffer of the provided scalar to the mapped memory page. The OS will reflect any changes to that buffer to the file and vice versa if requested.

The proper way to work with an mmap is to modify the string buffer, not replace it.

  • Anything that changes the string buffer without changing its size is appropriate.

    $ perl -e'print "\0"x16' >scratch
    
    $ perl -MFile::Map=map_file -we'
       map_file my $map, "scratch", "+<";
       $map =~ s/\x00/\xFF/g;             # ok
       substr($map, 6, 2, "00");          # ok
       substr($map, 8, 2) = "11";         # ok
       substr($map, 7, 2) =~ s/../22/;    # ok
    '
    
    $ hexdump -C scratch
    00000000  ff ff ff ff ff ff 30 32  32 31 ff ff ff ff ff ff  |......0221......|
    00000010
    
  • Anything that replaces the string buffer (such as assigning to the scalar) is not ok.

    ...kinda. The module notices you've replaced the scalar's buffer. It proceeds to copy the contents of the new buffer to the mapped memory, then replaces the scalar's buffer with the pointer to the mapped memory.

    $ perl -e'print "\0"x16' >scratch
    
    $ perl -MFile::Map=map_file -we'
       map_file my $map, "scratch", "+<";
       $map = "4" x 16;  # Effectively: substr($map, 0, 16, "4" x 16)
    '
    Writing directly to a memory mapped file is not recommended at -e line 3.
    
    $ hexdump -C scratch
    00000000  34 34 34 34 34 34 34 34  34 34 34 34 34 34 34 34  |4444444444444444|
    00000010
    

    Aside from the warning, which can be silenced using no warnings qw( substr );,[1] the only downside is that doing it this way requires using memcpy to copy length($map) bytes, whereas using substr($map, $pos, length($repl), $repl) only requires copying length($repl) bytes.

  • Anything that changes the size of the string buffer is not ok.

    $ perl -MFile::Map=map_file -we'
       map_file my $map, "scratch", "+<";
       $map = "5" x 32;  # Effectively: substr($map, 0, 16, "5" x 16)
    '
    Writing directly to a memory mapped file is not recommended at -e line 3.
    Truncating new value to size of the memory map at -e line 3.
    
    $ hexdump -C scratch
    00000000  35 35 35 35 35 35 35 35  35 35 35 35 35 35 35 35  |5555555555555555|
    00000010
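
The size-preserving rule in the list above can be enforced in code. The following is a minimal sketch, not part of File::Map: replace_fixed is a hypothetical helper that pads each replacement to the length of the text it replaces and dies if the replacement would be longer. It is demonstrated on a plain string so that it runs with core Perl only; since it changes the buffer solely through same-length 4-argument substr calls, the assumption is that a reference to a scalar mapped with map_file in "+<" mode could be passed in the same way.

```perl
use strict;
use warnings;

# Hypothetical helper (not a File::Map function): replace every match of $re
# in the buffer behind $buf_ref with $repl, padded with $pad so that the
# buffer length never changes. Dies before touching the buffer if any match
# is shorter than the replacement.
sub replace_fixed {
    my ($buf_ref, $re, $repl, $pad) = @_;
    $pad = ' ' unless defined $pad;

    # Collect offsets first. Lengths never change, so offsets stay valid.
    my @spans;
    while ($$buf_ref =~ /$re/g) {
        my ($start, $len) = ($-[0], $+[0] - $-[0]);
        die "replacement is longer than the match at offset $start\n"
            if length($repl) > $len;
        push @spans, [$start, $len];
    }

    # Same-length 4-argument substr: the kind of write marked "ok" above.
    for my $span (@spans) {
        my ($start, $len) = @$span;
        substr($$buf_ref, $start, $len, $repl . $pad x ($len - length($repl)));
    }
    return scalar @spans;
}

# Demo on an ordinary string; the buffer keeps its length.
my $buffer = "name=old_value;\n";
my $before = length $buffer;
my $hits   = replace_fixed(\$buffer, qr/old_value/, "new");
printf "%d replacement(s), length %d -> %d\n", $hits, $before, length $buffer;
```

Padding only helps when the file format tolerates filler characters; a replacement that is longer than the matched text still needs a different approach.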
    

WARNING: The module doesn't warn if you shrink the buffer, even though this has no effect except to clobber one of the bytes with a NUL.

$ perl -e'print "\0"x16' >scratch

$ perl -MFile::Map=map_file -we'
   map_file my $map, "scratch", "+<";
   substr($map, 0, 16, "6" x 16);
   substr($map, 14, 2, "");
'

$ hexdump -C scratch
00000000  36 36 36 36 36 36 36 36  36 36 36 36 36 36 00 36  |66666666666666.6|
00000010

I've submitted a ticket.


  1. This is somewhat ironic, seeing as it more or less warns when not using substr, but I suppose it also warns when using substr "incorrectly".
ikegami answered Sep 22 '22 05:09


The first quote,

Files are mapped into a variable that can be read just like any other variable, and it can be written to using standard Perl techniques such as regexps and substr.

is under the heading "Simplicity".

And it is true: You can simply write Perl code that manipulates strings and the data will end up in the file.

However, in the section Warnings we have:

Writing directly to a memory mapped file is not recommended

Due to the way perl works internally, it's not possible to write a mapping implementation that allows direct assignment yet performs well. As a compromise, File::Map is capable of fixing up the mess if you do it nonetheless, but it will warn you that you're doing something you shouldn't. This warning is only given when use warnings 'substr' is in effect.

That is, writing through an mmap'd variable is not efficient unless the string buffer can be modified in place. Otherwise the new string has to be assembled and stored in memory first and is only copied over to the mapping afterwards. If you're OK with this, you can disable the warning with no warnings 'substr'.

Additionally, looking at the code, there seems to be only one use case in which writing doesn't result in a warning at all, though I don't understand how I'm able to trigger that.

That's the case where you're trying to write a buffer to itself. This happens when a scalar is actually modified in place. The other cases are workarounds for when the string buffer is replaced (e.g. because it's overwritten: $foo = $bar). For a real in-place modification no extra work is necessary and you don't get the warning.

But this doesn't help you because growing a string cannot be done in-place with a fixed size mapped buffer.

Changing the size of the file through the mapping is not possible. This is not because of File::Map, but because the underlying mmap system call works on fixed size mappings and does not provide any option to resize files automatically.

If you need to edit files (especially small files), I recommend using edit in Path::Tiny instead.
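
For a file that has to grow, the read-modify-write idea behind that recommendation can be sketched with core Perl alone. This is an illustration under assumptions, not Path::Tiny's implementation: edit_file is a hypothetical helper that slurps the file into $_, runs a callback on it, writes the result to a temporary file in the same directory and renames it over the original. Replacing a file via rename is assumed to behave atomically, which holds on POSIX systems. With Path::Tiny installed, the documented equivalent should be path($file)->edit(sub { ... }).

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Temp qw(tempfile);

# Hypothetical helper: run $code with $_ set to the whole file contents,
# then replace the file with whatever $_ holds afterwards.
sub edit_file {
    my ($file, $code) = @_;

    open my $in, '<:raw', $file or die "Cannot open $file: $!";
    local $_ = do { local $/; <$in> };    # slurp the whole file
    close $in;

    $code->();                            # callback edits $_; any size change is fine

    # Write next to the original so that rename stays on one filesystem.
    my ($out, $tmp) = tempfile(DIR => dirname($file), UNLINK => 0);
    binmode $out, ':raw';
    print {$out} $_ or die "Cannot write $tmp: $!";
    close $out      or die "Cannot close $tmp: $!";

    # Temp files are created with restrictive permissions; copy the original mode.
    chmod((stat $file)[2] & 07777, $tmp) or die "Cannot chmod $tmp: $!";
    rename $tmp, $file or die "Cannot replace $file: $!";
    return;
}

# Demo: the replacement is longer than the text it replaces.
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "greeting=hi\n";
close $fh or die "Cannot close $path: $!";

edit_file($path, sub { s/hi/hello, world/ });
print "size is now ", -s $path, " bytes\n";
```

The whole file is held in memory, which suits the small text files mentioned in the question but not very large ones.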

melpomene answered Sep 20 '22 05:09