NativeCall Buf lifetime and Garbage Collector

Tags: raku

I've got a chunk of memory in a Buf I want to pass in to a C library, but the library will be using the memory beyond the lifetime of a single call.

I understand that this can be problematic, since the garbage collector can move memory around.

For passing in a Str, the NativeCall docs say "If the C function requires the lifetime of a string to exceed the function call, the argument must be manually encoded and passed as CArray[uint8]" and give an example of doing that, essentially:

```raku
use NativeCall;
my $array = CArray[uint8].new($string.encode.list);  # copy the encoded bytes into a CArray
```
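Applied to a Buf rather than a Str, that documented recipe amounts to a byte-by-byte copy. A minimal sketch, assuming only the NativeCall module bundled with Rakudo; `buf-to-carray` is a hypothetical helper name, not part of NativeCall:

```raku
use NativeCall;

# Hypothetical helper: copy every byte of a Blob/Buf into a freshly
# created CArray[uint8], the same recipe the docs give for strings.
# This is an O(n) element-by-element copy.
sub buf-to-carray(Blob:D $data) {
    CArray[uint8].new($data.list)
}

my $buf   = Buf.new((^1000).map(* % 100));   # illustrative payload only
my $array = buf-to-carray($buf);             # keep $array referenced while C uses it
say $array.elems;
```

The copy visits every element, which is exactly the cost the question worries about for large buffers. The returned CArray also has to stay referenced for as long as the C library holds on to its pointer.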

My question is: must I do the same thing for a Buf, in case it gets moved by the GC? Or will the GC leave my Buf where it sits? For a short string the copy isn't a big deal, but for a large memory buffer it could be an expensive operation. (See, for example, Archive::Libarchive, which lets you pass in a Buf containing a tar file. Is that code problematic?)

```raku
multi method open(Buf $data!) {
    my $res = archive_read_open_memory $!archive, $data, $data.bytes;
    ...
```

Is there (could there be? should there be?) some sort of trait on a Buf that tells the GC not to move it around? I know that could be trouble if I add more data to the Buf, but I promise not to do that. What about a Blob, which is immutable?

Curt Tilmes asked Mar 17 '19 16:03


1 Answer

You'll get away with this on MoarVM, at least at the moment, provided that you keep a reference to the Blob or Buf alive on the Perl 6 side for as long as the native code needs it, and (in the case of a Buf) you don't write to it in a way that could cause a resize.

MoarVM allocates the Blob/Buf object in the nursery and will move it during GC runs. However, that object does not hold the data itself; rather, it holds the size and a pointer to a block of memory holding the values. That block of memory is not allocated by the GC, and so will not move.

+------------------------+
| GC-managed Blob object |
+------------------------+      +------------------------+
| Elements               |----->| Non-GC-managed memory  |
+------------------------+      | (this bit is passed to |
| Size                   |      | native code)           |
+------------------------+      +------------------------+
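The practical rule that follows (keep a Raku-side reference to the Blob/Buf for as long as the native code holds its pointer, and don't grow a Buf in the meantime) can be encoded in a wrapper object. The sketch below is hypothetical: `MemoryReader`, `native-open` and `native-close` are placeholder names rather than a real library's API, and the native calls appear only as comments so the block runs on its own. It relies on the current MoarVM behavior described above, not on any guarantee from the language.

```raku
# Hypothetical wrapper illustrating the "keep a reference alive" rule.
# native-open/native-close are placeholders, shown only as comments.
class MemoryReader {
    has Blob $!data;          # strong reference: while this is set, the Blob
                              # object cannot be collected, so the non-GC data
                              # block it owns is not freed
    has Bool $!open = False;

    method open(Blob:D $data) {
        $!data = $data;       # store the reference before native code sees the pointer
        # native-open($!handle, $!data, $!data.bytes);   # hypothetical C call
        $!open = True;
        self
    }

    method close() {
        # native-close($!handle);                        # hypothetical C call
        $!open = False;
        $!data = Blob;        # only now does the wrapper stop keeping the data alive
        self
    }

    method is-open()    { $!open }
    method held-bytes() { $!data.defined ?? $!data.bytes !! 0 }
}

my $reader = MemoryReader.new.open(Buf.new(1, 2, 3));
say $reader.held-bytes;
$reader.close;
say $reader.held-bytes;
```

Typing the attribute as `Blob` accepts a Buf too, but it does not make the Buf immutable: the caller can still append to the same Buf and trigger the resize the answer warns about.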

Whether you should rely on this is a trickier question. Some considerations:

  • So far as I can tell, things could go rather less well if running on the JVM. I don't know about the JavaScript backend. You could legitimately decide that, due to adoption levels, you're only going to worry about running on MoarVM for now.
  • Depending on implementation details of MoarVM is OK if you just need the speed in your own code, but if you're working on a module you expect to be widely adopted, you might want to consider whether it's worth it. The Rakudo and MoarVM teams put a lot of work into not regressing working code in the module ecosystem, even in cases where it can be well argued that the code depended on bugs or undefined behavior. However, that can block improvements; on other occasions, the breakage is judged worth it. Either way, it's time-consuming, and the work falls on a team of volunteers. Of course, when module authors are responsive and can apply provided patches, it's somewhat less of a problem.

The problem with "put a trait on it" is that the decision, at least on the JVM, seems to need to be made up front, at the time the memory holding the data is allocated. In that case, a portable solution probably can't allow an existing Buf/Blob to be marked up as such. Perhaps a better approach would be for I/O-ish things to be able to provide something CArray-like instead, so that zero-copy can be achieved by having the data in the "right kind of memory" in the first place. That's probably a reasonable feature request.

Jonathan Worthington answered Oct 21 '22 13:10