
Parsing binary files in Raku

Tags:

raku

rakudo

I would like to parse binary files in Raku using its regex/grammar engine, but I haven't found a way to do it because the input is coerced to a string.

Is there a way to avoid this string coercion and use objects of type Buf or Blob?

I was wondering whether it is possible to change something in the metamodel to achieve this.

I know that I can use unpack, but I would really like to use the grammar engine instead, for more flexibility and readability.

Am I hitting an inherent limit of Raku's capabilities here?

And before someone tells me that regexes are for strings and that I shouldn't do this, I should point out that Perl's regex engine can match bytes as far as I know, and I could probably use it with Regexp::Grammars, but I would prefer to use Raku instead.

Also, I don't see any fundamental reason why regexes should be reserved for strings: an NFA in automata theory isn't intrinsically made for characters rather than bytes.

asked Nov 09 '21 02:11 by WhiteMist


1 Answer

"Is there a way to avoid this string coercion and use objects of type Buf or Blob?"

Unfortunately not at present. However, one can use the Latin-1 encoding, which assigns a character to every byte value, so any byte sequence decodes successfully to a string that can then be matched using a grammar.
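
As a minimal sketch of that workaround, the following decodes a Blob as Latin-1 and parses it with a grammar. The record layout (a two-byte "RK" magic, a version byte, a length byte, then the payload) and the grammar name Record are made up for illustration and do not come from any real file format.

```raku
# Hypothetical record layout (an assumption for illustration):
# 2-byte magic "RK", 1-byte version, 1-byte payload length, then the payload.
my $bytes = Blob.new(0x52, 0x4B, 0x01, 0x03, 0xFF, 0x00, 0x7F);

# Latin-1 maps byte value N to codepoint N, so decoding cannot fail.
my $text = $bytes.decode('latin-1');

grammar Record {
    token TOP     { <magic> <version> <len> <payload> }
    token magic   { 'RK' }
    token version { . }     # one character, i.e. one byte
    token len     { . }
    token payload { .* }    # everything that remains
}

my $match = Record.parse($text) // die 'not a valid record';

# Turn the captured characters back into byte values with .ord / .ords.
my $version = $match<version>.Str.ord;
my $length  = $match<len>.Str.ord;
my @payload = $match<payload>.Str.ords;

say "version $version, declared length $length, payload bytes @payload[]";
```

One caveat to check against your own format: Raku strings are grapheme-based, and a CR LF byte pair (0x0D 0x0A) is treated as a single grapheme, so a `.` in the grammar would consume both bytes at once. If the data can contain that pair, counting characters no longer equals counting bytes.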

"Also, I don't see any fundamental reason why regexes should be reserved for strings: an NFA in automata theory isn't intrinsically made for characters rather than bytes."

There isn't one. It's widely expected that the regex/grammar engine will be rebuilt at some point in the future (primarily to deal with performance limitations), and that would be a good point to also consider handling bytes as well as codepoint-level strings (Uni).

answered Nov 18 '22 20:11 by Jonathan Worthington