
Raku Regex to capture and modify the LFM code blocks

Update: Corrected code added below

I have a Leanpub flavored markdown* file named sample.md. I'd like to convert its code blocks into GitHub flavored markdown style using a Raku regex.

Here's a sample **ruby** code, which
prints the elements of an array:

{:lang="ruby"}
    ['Ian','Rich','Jon'].each {|x| puts x}

Here's a sample **shell** code, which
removes the ending commas and
finds all folders in the current path:

{:lang="shell"}
    sed s/,$//g
    find . -type d

In order to capture the lang value, e.g. ruby from the {:lang="ruby"} and convert it into

```ruby

I use this code

my @in="sample.md".IO.lines;
my @out;
for @in.kv -> $key,$val {
    if $val.starts-with("\{:lang") {
       if $val ~~ /^{:lang="([a-z]+)"}$/ { # capture lang
           @out[$key]="```$0"; # convert it into ```ruby
           $key++;
           while @in[$key].starts-with("    ") {
                 @out[$key]=@in[$key].trim-leading;
                 $key++;
           }
           @out[$key]="```";
       }
    }
    @out[$key]=$val;
}

The line containing the Regex gives Cannot modify an immutable Pair (lang => True) error.

I've just started out using Regexes. Instead of ([a-z]+) I've tried (\w) and it gave the Unrecognized backslash sequence: '\w' error, among other things.

How do I correctly capture and modify the lang value using a regex?

* The LFM format shown here is only an approximation.

Corrected code:

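my @in="sample.md".IO.lines;
my \len=@in.elems;
my @out;
my $k = 0;

while ($k < len) {
    if @in[$k] ~~ / ^ '{:lang="' (\w+) '"}' $ / {
        push @out, "```$0";                # open the fence with the captured lang
        $k++;
        # copy the indented lines; the bounds check guards a block at end of file
        while $k < len && @in[$k].starts-with("    ") {
            push @out, @in[$k].trim-leading;
            $k++;
        }
        push @out, "```";                  # close the fence
        next;                              # re-examine @in[$k] on the next pass
    }
    push @out, @in[$k];
    $k++;
}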

for @out {print "$_\n"}
Lars Malmsteen asked Feb 27 '21 21:02



2 Answers

This one-liner seems to solve the problem:

say S:g /\{\: "lang" \= \" (\w+) \" \} /```$0/ given "text.md".IO.slurp;

Let's try to explain what was going on, however. The error came from the regex grammar: in a Raku regex, {} runs code, so {:lang="..."} was parsed as a code block in which :lang is a Pair (lang => True) that the = then tries to assign to, hence the "Cannot modify an immutable Pair" message. Raiph's answer is (obviously) correct in fixing this by switching to a Perl regular expression.

What I've done here instead is use Raku's non-destructive substitution, S///, with the :g (global) adverb so that it acts on every match in the file. The file (which I've saved as text.md) is slurped at the end of the line; given puts its contents in the $_ topic variable, and the result is printed once the substitution has been made. The good thing is that if you want to make more substitutions, you can put another such expression in front, and it will act on the output of this one. Using this kind of expression is usually conceptually simpler, and possibly faster, than dealing with the text line by line.
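To see the non-destructive part in isolation, here is a minimal sketch that runs the same kind of S:g substitution on an in-memory string instead of a slurped file. The variable names and the sample text are made up for illustration, and the pattern uses quoted literals rather than backslash escapes. Note that this only rewrites the {:lang="..."} marker lines; it does not add closing fences or remove the indentation.

```raku
# Hypothetical sample text standing in for the slurped file.
my $text = q:to/END/;
{:lang="ruby"}
    ['Ian','Rich','Jon'].each {|x| puts x}
{:lang="shell"}
    find . -type d
END

# S/// returns a modified copy; $text itself is left untouched.
my $converted = S:g / '{:lang="' (\w+) '"}' /```$0/ given $text;

say $converted;
```

Because the result is an ordinary string, it can be handed to a further S/// in the same way.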

jjmerelo answered Nov 05 '22 18:11



TL;DR

  • TL? Then read @jjmerelo's excellent answer, which not only provides a one-line solution but much more in a compact form;

  • DR? Aw, imo you're missing some good stuff in this answer that JJ (reasonably!) ignores. Though, again, JJ's is the bomb. Go read it first. :)

Using a Perl regex

There are many dialects of regex. The regex pattern you've used is a Perl regex but you haven't told Raku that. So it's interpreting your regex as a Raku regex, not a Perl regex. It's like feeding Python code to perl. So the error message is useless.


One option is to switch to Perl regex handling. To do that, this code:

      /^{:lang="([a-z]+)"}$/

needs m :P5 at the start:

m :P5 /^{:lang="([a-z]+)"}$/

The m is implicit when you use /.../ in a context where it is presumed you mean to immediately match, but because the :P5 "adverb" is being added to modify how Raku interprets the pattern in the regex, one has to also add the m.

:P5 only supports a limited set of Perl's regex patterns. That said, it should be enough for the regex you've written in your question.
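As a rough illustration, the :P5 form can be tried on a single line like this. The variable names are hypothetical, and the braces are escaped here purely as a precaution (Perl syntax accepts that too); treat it as a sketch rather than a tested recipe.

```raku
my $line = '{:lang="ruby"}';
my $lang;

# The :P5 adverb asks Raku to read the pattern as a Perl regex.
# Captures are still numbered from $0 on the Raku side.
if $line ~~ m:P5/^\{:lang="([a-z]+)"\}$/ {
    $lang = ~$0;
}

say $lang;
```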

Using a Raku regex

If you want to use a Raku regex you have to learn the Raku regex language.

The "spirit" of the Raku regex language is the same as Perl's, and some of the absolute basic syntax is the same as Perl's, but it's different enough that you should view it as yet another dialect of regex, just one that's generally "powered up" relative to Perl's regexes.

To rewrite the regex in Raku format I think it would be:

/ ^ '{:lang="' (<[a..z]>+) '"}' $ /

(Taking advantage of the fact whitespace in Raku regexes is ignored.)
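A small sketch of that Raku-format regex in use, with a made-up input line and variable names: the quoted parts match literally, <[a..z]> is a character class, and the parentheses capture into $0.

```raku
my $line = '{:lang="shell"}';
my $fence;

# Spaces inside the regex are only there for readability.
if $line ~~ / ^ '{:lang="' (<[a..z]>+) '"}' $ / {
    $fence = "```$0";    # build the opening fence from the captured lang
}

say $fence;
```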

Other problems in your code

After fixing the regex, one encounters other problems in your code.

The first problem I encountered is that $key is read-only, so $key++ fails. One option is to make it writable, by writing -> $key is copy ..., which makes $key a read-write copy of the index passed by the .kv.
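A minimal sketch of what is copy does, using a hypothetical three-element array: the increment is allowed, but it only changes the local copy, so it does not make the loop skip ahead.

```raku
my @in = <a b c>;
my @seen;

for @in.kv -> $key is copy, $val {
    $key++;                      # fine: $key is a writable copy
    @seen.push: "$key=$val";     # the loop still visits every element
}

say @seen;
```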

But fixing that leads to another problem. And the code is so complex I've concluded I'd best not chase things further. I've addressed your immediate obstacle and hope that helps.
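For what it's worth, one way to sidestep mutating the loop index is to track whether the current line is inside a code block. The following is only a sketch under that assumption; the sub name lfm-to-gfm and the sample lines are hypothetical, and it assumes code lines are indented by exactly four spaces.

```raku
# Hypothetical helper: turns LFM-style code blocks into fenced blocks.
sub lfm-to-gfm(@lines) {
    my @out;
    my $in-code = False;
    for @lines -> $line {
        if $line ~~ / ^ '{:lang="' (\w+) '"}' $ / {
            @out.push: '```' if $in-code;    # close a block that is still open
            @out.push: "```$0";              # open a fence with the captured lang
            $in-code = True;
        }
        elsif $in-code && $line.starts-with('    ') {
            @out.push: $line.substr(4);      # drop the four-space indent
        }
        else {
            if $in-code {                    # first non-indented line closes the fence
                @out.push: '```';
                $in-code = False;
            }
            @out.push: $line;
        }
    }
    @out.push: '```' if $in-code;            # block ran to the end of the input
    @out;
}

my @sample = 'Intro', '{:lang="shell"}', '    sed s/,$//g', '    find . -type d', '', 'Outro';
.say for lfm-to-gfm(@sample);
```

Keeping the state in a flag means each line is looked at exactly once, so there is no index to advance by hand.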

raiph answered Nov 05 '22 18:11
