
Regex recursive code block content

I need to get the content between two directives (@embed and @endembed) using a regex. My current pattern does this correctly: /(?<!\w)(\s*)@embed(\s*\(.*\))([\w\W]*?)@endembed/g.

However, when the directives are nested, it does not match the blocks correctly (see https://regex101.com/r/nL8gV5/2). Sample input:

@extends('layouts/default')

@section('content')
    <div class="row">
        <div class="col-md-6">
            @embed('components/box')
                @section('title', 'Box title')
                @section('content')
                    <h4>Haai</h4>
                    Box content
                @stop
            @endembed
        </div>
        <div class="col-md-6">
            @embed('components/box')
                @section('title', 'Box2 title')
                @section('content')

                    @embed('components/timeline')
                        @section('items')
                        @stop
                    @endembed

                @stop
            @endembed
        </div>
    </div>
@stop

Desired output:

1:    
@section('title', 'Box title')
@section('content')
    <h4>Haai</h4>
    Box content
@stop

2:
@section('title', 'Box2 title')
@section('content')
    @embed('components/timeline')
        @section('items')
        @stop
    @endembed
@stop

3:
@section('items')
@stop

I've tried various patterns but I can't seem to get it right. My understanding is that I should use the (?R) recursion token combined with a backreference, something like this: https://regex101.com/r/nL8gV5/3. After several hours of fiddling around, I still haven't got it working.

What am I doing wrong, and what is the correct pattern?

asked May 25 '26 12:05 by Robin Radic


2 Answers

To capture both the outer @embed blocks and the nested ones, use a recursive regex:

$pattern = '/@embed\s*\([^)]*\)((?>(?!@(?:end)?embed).|(?0))*)@endembed/s';

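At (?0) the whole pattern is applied again recursively, which is how a nested @embed ... @endembed pair gets consumed as part of the outer block. See the test at regex101. To reach the nested blocks, collect the captures in a loop and replace each match with its captured content $1 until nothing matches any more: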

$res = array();

while (preg_match_all($pattern, $str, $out)) {
  $str = preg_replace($pattern, "$1", $str);
  $res = array_merge($res, $out[1]);
}

This gives you the outer blocks first and then the nested ones, down to the innermost. Test at eval.in
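As a self-contained illustration of the loop above, here is a minimal sketch run against a reduced template. The sample input and the trim() call are assumptions added for the demonstration and are not part of the original answer; the pattern is the one given above.

```php
<?php
// Pattern from the answer: group 1 holds the content of one block,
// and (?0) recurses into the whole pattern for nested blocks.
$pattern = '/@embed\s*\([^)]*\)((?>(?!@(?:end)?embed).|(?0))*)@endembed/s';

// Hypothetical sample: one flat block and one block with a nested block.
$str = implode("\n", [
    "@embed('a')",
    "A1",
    "@endembed",
    "@embed('b')",
    "B1",
    "@embed('c')",
    "C1",
    "@endembed",
    "B2",
    "@endembed",
]);

$res = [];
// Each pass collects the current outermost blocks, then unwraps them
// so that the next pass sees what used to be nested one level deeper.
while (preg_match_all($pattern, $str, $out)) {
    $res = array_merge($res, $out[1]);
    $str = preg_replace($pattern, '$1', $str);
}

foreach ($res as $i => $block) {
    echo ($i + 1) . ":\n" . trim($block) . "\n\n";
}
```

With this input the loop runs two productive passes: the first collects the contents of blocks 'a' and 'b', the second collects the content of 'c' once its parent has been unwrapped.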


The basic recursive pattern without any capturing is as simple as this:

/@embed\b(?>(?!@(?:end)?embed\b).|(?0))*@endembed/s
  • @embed\b matches the literal @embed followed by a word boundary.
  • (?> opens an atomic (non-capturing) group that holds the alternation:
  • (?!@(?:end)?embed\b). matches any single character that is not the start of @embed or @endembed,
  • |(?0) or else recurses into the whole pattern from its start.
  • )* closes the group, which may repeat any number of times.
  • @endembed matches the literal closing directive.

The s (PCRE_DOTALL) flag makes the dot match newlines as well.
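To see what the recursion buys, the following sketch compares the basic recursive pattern with a plain lazy match on a nested block. The sample string and the lazy comparison pattern are assumptions made up for this illustration.

```php
<?php
// Basic recursive pattern from the answer (no capture groups).
$basic = '/@embed\b(?>(?!@(?:end)?embed\b).|(?0))*@endembed/s';
// Hypothetical non-recursive pattern, shown only for comparison.
$lazyPattern = '/@embed\b.*?@endembed/s';

// Hypothetical sample: block 'b' contains a nested block 'c'.
$tpl = implode("\n", [
    "x @embed('b')",
    "B1",
    "@embed('c') C1 @endembed",
    "B2",
    "@endembed y",
]);

preg_match_all($basic, $tpl, $m);
preg_match_all($lazyPattern, $tpl, $lazy);

// The recursive match runs to the @endembed that balances @embed('b').
echo $m[0][0], "\n---\n";
// The lazy match stops at the first @endembed, which belongs to 'c'.
echo $lazy[0][0], "\n";
```

The recursive pattern returns the whole balanced outer block as a single match, while the lazy one is cut off at the inner @endembed.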

answered May 27 '26 02:05 by Jonny 5


I came up with this recursive regex, adapted from an example I had (from this Stack Overflow answer):

(?=(@embed(?:(?>(?:(?!@embed|@endembed).)+)*|(?1))*@endembed))

Try it on regex101
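A minimal sketch of how this pattern might be used from PHP follows. The / delimiters, the s modifier (so the dot matches newlines) and the sample input are assumptions added here; the answer only gives the bare pattern. Because the whole expression sits inside a lookahead, each match is zero-width, so matches can overlap and nested blocks are found in the same pass. Note that group 1 holds the whole block including the @embed and @endembed directives, not just the content between them.

```php
<?php
// Pattern from the answer, wrapped in delimiters with the s modifier (assumed).
$pattern = '/(?=(@embed(?:(?>(?:(?!@embed|@endembed).)+)*|(?1))*@endembed))/s';

// Hypothetical sample: one flat block and one block with a nested block.
$str = implode("\n", [
    "@embed('a')",
    "A1",
    "@endembed",
    "@embed('b')",
    "B1",
    "@embed('c')",
    "C1",
    "@endembed",
    "B2",
    "@endembed",
]);

// $m[1] receives one entry per @embed, outer and nested alike,
// in the order the opening directives appear in the input.
preg_match_all($pattern, $str, $m);

foreach ($m[1] as $i => $block) {
    echo ($i + 1) . ":\n" . $block . "\n\n";
}
```

If only the inner content is wanted, the directives would still have to be stripped from each captured block afterwards.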

answered May 27 '26 01:05 by meuh