How to turn tabs into blockquotes using perl regex

Question

If I have HTML lines like the following (the whitespace after <P> is a tab character):

<P>	Some text</P>
<P>		Some text</P>
<P>	Some text</P>

Using regex, how can I convert the above to:

<P><BLOCKQUOTE>Some text</BLOCKQUOTE></P>
<P><BLOCKQUOTE><BLOCKQUOTE>Some text</BLOCKQUOTE></BLOCKQUOTE></P>
<P><BLOCKQUOTE>Some text</BLOCKQUOTE></P>

At the moment I have:

for $line (@lines)
{
   $line =~ s{^(<P>(?:<BLOCKQUOTE>)*)	(.+?)((?:</BLOCKQUOTE>)*</P>)$}{$1<BLOCKQUOTE>$2</BLOCKQUOTE>$3}g;
}

zdim · Accepted Answer

The tricky bit here is to somehow enter as many replacement tags as there are tabs.

I'd go with multiple passes, first counting the tabs and then going over the string again to replace them with the right number of open-close replacement tags (BLOCKQUOTE). In this case a single regex is bound to be much more complex and thus that much harder to tweak and maintain.

use warnings;
use strict;
use feature 'say';

my @test_strings = ( 
    qq(<p>\t\ttwo tabs</p>),
    qq(<p>\tone tab</p>),
    qq(<p>no tab</p>),
    qq(<div>\tnot paragraph</div>),
);

say for @test_strings;  say '';

for (@test_strings) 
{
    if (my ($tabs) = /<p>(\t+)/)          # capture consecutive tabs
    { 
        my $nt = () = $tabs =~ /\t/g;     # count them

        my $ot = "<BLOCKQUOTE>"  x $nt;   # open-tag
        my $ct = "</BLOCKQUOTE>" x $nt;   # close-tag

        # \t is written out: under /x a literal tab in the pattern is ignored
        s{<p> \t+ ([^\t].*?) </p>}{<p>$ot$1$ct</p>}x; 
    }

    say;    # print every string, changed or not
}

Prints

<p>             two tabs</p>
<p>     one tab</p>
<p>no tab</p>
<div>   not paragraph</div>

<p><BLOCKQUOTE><BLOCKQUOTE>two tabs</BLOCKQUOTE></BLOCKQUOTE></p>
<p><BLOCKQUOTE>one tab</BLOCKQUOTE></p>
<p>no tab</p>
<div>   not paragraph</div>

Notes

As it stands this handles at most one paragraph (<p>...</p>) per string. Using
```
while (my ($tabs) = /<p>(\t+)/g) { ... }
```
instead of if (...) appears to work with multiple paragraphs, but this needs more testing.

The counting uses the =()= "operator". It imposes list context on its right-hand side, so the regex returns the list of all matches. That list assignment is in turn evaluated in scalar context, which yields the number of elements on its right-hand side. Thus we get the count.

In this case, with $tabs consisting of only the tab characters, one can simply do
```
 my $nt = split '', $tabs;
```
(Update: really just my $nt = length $tabs;, as in other answers)

I still use the regex since it also works for a string that contains things other than just tabs.
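
The counting options discussed above can be compared side by side. This is only a small sketch with made-up sample strings ($tabs and $mixed are assumptions, not data from the question). It also shows tr/\t//, a third way to count that the answer does not mention.

```perl
use strict;
use warnings;

my $tabs  = "\t\t\t";     # assumed sample: a string of three tabs only
my $mixed = "\ta\tb";     # assumed sample: two tabs mixed with other text

# =()= : the match runs in list context, the assignment yields the count
my $by_match  = () = $tabs =~ /\t/g;

# length: correct only because $tabs holds nothing but tabs
my $by_length = length $tabs;

# tr with an empty replacement list counts matching characters, changes nothing
my $by_tr     = ($tabs =~ tr/\t//);

# On a mixed string the regex and tr still count tabs; length does not
my $mixed_match  = () = $mixed =~ /\t/g;
my $mixed_length = length $mixed;

print "$by_match $by_length $by_tr\n";     # prints "3 3 3"
print "$mixed_match $mixed_length\n";      # prints "2 4"
```

For a capture that is known to hold only tabs, length is the simplest choice. The regex count or tr is the safer one when the string may contain anything else.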
The code replaces only the consecutive tabs at the beginning, right after <p>, not any that may come later in the string (which is how I read the requirement).

This is ensured by following the tabs in the pattern, (\t+), with a single non-tab character and then any characters: [^\t].*?. The pattern thus still matches a string with more tabs further down, but only the leading "block" of tabs is replaced.
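
As an alternative to the two-step version, the count and the replacement can be folded into a single substitution by evaluating the replacement as code with the /e modifier. This is a different technique from the one in the answer, shown only as a sketch. The helper name tabs_to_blockquotes is hypothetical, and the /i flag is an assumption added so that the uppercase <P> tags from the question also match.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the answer): wrap the text of every
# <p>...</p> in one BLOCKQUOTE pair per leading tab.
sub tabs_to_blockquotes {
    my ($html) = @_;

    # $1 = opening tag, $2 = leading tabs, $3 = text, $4 = closing tag.
    # /e runs the replacement as Perl code, /g covers every paragraph,
    # /i accepts both <p> and <P>.
    $html =~ s{(<p>)(\t+)([^\t].*?)(</p>)}{
        my $n = length $2;
        $1 . ('<BLOCKQUOTE>' x $n) . $3 . ('</BLOCKQUOTE>' x $n) . $4
    }gei;

    return $html;
}

# Two paragraphs in one string; the second has a tab later in its text
my $in  = "<P>\t\tdeep</P><p>\tone\tlater</p><p>none</p>";
my $out = tabs_to_blockquotes($in);
print "$out\n";
```

Because of /g, several paragraphs in one string are handled in one pass, and the tab inside "one\tlater" is left alone since only the run directly after the opening tag is captured. Like the original, this assumes each paragraph sits on one line, as . does not match a newline.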

Tags: regex, perl

CJ7
