
Perl one-liner to extract a multi-line pattern

I have a pattern in a file, shown below, which may or may not span multiple lines:

 abcd25
 ef_gh
 ( fg*_h
 hj_b*
 hj ) {

What I have tried:

perl -nle 'print while m/^\s*(\w+)\s+(\w+?)\s*(([\w-0-9,* \s]))\s{/gm'

I don't know what the flags mean here; all I did was write a regex for the pattern and insert it into the pattern space. This matches well if the pattern is on a single line, as in:

abcd25 ef_gh ( fg*_h hj_b* hj ) {

But it fails only in the multi-line case!

I started with Perl yesterday, but the syntax is way too confusing. So, as suggested by a fellow SO user, I wrote a regex and inserted it into the code he provided.

I hope a Perl monk can help me with this. Alternative solutions are welcome.

Input file :

 abcd25
 ef_gh
 ( fg*_h
 hj_b*
 hj ) {

 abcd25
 ef_gh
 fg*_h
 hj_b*
 hj ) {

 jhijdsiokdù ()lmolmlxjk;
 abcd25 ef_gh ( fg*_h hj_b* hj ) {

Expected output :

 abcd25
 ef_gh
 ( fg*_h
 hj_b*
 hj ) {
 abcd25 ef_gh ( fg*_h hj_b* hj ) {

The input file can contain multiple patterns that coincide with the start and end of the required pattern. Thanks in advance for the replies.

Gil asked Aug 03 '12 09:08



1 Answer

Use the Flip-Flop Operator for a One-Liner

Perl makes this really easy with the flip-flop operator, which will allow you to print out all the lines between two regular expressions. For example:

$ perl -ne 'print if /^abcd25/ ... /\bhj \) {/' /tmp/foo
abcd25
ef_gh
( fg*_h
hj_b*
hj ) {

However, a simple one-liner like this prints every block that sits between the two delimiting patterns; it cannot reject a block based on what appears inside it (for example, a block with no opening parenthesis). That calls for a more complex approach.

More Complicated Comparisons Benefit from Conditional Branching

One-liners aren't always the best choice, and regular expressions can get out of hand quickly if they become too complex. In such situations, you're better off writing an actual program that can use conditional branching rather than trying to use an over-clever regular expression match.

One way to do this is to build up your match with a simple pattern, and then reject any match that doesn't match some other simple pattern. For example:

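#!/usr/bin/perl -nw

# Use the flip-flop operator to collect every line from a start line
# through the next end line. The two-dot form also tests the end pattern
# on the start line, so a one-line block closes the range by itself.
if (/^abcd25/ .. /\bhj \) \{/) {
    push @string, $_;
}

# At each end line, keep the collected block only if it includes a
# particular expression between the flip-flop delimiters. For example,
# a block containing "( fg" is kept, while a block whose "fg" line has
# no opening parenthesis is rejected.
if (/\bhj \) \{/) {
    $string = join("", @string);
    undef @string;
    push(@matches, $string) if $string =~ /\( fg/;
}

END { print @matches }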

When run against the OP's updated corpus, this correctly yields:

abcd25
ef_gh
( fg*_h
hj_b*
hj ) {
abcd25 ef_gh ( fg*_h hj_b* hj ) {
Todd A. Jacobs answered Sep 17 '22 19:09
