
Regular expressions: How do I grab a block of text using regex? (in Ruby)

I'm using Ruby and I'm trying to find a way to grab the text between {start_grab_entries} and {end_grab_entries}, like so:

{start_grab_entries}
i want to grab
the text that
you see here in
the middle
{end_grab_entries}

Something like so:

$1 => "i want to grab
       the text that
       you see here in
       the middle"

So far, I tried this as my regular expression:

\{start_grab_entries}(.|\n)*\{end_grab_entries}

However, using $1, that gives me a blank. Do you know what I can do to grab that block of text in between the tags correctly?

sjsc asked Dec 29 '22 06:12


2 Answers

There is a better way to allow the dot to match newlines (/m modifier):

regexp = /\{start_grab_entries\}(.*?)\{end_grab_entries\}/m

Also, make the * lazy by appending a ?, or you might match too much if more than one such section occurs in your input.
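
To see the difference the lazy quantifier makes, here is a minimal sketch. The sample string and the variable names (text, lazy, greedy, lazy_blocks, greedy_blocks) are made up for illustration; only the lazy pattern itself comes from the answer above.

```ruby
# Hypothetical sample input containing two marked sections.
text = <<~INPUT
  {start_grab_entries}
  first block
  {end_grab_entries}
  noise in between
  {start_grab_entries}
  second block
  {end_grab_entries}
INPUT

lazy   = /\{start_grab_entries\}(.*?)\{end_grab_entries\}/m
greedy = /\{start_grab_entries\}(.*)\{end_grab_entries\}/m

# String#scan with one capture group returns an array of one-element arrays.
lazy_blocks   = text.scan(lazy).map   { |(body)| body.strip }
greedy_blocks = text.scan(greedy).map { |(body)| body.strip }

p lazy_blocks         # each section is captured separately
p greedy_blocks.size  # the greedy version runs on to the last end marker
```

With the lazy pattern each section is returned on its own, while the greedy pattern produces a single match that also contains the text between the two sections.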

That said, the reason why you got a blank match is that you repeated the capturing group itself; therefore you only caught the last repetition (in this case, a \n).

It would have "worked" if you had put the capturing group outside of the repetition:

\{start_grab_entries\}((?:.|\n)*)\{end_grab_entries\}

but, as said above, there is a better way to do that.
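
The effect of repeating the capturing group can be checked directly. This is a small sketch under stated assumptions: the input is a shortened version of the text from the question, and the variable names (repeated_group, group_outside, last_repetition, whole_block) are chosen here purely for illustration.

```ruby
# Shortened sample input based on the question.
text = "{start_grab_entries}\ni want to grab\nthe text that\n{end_grab_entries}\n"

repeated_group = /\{start_grab_entries\}(.|\n)*\{end_grab_entries\}/
group_outside  = /\{start_grab_entries\}((?:.|\n)*)\{end_grab_entries\}/

# Group 1 is overwritten on every repetition, so only the final
# character before the end marker (a newline) is left in it.
last_repetition = text[repeated_group, 1]

# Here group 1 wraps the whole repetition, so it holds the full block.
whole_block = text[group_outside, 1]

p last_repetition
p whole_block
```

String#[] with a regex and a group index returns that group's text, which is the same value $1 would hold after a successful match.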

Tim Pietzcker answered Dec 30 '22 19:12


I'm adding this because we're often reading data from a file or data stream where the range of lines we want is not all in memory at once. "Slurping" a file is discouraged if the data could exceed the available memory, which happens easily in production corporate environments. This is how we'd grab the lines between two boundary markers while the file is being scanned. It doesn't rely on regex; instead it uses Ruby's "flip-flop" .. operator:

#!/usr/bin/ruby

lines = []
DATA.each_line do |line|
  lines << line if (line['{start_grab_entries}'] .. line['{end_grab_entries}'])
end

puts lines          # << lines with boundary markers
puts
puts lines[1 .. -2] # << lines without boundary markers

__END__
this is not captured

{start_grab_entries}
i want to grab
the text that
you see here in
the middle
{end_grab_entries}

this is not captured either

Output of this code would look like:

{start_grab_entries}
i want to grab
the text that
you see here in
the middle
{end_grab_entries}

i want to grab
the text that
you see here in
the middle
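
The same idea can be wrapped in a method that reads from any IO-like object one line at a time. This is only a sketch: the helper name grab_between and its parameters are hypothetical, and a StringIO stands in for a real file or stream.

```ruby
require 'stringio'

# Hypothetical helper: collects the lines strictly between the two markers,
# reading one line at a time from any object that responds to each_line.
def grab_between(io, start_marker, end_marker)
  grabbed = []
  io.each_line do |line|
    # The flip-flop switches on at the start marker and off after the end marker.
    if (line.include?(start_marker))..(line.include?(end_marker))
      # Skip the marker lines themselves.
      grabbed << line unless line.include?(start_marker) || line.include?(end_marker)
    end
  end
  grabbed
end

# StringIO is used here in place of a file handle.
io = StringIO.new(<<~INPUT)
  this is not captured
  {start_grab_entries}
  i want to grab
  the text that
  {end_grab_entries}
  this is not captured either
INPUT

entries = grab_between(io, '{start_grab_entries}', '{end_grab_entries}')
puts entries
```

Dropping the marker lines inside the loop avoids the lines[1 .. -2] slice, which only works when the input contains exactly one marked section.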

the Tin Man answered Dec 30 '22 18:12