Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Regex Match all characters between two strings

Tags:

regex

Example: This is just\na simple sentence.

I want to match every character between This is and sentence. Line breaks should be ignored. I can't figure out the correct syntax.

like image 882
0xbadf00d Avatar asked May 24 '11 11:05

0xbadf00d


People also ask

How do I find a character in a string in regex?

To match a character having special meaning in regex, you need to use a escape sequence prefix with a backslash ( \ ). E.g., \. matches "." ; regex \+ matches "+" ; and regex \( matches "(" . You also need to use regex \\ to match "\" (back-slash).

What does \\ mean in regex?

\\. matches the literal character . . the first backslash is interpreted as an escape character by the Emacs string reader, which combined with the second backslash, inserts a literal backslash character into the string being read. the regular expression engine receives the string \.

How do you use wildcards in regex?

In regular expressions, the period ( . , also called "dot") is the wildcard pattern which matches any single character. Combined with the asterisk operator . * it will match any number of any characters.

What is a capturing group regex?

Capturing groups are a way to treat multiple characters as a single unit. They are created by placing the characters to be grouped inside a set of parentheses. For example, the regular expression (dog) creates a single group containing the letters "d" "o" and "g" .


3 Answers

For example

(?<=This is)(.*)(?=sentence)

Regexr

I used lookbehind (?<=) and look ahead (?=) so that "This is" and "sentence" is not included in the match, but this is up to your use case, you can also simply write This is(.*)sentence.

The important thing here is that you activate the "dotall" mode of your regex engine, so that the . is matching the newline. But how you do this depends on your regex engine.

The next thing is if you use .* or .*?. The first one is greedy and will match till the last "sentence" in your string, the second one is lazy and will match till the next "sentence" in your string.

Update

Regexr

This is(?s)(.*)sentence

Where the (?s) turns on the dotall modifier, making the . matching the newline characters.

Update 2:

(?<=is \()(.*?)(?=\s*\))

is matching your example "This is (a simple) sentence". See here on Regexr

like image 97
stema Avatar answered Oct 25 '22 02:10

stema


Lazy Quantifier Needed

Resurrecting this question because the regex in the accepted answer doesn't seem quite correct to me. Why? Because

(?<=This is)(.*)(?=sentence)

will match my first sentence. This is my second in This is my first sentence. This is my second sentence.

See demo.

You need a lazy quantifier between the two lookarounds. Adding a ? makes the star lazy.

This matches what you want:

(?<=This is).*?(?=sentence)

See demo. I removed the capture group, which was not needed.

DOTALL Mode to Match Across Line Breaks

Note that in the demo the "dot matches line breaks mode" (a.k.a.) dot-all is set (see how to turn on DOTALL in various languages). In many regex flavors, you can set it with the online modifier (?s), turning the expression into:

(?s)(?<=This is).*?(?=sentence)

Reference

  • The Many Degrees of Regex Greed
  • Repetition with Star and Plus
like image 20
zx81 Avatar answered Oct 25 '22 04:10

zx81


Try This is[\s\S]*?sentence, works in javascript

like image 22
kaore Avatar answered Oct 25 '22 03:10

kaore