
Extract a substring between two words from a string

I have the following string:

string = "asflkjsdhlkjsdhglk<body>Iwant\to+extr@ctth!sstr|ng<body>sdgdfsghsghsgh"

I would like to extract the string between the two <body> tags. The result I am looking for is:

substring = "<body>Iwant\to+extr@ctth!sstr|ng<body>"

Note that the substring between the two <body> tags can contain letters, numbers, punctuation and special characters.

Is there an easy way of doing this? Thank you!

Mayou asked Nov 26 '13 18:11


2 Answers

Here is the regular expression way:

regmatches(string, regexpr('<body>.+<body>', string))
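For reference, a self-contained version of this call on the string from the question might look like the sketch below. The doubled backslash is an assumption that the input contains a literal backslash (in an R string literal, a single `\t` is read as a tab), and the name `result` is only illustrative.

```r
# String from the question; "\\t" assumes a literal backslash followed by "t".
string <- "asflkjsdhlkjsdhglk<body>Iwant\\to+extr@ctth!sstr|ng<body>sdgdfsghsghsgh"

# regexpr() locates the first match of the pattern in the string;
# regmatches() then returns the text of that match.
result <- regmatches(string, regexpr("<body>.+<body>", string))

print(result)
```

If the pattern does not occur in the string, `regmatches()` returns `character(0)` rather than an error, so it is worth checking `length(result)` before using it.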
Matthew Plourde answered Nov 05 '22 19:11


regex = '<body>.+?<body>'

You want the non-greedy quantifier (.+?), so that the match stops at the first closing <body> instead of extending to the last <body> in the string.
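The difference only shows up when the string contains more than two <body> markers. A minimal sketch, using a made-up input `x` (not from the question) and `perl = TRUE` so the lazy quantifier is handled by the PCRE engine:

```r
# Hypothetical input with three <body> markers, chosen to show the difference.
x <- "aa<body>first<body>second<body>zz"

# Greedy: .+ runs as far as possible, so the match ends at the LAST <body>.
greedy <- regmatches(x, regexpr("<body>.+<body>", x, perl = TRUE))

# Non-greedy: .+? stops as early as possible, at the NEXT <body>.
lazy <- regmatches(x, regexpr("<body>.+?<body>", x, perl = TRUE))

print(greedy)  # "<body>first<body>second<body>"
print(lazy)    # "<body>first<body>"
```

On the string from the question, which has exactly two markers, both patterns return the same result.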

If you're using only a regex, with no auxiliary function to return the match, you will need a capturing group to extract the required part, i.e.:

regex = '(<body>.+?<body>)'
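In R, one way to make use of such a capturing group is `sub()` with a backreference. This is a sketch rather than the only approach: the surrounding `.*?` and `.*` are added here so the pattern covers the whole string, and the doubled backslash again assumes a literal backslash in the input.

```r
# String from the question; "\\t" assumes a literal backslash followed by "t".
string <- "asflkjsdhlkjsdhglk<body>Iwant\\to+extr@ctth!sstr|ng<body>sdgdfsghsghsgh"

# The pattern matches the entire string: ".*?" and ".*" consume the text
# outside the group, and "\\1" replaces everything with the group's contents.
captured <- sub(".*?(<body>.+?<body>).*", "\\1", string, perl = TRUE)

print(captured)
```

Note that `sub()` returns the input unchanged when the pattern does not match, so a string without two <body> markers comes back as-is instead of empty.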
Steve P. answered Nov 05 '22 19:11