
Regex matching an opening and closing tag with a certain text pattern inside that tag [duplicate]

Tags:

regex

xml

Here is a sample entry I have from a sitemap.xml:

<url>
  <loc>http://sitename.com/programming/php/?C=D;O=A</loc>
  <changefreq>weekly</changefreq>
  <priority>0.64</priority>
</url>

There are many entries like this, and as you can see, the loc tag has C=D;O=A at the end. I want to remove every entry that starts with <url>, ends with </url>, and contains C=D;O=A or a similar pattern.

The following expression matches the whole of the tag shown above:

<url>(.|\r\n)*?<\/url>

but I want it to match only the entries I described above.

How do we form a regex to match such conditions (patterns)?

Jayapal Chandran asked Jun 16 '11 08:06




1 Answer

Try this:

/<url>(?:(?!<\/url>).)*C=D;O=A.*?<\/url>/m

The negative lookahead guarantees that the match never runs past a </url>, so it cannot span multiple nodes.

See here: rubular
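
For anyone who wants to try this outside rubular, here is a minimal sketch of the same pattern in Python. It assumes that Python's re.DOTALL stands in for the /m flag used above (so that "." also matches newlines). The sample sitemap text and the helper name remove_sorted_entries are made up for illustration and are not part of the original question.

```python
import re

# Same pattern as above, written for Python's re module.
# re.DOTALL is assumed to play the role of the /m flag: "." also matches "\n".
# (?:(?!</url>).)* consumes one character at a time, but only while the
# upcoming text is not "</url>", so a match can never cross into the next entry.
SORTED_ENTRY = re.compile(r"<url>(?:(?!</url>).)*C=D;O=A.*?</url>", re.DOTALL)

# Hypothetical sample data: a clean entry, an entry to remove, a clean entry.
SITEMAP = """<urlset>
<url>
  <loc>http://sitename.com/programming/</loc>
  <changefreq>weekly</changefreq>
</url>
<url>
  <loc>http://sitename.com/programming/php/?C=D;O=A</loc>
  <changefreq>weekly</changefreq>
</url>
<url>
  <loc>http://sitename.com/programming/php/</loc>
  <changefreq>weekly</changefreq>
</url>
</urlset>"""


def remove_sorted_entries(xml_text):
    """Delete every <url>...</url> block whose body contains C=D;O=A."""
    return SORTED_ENTRY.sub("", xml_text)


cleaned = remove_sorted_entries(SITEMAP)
print(cleaned)
```

Without the lookahead, a match starting at the first <url> could run forward through the first </url> until it found C=D;O=A in the second entry, and the clean first entry would be deleted along with it. For anything beyond a one-off cleanup, an XML parser is probably the safer tool.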

morja answered Sep 23 '22 09:09