Here is a sample entry I have from a sitemap.xml:
<url>
<loc>http://sitename.com/programming/php/?C=D;O=A</loc>
<changefreq>weekly</changefreq>
<priority>0.64</priority>
</url>
There are many entries like this, and as you can see, the loc tag has C=D;O=A at the end.
I want to remove all entries that start with <url> and end with </url>
and that contain C=D;O=A or a similar pattern.
The following expression matches an entire entry like the one above:
<url>(.|\r\n)*?<\/url>
but I want to match only the entries that meet the condition described above.
How do I form a regex that matches under such conditions?
To match a character that has a special meaning in regex, you need to escape it by prefixing it with a backslash ( \ ). E.g., \. matches "." ; \+ matches "+" ; and \( matches "(" . To match a literal backslash "\" you need \\ .
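A minimal Python sketch of the escaping rule above, using the query string from the question as the literal text. The variable names are illustrative only; `re.escape` is the standard-library helper that adds the backslashes for you.

```python
import re

url = "http://sitename.com/programming/php/?C=D;O=A"

# Escaped metacharacters match themselves literally.
assert re.search(r"\.", "a.b").group() == "."
assert re.search(r"\+", "1+1").group() == "+"
assert re.search(r"\(", "f(x)").group() == "("
assert re.search(r"\\", "C:\\temp").group() == "\\"

# Unescaped, a leading "?" is a quantifier with nothing to repeat,
# so the pattern does not even compile.
try:
    re.compile("?C=D;O=A")
    compiled_unescaped = True
except re.error:
    compiled_unescaped = False

# re.escape() inserts the backslashes needed to treat the text literally.
query = re.escape("?C=D;O=A")
match = re.search(query, url)
print(compiled_unescaped, match.group())
```

In this particular string only the "?" is special; "=" and ";" have no special meaning in a regex and can be left as they are.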
In free-spacing (comments) mode, where unescaped whitespace in the pattern is ignored, you'll need to escape a literal space to match it: "\\ " . That mode is a useful way of describing complex regular expressions: phone <- regex(" \\(? #
A regex pattern matches a target string. The pattern is composed of a sequence of atoms. An atom is a single point within the regex pattern which it tries to match to the target string. The simplest atom is a literal, but grouping parts of the pattern to match an atom will require using ( ) as metacharacters.
A word character (\w) is any of a-z, A-Z, 0-9, or _ (underscore).
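As a small illustration in Python (the sample string is made up), `\w+` picks out the runs of word characters and skips everything else. The `re.ASCII` flag is used here because, without it, Python's `\w` also matches non-ASCII letters and digits, which is wider than the definition above.

```python
import re

sample = "sitemap_v2.xml?C=D;O=A"

# \w+ matches one or more word characters; ".", "?", "=" and ";" are not
# word characters, so they act as separators between the matches.
words = re.findall(r"\w+", sample, flags=re.ASCII)
print(words)
```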
Try this:
/<url>(?:(?!<\/url>).)*C=D;O=A.*?<\/url>/m
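The negative lookahead guarantees that the match does not span multiple nodes: (?:(?!<\/url>).)* consumes one character at a time, and only while no </url> starts at that position, so it can never run past the end of the current entry.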
See here: rubular
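A rough sketch of how the pattern above could be applied in Python, offered as an assumption since the thread does not say which language is in use. `re.DOTALL` plays the role of the `/m` flag (dot matches newlines). The sample data, the names `SITEMAP`, `SORT_QUERY_ENTRY`, `NAIVE_ENTRY` and `strip_sort_query_entries`, and the `C=[A-Z];O=[A-Z]` generalisation of "similar patterns" are all hypothetical.

```python
import re

# Made-up sitemap fragment modelled on the entry in the question.
SITEMAP = """<urlset>
<url>
<loc>http://sitename.com/programming/php/</loc>
<changefreq>weekly</changefreq>
<priority>0.80</priority>
</url>
<url>
<loc>http://sitename.com/programming/php/?C=D;O=A</loc>
<changefreq>weekly</changefreq>
<priority>0.64</priority>
</url>
<url>
<loc>http://sitename.com/programming/php/?C=N;O=D</loc>
<changefreq>weekly</changefreq>
<priority>0.64</priority>
</url>
</urlset>
"""

# Tempered pattern: the lookahead stops the match from crossing a </url>.
# The trailing \n? also removes the line break left behind by each entry.
SORT_QUERY_ENTRY = re.compile(
    r"<url>(?:(?!</url>).)*C=[A-Z];O=[A-Z].*?</url>\n?",
    re.DOTALL,
)

# Naive pattern without the lookahead, kept only for comparison.
NAIVE_ENTRY = re.compile(r"<url>.*?C=[A-Z];O=[A-Z].*?</url>\n?", re.DOTALL)


def strip_sort_query_entries(xml_text: str) -> str:
    """Remove every <url> entry whose content contains C=x;O=y."""
    return SORT_QUERY_ENTRY.sub("", xml_text)


cleaned = strip_sort_query_entries(SITEMAP)
naive_cleaned = NAIVE_ENTRY.sub("", SITEMAP)

print(cleaned)
print(naive_cleaned.count("<url>"))  # the naive pattern also ate the clean entry
```

The comparison shows why the lookahead matters: without it, a lazy `.*?` starting at the first `<url>` keeps expanding until it finds `C=D;O=A` in a later entry, so the clean entry in front of it is deleted as well. For sitemaps that are not this regular, an XML parser may be a safer tool than a regex.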