
Regex that strips unclosed <

Tags:

regex

php

I'm looking for a regex to use in PHP (maybe with preg_replace?) that strips from a text every unclosed < (and ONLY the unclosed ones) and every unopened > (and ONLY the unopened ones).

Some examples:

1

<name> aaaaaa bbbbb <  aagfetfe <aaaa/>
to
<name> aaaaaa bbbbb   aagfetfe <aaaa/>

2

<<1111>sbab  < amkka <pippo>
to
<1111>sbab   amkka <pippo>

3

<1111> aaaa <    thehehe  > aaaaaa <ciao>
to
<1111> aaaa <    thehehe  > aaaaaa <ciao>

4

<1111> aaaa   thehehe  > aaaaaa <ciao>
to 
<1111> aaaa   thehehe   aaaaaa <ciao>

5

<1111> aaaa   thehehe  < aaaaaa
to 
<1111> aaaa   thehehe   aaaaaa

I really can't do it; it's too difficult for me.

user1237899 asked Feb 28 '12 12:02



2 Answers

$s = preg_replace('/<([^<>]*)(?=<|$)/', '$1', $s); # remove unclosed '<', keep the text after it (group 1)
$s = preg_replace('/(^|(?<=>))([^<>]*)>/', '$2', $s); # remove unopened '>', keep the text before it (group 2)

Do you understand why?
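As a rough sketch of how the two replacements behave, the snippet below wraps them in a helper and runs it over the five examples from the question. The function name strip_unbalanced and the test loop are assumptions made for illustration, not part of the answer. Note that in the second pattern the text to keep is captured by group 2, because group 1 only matches the empty position at the start of the string or right after a >.

```php
<?php
// Hypothetical helper for illustration; the name is not from the original answer.
function strip_unbalanced(string $s): string
{
    // An unclosed '<' reaches the next '<' or the end of the string
    // without meeting a '>'. Drop the '<' and keep the text after it (group 1).
    $s = preg_replace('/<([^<>]*)(?=<|$)/', '$1', $s);

    // An unopened '>' is preceded, back to the string start or the previous '>',
    // only by text that contains no '<'. Drop the '>' and keep that text (group 2).
    $s = preg_replace('/(^|(?<=>))([^<>]*)>/', '$2', $s);

    return $s;
}

// The five inputs from the question.
$examples = [
    '<name> aaaaaa bbbbb <  aagfetfe <aaaa/>',
    '<<1111>sbab  < amkka <pippo>',
    '<1111> aaaa <    thehehe  > aaaaaa <ciao>',
    '<1111> aaaa   thehehe  > aaaaaa <ciao>',
    '<1111> aaaa   thehehe  < aaaaaa',
];

foreach ($examples as $input) {
    echo $input, "\n  => ", strip_unbalanced($input), "\n";
}
```

Single-quoted PHP strings are used so that $ and $1 are passed to the regex engine untouched. The third example is already balanced, so it should come back unchanged.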

ruakh answered Oct 02 '22 18:10



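For unclosed <, you can replace <(?=[^>]*(<|$)) with an empty string. It matches every < that is not followed by a closing > before the next < or the end of the line. That condition is expressed with a positive lookahead: (?=[^>]*(<|$)) asserts that only non-> characters come before the next < or the end of the line.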

For unopened >, you can replace ((^|>)[^<]*)> with $1. It matches text that starts with a > (or at the line start), does not contain <, and ends with a >. $1 represents everything except the final >.
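A minimal sketch of this lookahead variant in PHP follows. The helper name strip_unbalanced_lookahead and the do/while loop are assumptions added here, not part of the answer. The loop is there because [^<]* is greedy and can run across > characters, so a single pass removes only the last unopened > in a run such as "a > b > c".

```php
<?php
// Hypothetical helper for illustration; the name and the loop are assumptions.
function strip_unbalanced_lookahead(string $s): string
{
    // Drop every '<' that reaches another '<' or the end of the string
    // before any '>' appears. Only the '<' itself is consumed.
    $s = preg_replace('/<(?=[^>]*(<|$))/', '', $s);

    // Drop unopened '>' characters. One pass may leave earlier ones behind,
    // so repeat until preg_replace reports that nothing was replaced.
    do {
        $s = preg_replace('/((^|>)[^<]*)>/', '$1', $s, -1, $count);
    } while ($count > 0);

    return $s;
}

echo strip_unbalanced_lookahead('<1111> aaaa   thehehe  > aaaaaa <ciao>'), "\n";
echo strip_unbalanced_lookahead('<1111> aaaa   thehehe  < aaaaaa'), "\n";
echo strip_unbalanced_lookahead('a > b > c'), "\n";
```

The fifth argument of preg_replace receives the number of replacements made, which is what ends the loop.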

Heinzi answered Oct 02 '22 18:10
