
How do I combine these two regex patterns?

Tags:

regex

I'm feeling pretty silly having to ask this, but I cannot get this to work to save my life...

What Works

preg_replace( '/(<[^>]+) onmouseout=".*?"/i', '$1', preg_replace( '/(<[^>]+) onmouseover=".*?"/i', '$1', $strHtml ) )

How can I combine these two preg_replace() calls into one (by combining the two regex patterns)?

My Attempt to Cleanup (Doesn't Work)

preg_replace( '/(<[^>]+) (onmouseover|onmouseout)=".*?"/i', '$1', $strHtml )

I want this preg_replace() function to remove all onmouseover AND onmouseout attributes from my HTML string. It appears to remove only one of the two attributes... What am I doing wrong?

UPDATE: Example String

<p><img src="http://www.bestlinknetware.com/products/204233spc.jpg" width="680" height="365"><br>   <a href="http://www.bestlinknetware.com/products/204233INST.pdf" target="_blank" onmouseover="MM_swapImage('Image2','','/Content/bimages/ins2.gif',1)" onmouseout="MM_swapImgRestore()"><img name="Image2" border="0" src="http://www.bestlinknetware.com/Content/bimages/ins1.gif"></a> </p> <p><strong>No contract / No subscription / No monthy fee<br> 1080p HDTV reception<br> 32db high gain reception<br> Rotor let you change direction of the antenna to find best reception</strong></p>  <a href=http://transition.fcc.gov/mb/engineering/dtvmaps/  target="blank"><strong>CLICK HERE</strong></a><br>to see HDTV channels available in your area.<br> <br/> ** TV signal reception is immensely affected by the conditions such as antenna height, terrain, distance from broadcasting transmission antenna and output power of transmitter. Channels you can watch may vary depending on these conditions. <br> <br/> <br/> <p>* Reception: VHF/UHF/FM<br/>   * Reception range: 120miles<br/>   * Built-in 360 degree motor rotor<br>   * Wireless remote controller for rotor (included)<br/>   * Dual TV Outputs<br>   * Easy Installation<br>   * High Sensitivity Reception<br>   * Built-in Super Low Noise Amplifier<br>   * Power : AC15V 300mA<br> <br/> Kit contents<br/> * One - HDTV Yagi antenna with built-in roter & amplifier<br/> * One - Roter control box<br/> * One - Remote for roter control box<br/> * One - 40Ft coax cable<br/> * One - 4Ft coax cable<br/> * One - power supply for roter control box</p>

UPDATE: Tool for Future Viewers of This Thread

https://regex101.com/

I could never figure out exactly how to use http://regexr.com/, so I tried this regex101.com site, and I have been loving it ever since. Highly recommended for anyone facing similar issues (especially anyone who used a cut-and-paste regex pattern like I did originally...).

derekmx271 asked Nov 08 '22 22:11



1 Answer

The problem with your original expression was that the initial group was grabbing too much, so the only one of the two attributes being replaced was the one appearing last in the tag. That happened because the greedy [^>]+ repetition ate up a larger portion of the search string than you were anticipating: it captured everything from the opening bracket of the tag through to the start of the last attribute you wanted to get rid of, which left the first attribute inside the group you were keeping. And even after addressing that issue, having the pattern anchor to the opening bracket of an HTML tag would still prevent multiple matches within the same element.
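To see the greedy behaviour in isolation, here is a minimal sketch that runs the pattern from the question against a shortened tag. The `$tag` string and its `over()` / `out()` handler values are made-up placeholders, not taken from the example string in the question.

```php
<?php
// Hypothetical, shortened tag with both attributes (placeholder values).
$tag = '<a href="x.pdf" onmouseover="over()" onmouseout="out()">';

// The combined pattern from the question, unchanged.
$pattern = '/(<[^>]+) (onmouseover|onmouseout)=".*?"/i';

preg_match($pattern, $tag, $m);

// Group 1 is greedy: it backtracks only as far as the LAST attribute,
// so the first attribute ends up inside the text that is kept.
echo $m[1], "\n"; // <a href="x.pdf" onmouseover="over()"
echo $m[2], "\n"; // onmouseout

// The pattern can match only once per tag, so one attribute survives.
$result = preg_replace($pattern, '$1', $tag);
echo $result, "\n"; // <a href="x.pdf" onmouseover="over()">
```

Making the group lazy would only change which of the two attributes is removed, since each match still has to start at the tag's opening bracket.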

If you want to do this in one call to preg_replace(), then rather than trying to grab the text that you want to keep, it makes more sense to look for the text to remove (by substituting an empty string):

preg_replace( '/(onmouseover|onmouseout)=".*?"/i', '', $strHtml )

You already had a non-greedy match on the attribute value (with the .*?), and based on your prior code it appears to have been working well for you. Note that this particular expression doesn't cover all the possible variations in an HTML/XML document (whitespace around the equals sign, or single-quoted and unquoted values, for example), and it leaves behind the space that preceded each removed attribute. I trust that you can make a judgment call regarding whether this is generic enough for your needs.
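If you do need to cover some of those variations, one possible sketch is below. The pattern is my own assumption about what "generic enough" might look like, and the `$html` sample is made up; it is still a text-level match rather than a real HTML parser, so it would also strip matching text that appears outside of a tag.

```php
<?php
// Hypothetical sample: mixed case, a single-quoted value, spaces around "=".
$html = '<a href="x.pdf" onmouseover="over()" onMouseOut = \'out()\' target="_blank">link</a>';

// \s+ consumes the whitespace before the attribute, so no stray space remains.
// The value may be double-quoted, single-quoted, or unquoted.
$pattern = '/\s+(?:onmouseover|onmouseout)\s*=\s*(?:"[^"]*"|\'[^\']*\'|[^\s>]+)/i';

$clean = preg_replace($pattern, '', $html);

echo $clean, "\n"; // <a href="x.pdf" target="_blank">link</a>
```

Requiring leading whitespace also means that an attribute such as data-onmouseover is left alone, because the name has to start right after a whitespace character.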

shawnt00 answered Nov 26 '22 14:11
