
REGEXP_REPLACE guidance

I've been trying to bulk-remove spammy links like the following from WordPress posts:

<a style="text-decoration: none" href="/price-of-xenical-at-pharmacy">.</a>

They reside in the wp_posts table, in the post_content column. I was attempting to match them by putting a % wildcard inside the href attribute, because all the URLs are different but the anchor text (a full stop) and the inline styling are the same.

UPDATE wp_posts
SET post_content = REPLACE (post_content,
    '<a style="text-decoration:none" href="%">.</a>',
    '.');

I have since been told that SQL doesn't support what I'm trying to do (or at least not the way I'm doing it).

I'm using MariaDB, which apparently supports REGEXP_REPLACE, so I'm looking for guidance on the SQL query and regex I would need to mass-remove these links while leaving all other content intact.

Any help is greatly appreciated. The aim is to remove the above strings, or replace them with a blank space.

Update: example post content. The last link is the type I need to remove:

    <h2>Warranty</h2>
<span style="font-size: small"> </span>

<span style="font-size: small">Lorem ipsum dolor sit amet, non risus bibendum quis morbi, duis elit porttitor semper, ante augue at consectetuer elit lectus est, nascetur neque consequuntur donec turpis. Cursus ullamcorper posuere massa interdum, rhoncus blandit, vitae in etiam justo lectus eu fames. Dolor quam dicta wisi class duis. Eleifend sagittis, scelerisque convallis consectetuer sed non aptent. Velit tristique vulputate proin, ipsum diam aliquam. Nibh sit vitae et m</span>

&nbsp;

<a href="https://www.example.com/wp-content/image.jpg"><img class="alignright size-full wp-image-56" title="image" src="https://www.example.com/wp-content/image.jpg" alt="image" width="280" height="280" /></a><a style="text-decoration: none" href="/price-of-xenical-at-pharmacy">.</a>
Randomer11 asked Mar 07 '19 09:03


1 Answer

If you want to remove all anchor tags, but retain the text which was wrapped in the tags, then try using this pattern:

<a[^>]*>(.*?)</a>

Then replace each match with just the first capture group. The only subtle part of the pattern is (.*?), which captures the content between the anchor tags. The lazy quantifier .*? tells the regex engine to stop at the first closing tag. A greedy (.*) could instead match across multiple anchor tags, should they exist in your column.

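SELECT
    REGEXP_REPLACE('<a style="text-decoration:none" href="[^"]*">BLAH</a>',
        '<a[^>]*>(.*?)</a>', '\\1');

The above query outputs BLAH. Note that MariaDB writes a backreference in the replacement string as \N, so the first capture group is '\\1' in a string literal, not '$1'.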
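To make the difference between the lazy and greedy forms concrete, here is a small sketch that runs both against a string containing two anchors. The sample string and the column aliases are made up for illustration, and the query assumes MariaDB's \N backreference syntax in the replacement.

```sql
-- Hypothetical sample input with two anchor tags on one line.
SELECT
    -- Lazy: each anchor is matched separately, so only the tags are dropped.
    REGEXP_REPLACE('<a href="/x">one</a> and <a href="/y">two</a>',
        '<a[^>]*>(.*?)</a>', '\\1') AS lazy_result,
    -- Greedy: a single match runs from the first <a ...> to the last </a>.
    REGEXP_REPLACE('<a href="/x">one</a> and <a href="/y">two</a>',
        '<a[^>]*>(.*)</a>', '\\1') AS greedy_result;
-- lazy_result:   one and two
-- greedy_result: one</a> and <a href="/y">two
```

The greedy version leaves markup behind because its capture group swallowed the inner closing and opening tags.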

If you instead want to remove the anchor tags together with the text they wrap, replace each match with an empty string:

SELECT
    REGEXP_REPLACE('<a style="text-decoration:none" href="[^"]*">BLAH</a>',
        '<a[^>]*>(.*?)</a>', '');
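Both patterns above match every anchor in the column, which would also affect the legitimate image link in the sample post from the question. A narrower sketch, assuming every spam link has exactly the shape shown in the question (the inline style, an arbitrary href, and a single full stop as the anchor text), might look like the following. It assumes the default WordPress schema, where wp_posts has an ID primary key, and it is worth backing up the table before running the UPDATE.

```sql
-- Assumption: spam links always look like
--   <a style="text-decoration: none" href="ANYTHING">.</a>
-- ' ?' tolerates an optional space after 'text-decoration:';
-- '[.]' matches a literal full stop without needing backslash escapes.

-- 1. Preview the affected rows and the cleaned content.
SELECT ID,
       REGEXP_REPLACE(post_content,
           '<a style="text-decoration: ?none" href="[^"]*">[.]</a>', '') AS cleaned
FROM wp_posts
WHERE post_content REGEXP '<a style="text-decoration: ?none" href="[^"]*">[.]</a>';

-- 2. Apply the change once the preview looks right.
UPDATE wp_posts
SET post_content = REGEXP_REPLACE(post_content,
        '<a style="text-decoration: ?none" href="[^"]*">[.]</a>', '')
WHERE post_content REGEXP '<a style="text-decoration: ?none" href="[^"]*">[.]</a>';
```

The WHERE clause restricts the UPDATE to rows that actually contain a match, so untouched posts are not rewritten. To replace each link with a blank space instead of removing it, pass ' ' as the third argument.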
Tim Biegeleisen answered Oct 24 '22 05:10