RegEx to remove all markup between <a and </a> tags except for within [ and ]

Tags:

html

regex

html-parsing

Trying to figure out a regular expression gives me a brain cramp :)

I'm replacing thousands of individual href links with individual shortcodes in WordPress post content, using a plugin that lets me run regular expressions on the content.

Rather than try to combine an SQL query with a regex, I'm doing it in two stages: first, an SQL find/replace that turns each URL into its shortcode; second, a regex that removes the rest of the href link markup.

These are some examples of what I have now from the first step; as you can see, the URL has been replaced with the [nggallery id=xxx] shortcode.

<a href="[nggallery id=xx]"><span class="shutterset">
<img class="alignnone size-large wp-image-23067" title="Image Title" 
src="http://example.com/wp-content/uploads/2015/06/image-title.jpg"
alt="" width="685" height="456" /></span></a>

<a href="[nggallery id=xxxxx]">Click here!</a>

<a title="title title" href="[nggallery id=xxx]" target="_blank">Title Link Title Link</a>

Now I need to delete all of the link markup from the opening <a through the closing </a> (the span, img, etc.), leaving just the shortcode [nggallery id=xxx].

I've got a start here: https://www.regex101.com/r/rL8wP1/2

But I don't know how to keep the [nggallery id=xxx] shortcode from being removed along with the rest of the match.

Update 7/09/2015

@nhahtdh's answer appears to work perfectly: it is not too greedy and doesn't eat adjacent HTML links. With a regex plugin in WordPress, use ( and ) as delimiters and $1 as the replacement. (If using BBEdit, you will need to use \1 instead.)

( <a\s[^>]*"(\[nggallery[^\]]*\])".*?<\/a> )
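A minimal sketch of this replacement, using Python's re module purely as a stand-in for the plugin's engine: the ( ) delimiters are dropped, the backreference is written \1 (as in BBEdit), and re.DOTALL is added so that .*? can cross the line breaks inside the first sample. In Python's re, . does not match a newline by default, so without re.DOTALL the multi-line first link is left unchanged. The function name strip_links is hypothetical.

```python
import re

# nhahtdh's pattern without the ( ) delimiters the WordPress plugin expects.
# re.DOTALL lets ".*?" cross line breaks, which the first sample needs.
PATTERN = re.compile(r'<a\s[^>]*"(\[nggallery[^\]]*\])".*?<\/a>', re.DOTALL)

SAMPLE = """<a href="[nggallery id=xx]"><span class="shutterset">
<img class="alignnone size-large wp-image-23067" title="Image Title"
src="http://example.com/wp-content/uploads/2015/06/image-title.jpg"
alt="" width="685" height="456" /></span></a>

<a href="[nggallery id=xxxxx]">Click here!</a>

<a title="title title" href="[nggallery id=xxx]" target="_blank">Title Link Title Link</a>"""


def strip_links(text: str) -> str:
    """Replace each matching <a ...>...</a> with its captured shortcode."""
    # Python writes the backreference as \1 (like BBEdit), not $1.
    return PATTERN.sub(r"\1", text)


print(strip_links(SAMPLE))
```

This prints the three shortcodes, separated by the blank lines that were between the links.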

Update 7/02/2015

Thanks to Fab Sa (answer below), his regex at https://www.regex101.com/r/rL8wP1/4

<a.*(\[nggallery[^\]+]*\]).*?<\/a>

works in the regex101 tester, but when used in the BBEdit text editor or in the WordPress plugin that runs regexes, it deletes the [nggallery id=***] shortcode. Is it too greedy, or is there some other issue?
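One thing worth ruling out is the replacement syntax, since the 7/09 update notes that BBEdit wants \1 rather than $1. As an illustration only, in Python's re (a stand-in, not the engine either tool runs), $1 in the replacement string is not a backreference and is copied through literally, so the captured shortcode never makes it into the output:

```python
import re

# Fab Sa's pattern, exactly as posted.
FAB_SA = re.compile(r"<a.*(\[nggallery[^\]+]*\]).*?<\/a>")

link = '<a href="[nggallery id=xxxxx]">Click here!</a>'

# "$1" has no special meaning in a Python replacement string: it is inserted as-is.
print(FAB_SA.sub("$1", link))
# \1 (or \g<1>) is how Python refers to capture group 1.
print(FAB_SA.sub(r"\1", link))
```

The first call prints $1 and the second prints [nggallery id=xxxxx]. Whether BBEdit or the plugin behaves the same way with the "wrong" syntax is an assumption to check in each tool.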

Update 7/01/2015:

I know, I know, re: "RegEx match open tags except XHTML self-contained tags": YOU CANNOT PARSE HTML WITH REGEX.


asked Jun 30 '15 19:06

markratledge

2 Answers

You can use this regex

<a.*(\[nggallery[^\]+]*\]).*?<\/a>

globally (flag g). This regex matches a link and captures the [nggallery ...] part. Replace the whole match with $1 to keep only the captured [nggallery ...] part.

I've updated your regex online: https://www.regex101.com/r/rL8wP1/4
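In Python's re (again only a stand-in for the tools discussed here), Pattern.sub already replaces every non-overlapping match, which corresponds to the g flag; passing count=1 shows what a non-global replace would do. A minimal sketch with one link per line, taken from the question's samples:

```python
import re

FAB_SA = re.compile(r"<a.*(\[nggallery[^\]+]*\]).*?<\/a>")

content = (
    '<a href="[nggallery id=xxxxx]">Click here!</a>\n'
    '<a title="title title" href="[nggallery id=xxx]" target="_blank">'
    "Title Link Title Link</a>"
)

# Default count=0 replaces every match, like the g flag.
print(FAB_SA.sub(r"\1", content))
# count=1 stops after the first match, like a non-global replace.
print(FAB_SA.sub(r"\1", content, count=1))
```

Because . does not match a newline here, each line is matched on its own, so one link per line works as intended.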

PS: In this solution, [nggallery ...] doesn't need to be in a specific attribute like href. If you want to enforce that, you can use <a.*href\="(\[nggallery[^\]+]*\])".*?<\/a>
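To see the difference the href restriction makes, here is a small sketch using Python's re as a stand-in engine, with made-up input strings. The first pattern accepts the shortcode in any attribute; the second only accepts it directly after href=". (\= is simply an escaped =.)

```python
import re

ANY_ATTR = re.compile(r"<a.*(\[nggallery[^\]+]*\]).*?<\/a>")
HREF_ONLY = re.compile(r'<a.*href\="(\[nggallery[^\]+]*\])".*?<\/a>')

in_title = '<a title="[nggallery id=1]" href="http://example.com/">Photo</a>'
in_href = '<a title="t" href="[nggallery id=2]">Photo</a>'

print(ANY_ATTR.sub(r"\1", in_title))   # shortcode taken from the title attribute
print(HREF_ONLY.sub(r"\1", in_title))  # no match: the link is left unchanged
print(HREF_ONLY.sub(r"\1", in_href))   # shortcode taken from href
```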

answered Oct 11 '22 14:10

Fabien Sa

Fab Sa's regex <a.*(\[nggallery[^\]+]*\]).*?<\/a> gobbles up everything when there are multiple <a> tags on a single line, because the unrestricted, greedy .* at the beginning matches across different <a> tags; only the last shortcode on the line survives.
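Sketched with Python's re as a stand-in engine on a made-up line holding two links, the difference looks like this. The leading .* runs to the end of the line and backtracks only as far as the last [nggallery, so a single match spans both links. For comparison, the sketch also runs the restricted pattern from this answer.

```python
import re

FAB_SA = re.compile(r"<a.*(\[nggallery[^\]+]*\]).*?<\/a>")
NHAHTDH = re.compile(r'<a\s[^>]*"(\[nggallery[^\]]*\])".*?<\/a>')

line = ('<a href="[nggallery id=1]">One</a> and '
        '<a href="[nggallery id=2]">Two</a>')

print(FAB_SA.sub(r"\1", line))   # one match spanning both links
print(NHAHTDH.sub(r"\1", line))  # one match per link
```

The first call prints [nggallery id=2] (the first shortcode and the text between the links are gone); the second prints [nggallery id=1] and [nggallery id=2].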

By restricting the allowable characters, you can get reasonably close to what you want:

<a\s[^>]*"(\[nggallery[^\]]*\])".*?<\/a>
  ^^^^^^^

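I forced at least one whitespace character after a to make sure that it doesn't match other tags (such as <abbr>), used [^>]* so the match can't run past the end of the opening tag, and required the shortcode to be a quoted attribute value.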
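To illustrate why the \s matters, here is a sketch using Python's re as a stand-in engine, with made-up input in which an <abbr> tag happens to carry a shortcode-like string. Without the \s, <a also matches the start of <abbr, and the lazy .*? then runs on to the next </a>, even one belonging to an unrelated link:

```python
import re

FAB_SA = re.compile(r"<a.*(\[nggallery[^\]+]*\]).*?<\/a>")
NHAHTDH = re.compile(r'<a\s[^>]*"(\[nggallery[^\]]*\])".*?<\/a>')

text = '<abbr title="[nggallery id=3]">x</abbr> see <a href="#">y</a>'

print(FAB_SA.sub(r"\1", text))   # swallows the <abbr> and the unrelated link
print(NHAHTDH.sub(r"\1", text))  # no match, text is unchanged
```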

Anyway, you are on your own if you discover that it doesn't work in some corner case. It's generally a bad idea to manipulate HTML with regex.
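If the post content can be processed outside the regex plugin, a real HTML parser sidesteps these corner cases. The following is a minimal sketch using Python's standard-library html.parser module, a swapped-in technique rather than anything from the answers above; AnchorUnwrapper and unwrap_shortcode_links are hypothetical names. It replaces each <a> whose href is exactly a [nggallery ...] shortcode with that shortcode, drops everything inside that link, and re-emits all other markup. It is meant for simple post markup, not as a hardened converter: end tags are rebuilt in lowercase, and doctype declarations and processing instructions are not copied through.

```python
import re
from html.parser import HTMLParser

# A whole-attribute shortcode such as "[nggallery id=123]".
SHORTCODE = re.compile(r"\[nggallery[^\]]*\]")


class AnchorUnwrapper(HTMLParser):
    """Hypothetical helper: replace <a href="[nggallery ...]">...</a> with the
    bare shortcode and copy all other markup through."""

    def __init__(self):
        # Keep entities such as &amp; as-is instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.depth = 0  # > 0 while inside a link that is being removed

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == "a":  # nested <a> (invalid HTML): keep count anyway
                self.depth += 1
            return
        href = dict(attrs).get("href") or ""
        if tag == "a" and SHORTCODE.fullmatch(href):
            self.out.append(href)  # keep only the shortcode
            self.depth = 1
        else:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <img ... />.
        if not self.depth:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.depth:
            if tag == "a":
                self.depth -= 1
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.depth:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.depth:
            self.out.append(f"&#{name};")

    def handle_comment(self, data):
        if not self.depth:
            self.out.append(f"<!--{data}-->")


def unwrap_shortcode_links(html: str) -> str:
    parser = AnchorUnwrapper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Unlike the regexes, this does not care whether the link spans several lines or shares a line with other links, because it follows tags rather than characters.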

answered Oct 11 '22 15:10

nhahtdh
