
Regex: Remove empty-element tags for xml

Tags: regex, xml

I'd like to convert all self-closing elements to the long syntax (because my web browser is tripping on them).

Example

<iframe src="http://example.com/thing"/>

becomes

<iframe src="http://example.com/thing"></iframe>

I'm using Python's flavor of regex.

Paul Tarjan asked Jun 27 '26 01:06


2 Answers

None of those solutions will accommodate attributes like foo="/>". Try:

s:<([\w\-_]+)((?:[^'">]|'[^']*'|"[^"]*")*)/\s*>:<$1$2></$1>:

Exploded to show detail:

<
    ([\w\-_]+)       # tag name
    (
        (?:
            [^'">]|  # a "normal" character, or
            '[^']*'| # single-quoted string, or
            "[^"]*"  # double-quoted string
        )*
    )
    /\s*             # self-closing
>

This should always work provided that the markup is valid. (You could rearrange this using lazy quantifiers if you so chose; e.g. '[^']*' => '.*?'.)
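Since the question asks about Python, here is a minimal sketch of how the substitution above might look with re.sub. The names SELF_CLOSING and expand_self_closing are made up for illustration; the pattern is the one from this answer, with Python's \1 and \2 standing in for $1 and $2.

```python
import re

# Same pattern as above, written with re.VERBOSE so the parts can be commented.
SELF_CLOSING = re.compile(
    r"""
    <
    ([\w\-_]+)          # group 1: tag name
    (                   # group 2: everything between the name and the "/"
        (?:
            [^'">]      # one "normal" character, or
          | '[^']*'     # a single-quoted string, or
          | "[^"]*"     # a double-quoted string
        )*
    )
    /\s*>               # the self-closing "/" and the final ">"
    """,
    re.VERBOSE,
)


def expand_self_closing(markup):
    """Rewrite <tag .../> as <tag ...></tag> (hypothetical helper name)."""
    return SELF_CLOSING.sub(r"<\1\2></\1>", markup)


print(expand_self_closing('<iframe src="http://example.com/thing"/>'))
# <iframe src="http://example.com/thing"></iframe>
print(expand_self_closing('<a foo="/>"/>'))
# <a foo="/>"></a>
```

The second call is the foo="/>" case: the quoted string is consumed as a unit, so the "/>" inside it is not mistaken for the end of the tag.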

Thom Smith answered Jun 30 '26 07:06


Use this Python regex:

(<(\w+)[^<]*?)/>

It differs from @Kinopiko's in that it will handle nested elements.

Explanation of Regex

  1. Match the opening bracket: <
  2. Capture the tag name that follows: (\w+)
  3. Lazily match everything between the tag name and the self-closing marker, excluding another opening bracket so the match cannot run into a nested tag: [^<]*?
  4. Match the self-closing marker: />

Then replace each match with this replacement string:

\1></\2>
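Put together in Python, the pattern and replacement might be used roughly like this. It is only a sketch: SIMPLE_SELF_CLOSING and expand_simple are illustrative names, not part of any library.

```python
import re

# Pattern and replacement string as given in this answer.
SIMPLE_SELF_CLOSING = re.compile(r"(<(\w+)[^<]*?)/>")


def expand_simple(markup):
    """Rewrite <tag .../> as <tag ...></tag> using the short pattern."""
    # \1 is everything up to (not including) "/>", \2 is the tag name.
    return SIMPLE_SELF_CLOSING.sub(r"\1></\2>", markup)


print(expand_simple('<iframe src="http://example.com/thing"/>'))
# <iframe src="http://example.com/thing"></iframe>
print(expand_simple("<div><br/></div>"))
# <div><br></br></div>
```

Because the lazy match stops at the first "/>", an attribute value that itself contains "/>" (the foo="/>" case raised in the other answer) would be cut in the wrong place, so this shorter pattern is best kept for markup where that cannot occur.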

Michael La Voie answered Jun 30 '26 08:06


