
Regular expression for syntax highlighting attributes in HTML tag

I'm working on regular expressions for syntax highlighting in a Sublime/TextMate language file. The rule needs to "begin" on a non-self-closing HTML tag and "end" on the matching closing tag:

  • begin: (<)([a-zA-Z0-9:.]+)[^/>]*(>)

  • end: (</)(\2)([^>]*>)

So far, so good: I can capture the tag name, and the end pattern matches it, so the appropriate patterns are applied to the area between the tags.

jsx-tag-area:
    begin: (<)([a-zA-Z0-9:.]+)[^/>]*>
    beginCaptures:
      '1': {name: punctuation.definition.tag.begin.jsx}
      '2': {name: entity.name.tag.jsx}
    end: (</)(\2)([^>]*>)
    endCaptures:
      '1': {name: punctuation.definition.tag.begin.jsx}
      '2': {name: entity.name.tag.jsx}
      '3': {name: punctuation.definition.tag.end.jsx}
    name: jsx.tag-area.jsx
    patterns:
    - {include: '#jsx'}
    - {include: '#jsx-evaluated-code'}
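
For readers who want to try the begin/end pair outside the editor, here is a minimal Python sketch. It assumes the grammar engine fills `\2` in the end pattern with the text captured by the begin pattern, and emulates that by escaping the captured tag name. The function name `find_tag_area` is made up for illustration, and nested tags with the same name are not handled.

```python
import re

# Begin pattern from the question; group 2 is the tag name.
BEGIN = re.compile(r'(<)([a-zA-Z0-9:.]+)[^/>]*>')

def find_tag_area(source):
    """Return (tag_name, inner_text) for the first non-self-closing tag, or None."""
    begin = BEGIN.search(source)
    if begin is None:
        return None
    tag = begin.group(2)
    # Emulate the \2 back-reference in the end pattern by inserting
    # the escaped tag name captured by the begin pattern.
    end = re.compile(r'(</)(' + re.escape(tag) + r')([^>]*>)')
    close = end.search(source, begin.end())
    if close is None:
        return None
    return tag, source[begin.end():close.start()]

print(find_tag_area('<div class="a">hello <b>x</b></div>'))
# ('div', 'hello <b>x</b>')
print(find_tag_area('<br/>'))
# None
```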

Now I'd also like to capture zero or more of the HTML attributes in the opening tag so that I can highlight them.

So if the tag were <div attr="Something" data-attr="test" data-foo>

it should match attr, data-attr, and data-foo, as well as the < and div.

Something like (this is very rough):

(<)([a-zA-Z0-9:.]+)(?:\s(?:([0-9a-zA-Z_-]*=?))\s?)*[^/>]*(>)

It doesn't need to be perfect, since it's only for syntax highlighting. I'm having a hard time figuring out how to get multiple capture groups within the tag, whether I should be using look-around, and whether this is even possible with a single expression.

Edit: here are more details about the specific case / question - https://github.com/reactjs/sublime-react/issues/18

tgriesser asked Aug 04 '14 14:08



1 Answer

I may have found a possible solution.

It is not perfect because, as @skamazin said in the comments, capturing an arbitrary number of attributes means repeating the attribute pattern once per attribute, so you have to decide on a maximum number of attributes to allow.
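
To illustrate why the pattern has to be repeated, here is a small Python sketch. It assumes Python's `re` module rather than the editor's regex engine, and the pattern is a simplified stand-in, not the regex from this answer. When a capture group sits inside a repeated group, only the last iteration's text is kept.

```python
import re

# One capture group for the attribute name, inside a repeated (?: ... )* group.
# Simplified illustration: values are assumed to be double-quoted.
pattern = re.compile(
    r'<([a-zA-Z0-9:.]+)(?:\s+([0-9a-zA-Z_-]+)(?:="[^"]*")?)*\s*>'
)

m = pattern.match('<div attr="Something" data-attr="test" data-foo>')
print(m.group(1))  # div
print(m.group(2))  # data-foo (only the last attribute name survives)
```

All three attributes are matched, but `attr` and `data-attr` are overwritten as the group repeats, which is why each attribute needs its own numbered group.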

The regex is pretty scary, but it may work for your goal. It can probably be simplified a bit, and you may have to adjust some things.

For a single attribute it looks like this:

(<)([a-zA-Z0-9:.]+)(?:(?: ((?<= )[^ ]+?(?==| |>)))(?:=[^ >]+)(?: |>))


For more attributes, append this fragment once for each additional attribute you want to allow:

(?:(?:((?<= )[^ ]+?(?==| |>)))(?:=[^ >]+)?(?: |>))?

So, for example, if you want to allow a maximum of 3 attributes, your regex will look like this:

(<)([a-zA-Z0-9:.]+)(?:(?: ((?<= )[^ ]+?(?==| |>)))(?:=[^ >]+)(?: |>))(?:(?:((?<= )[^ ]+?(?==| |>)))(?:=[^ >]+)?(?: |>))?(?:(?:((?<= )[^ ]+?(?==| |>)))(?:=[^ >]+)?(?: |>))?
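
Rather than pasting the fragment by hand, the same regex could be assembled programmatically. The sketch below uses Python's `re` module as an assumption; the editor's engine may behave differently. The names `TAG`, `FIRST_ATTR`, `EXTRA_ATTR`, `build_tag_regex`, and `attribute_names` are made up for illustration, and the fragments are copied from the three-attribute regex above.

```python
import re

# Fragments copied from the three-attribute regex above.
TAG = r'(<)([a-zA-Z0-9:.]+)'
FIRST_ATTR = r'(?:(?: ((?<= )[^ ]+?(?==| |>)))(?:=[^ >]+)(?: |>))'
EXTRA_ATTR = r'(?:(?:((?<= )[^ ]+?(?==| |>)))(?:=[^ >]+)?(?: |>))?'

def build_tag_regex(max_attrs):
    """Compile a regex with one capture group per allowed attribute."""
    if max_attrs < 1:
        raise ValueError('max_attrs must be at least 1')
    return re.compile(TAG + FIRST_ATTR + EXTRA_ATTR * (max_attrs - 1))

def attribute_names(tag_text, max_attrs=3):
    """Return the attribute names captured from an opening tag."""
    m = build_tag_regex(max_attrs).match(tag_text)
    if m is None:
        return []
    # Groups 1 and 2 are '<' and the tag name; the rest are attribute names.
    return [name for name in m.groups()[2:] if name is not None]

tag = '<div attr="Something" data-attr="test" data-foo>'
print(attribute_names(tag))               # ['attr', 'data-attr', 'data-foo']
print(attribute_names(tag, max_attrs=2))  # ['attr', 'data-attr']
```

With these fragments the first attribute must carry a value, and values containing spaces are not handled, so this is a rough fit for highlighting and not a full HTML parser.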


Let me know if this works for you or if you need further details.
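
If a single expression is not a hard requirement, a different technique avoids the attribute limit: match the opening tag first, then scan its attribute text in a second pass. This is a sketch under that assumption, again in Python and not in the grammar file itself. `OPEN_TAG`, `ATTR`, and `parse_open_tag` are hypothetical names, and values containing `/` or `>` are not handled.

```python
import re

# First pass: tag name plus everything up to the closing '>'.
OPEN_TAG = re.compile(r'<([a-zA-Z0-9:.]+)([^/>]*)>')
# Second pass: an attribute name with an optional quoted or bare value.
ATTR = re.compile(r'([0-9a-zA-Z_:-]+)(?:=("[^"]*"|\'[^\']*\'|[^\s>]+))?')

def parse_open_tag(text):
    """Return (tag_name, [(attr_name, raw_value_or_None), ...]) or None."""
    m = OPEN_TAG.match(text)
    if m is None:
        return None
    attrs = [(a.group(1), a.group(2)) for a in ATTR.finditer(m.group(2))]
    return m.group(1), attrs

print(parse_open_tag('<div attr="Something" data-attr="test" data-foo>'))
# ('div', [('attr', '"Something"'), ('data-attr', '"test"'), ('data-foo', None)])
```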

Oscar Hermosilla answered Nov 09 '22 09:11
