
Regex to find tag id and content in JavaScript

Hey, I'm trying to do something quite specific with regex in JavaScript, and my regex-fu is shaky at best. I wondered if there were any pros out there who could point me in the right direction. So I have some text...

<item id="myid1">myitem1</item>
<item id="myid2">myitem2</item>

...etc

And I would like to extract it into an array that reads myid1, myitem1, myid2, myitem2, etc.

There will never be nested elements so there is no recursive nesting problem. Anyone able to bash this out quickly? Thanks for your help!

Thomas asked Jul 17 '10 10:07



1 Answer

Here's a regex that will:

  • Match the starting and ending tag element names
  • Extract the value of the id attribute
  • Extract the inner HTML contents of the tag

Note: I am being lazy in matching the attribute value here. It must be enclosed in double quotes, and there must be no spaces between the attribute name, the equals sign, and the value.

<([^\s]+).*?id="([^"]*?)".*?>(.+?)</\1>
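To make the note about attribute matching concrete, here is a small sketch that runs the same pattern against three sample strings. The variable names and the sample inputs are made up for illustration; only the pattern comes from this answer.

```javascript
// Same pattern as above, written as a JavaScript regex literal (so "/" is escaped).
const tagPattern = /<([^\s]+).*?id="([^"]*?)".*?>(.+?)<\/\1>/;

// Hypothetical sample inputs, made up for this illustration.
const doubleQuoted = '<item id="myid1">myitem1</item>';
const singleQuoted = "<item id='myid1'>myitem1</item>";
const spacedEquals = '<item id = "myid1">myitem1</item>';

// Group 1 is the tag name, group 2 the id value, group 3 the contents.
const m = tagPattern.exec(doubleQuoted);
console.log(m[1], m[2], m[3]);               // item myid1 myitem1

// Neither of these contains the literal text id=" so the pattern finds nothing.
console.log(tagPattern.exec(singleQuoted));  // null
console.log(tagPattern.exec(spacedEquals));  // null
```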

Running the regex in JavaScript would be done like so:

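const search = '<item id="item1">firstItem</item><item id="item2">secondItem</item>';
const regex = /<([^\s]+).*?id="([^"]*?)".*?>(.+?)<\/\1>/gi;
const results = {};
let parts;
// Run exec() on the whole string: with the g flag, each call resumes at
// regex.lastIndex and returns null once there are no more matches.
while ((parts = regex.exec(search)) !== null) {
    // parts[1] is the tag name, parts[2] the id, parts[3] the inner contents.
    results[parts[2]] = parts[3];
}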

At the end of this, results would be an object that looks like:

{
    "item1": "firstItem",
    "item2": "secondItem"
}
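The question asked for a flat array (myid1, myitem1, myid2, myitem2, ...) rather than an object keyed by id. A minimal sketch of that variant follows, assuming the same pattern and the sample markup from the question; the names input, itemPattern, and flat are made up here.

```javascript
// Sample markup from the question, joined into one string.
const input = '<item id="myid1">myitem1</item><item id="myid2">myitem2</item>';
const itemPattern = /<([^\s]+).*?id="([^"]*?)".*?>(.+?)<\/\1>/gi;

const flat = [];
let match;
// With the g flag, each exec() call resumes where the previous match ended.
while ((match = itemPattern.exec(input)) !== null) {
    flat.push(match[2], match[3]);  // id first, then the tag's contents
}
console.log(flat);  // logs the four values: myid1, myitem1, myid2, myitem2
```

An array keeps duplicate ids and the original order, whereas in the object form a repeated id would overwrite the earlier entry.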

YMMV if the <item> elements contain nested HTML.
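As a rough illustration of where the mileage varies, the sketch below feeds the pattern two inputs it does not handle well: an element nested inside another element of the same name, and contents that span two lines. Both inputs and all variable names are made up for this example.

```javascript
const pattern = /<([^\s]+).*?id="([^"]*?)".*?>(.+?)<\/\1>/;

// Hypothetical inputs, made up for this illustration.
const nestedSameTag = '<item id="outer"><item id="inner">x</item></item>';
const multiLine = '<item id="myid1">line one\nline two</item>';

// The lazy (.+?) stops at the first closing tag, which belongs to the inner element.
const nested = pattern.exec(nestedSameTag);
console.log(nested[2]);  // outer
console.log(nested[3]);  // <item id="inner">x

// "." does not match line breaks (without the s flag), so this finds nothing.
console.log(pattern.exec(multiLine));  // null
```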

Chris answered Nov 09 '22 23:11
