
Regex: how to get the inner contents of a tag (using JavaScript)?

Page contents:

aa<b>1;2'3</b>hh<b>aaa</b>..
 .<b>bbb</b>
blabla..

I want to get this result:

1;2'3aaabbb

The tags to match are <b> and </b>.

How do I write this regex in JavaScript? Thanks!

asked Apr 12 '10 14:04 by Koerr


2 Answers

Lazyanno,

If and only if:

  1. you have read SLaks's post (as well as the previous article he links to), and
  2. you fully understand the numerous and wondrous ways in which extracting information from HTML using regular expressions can break, and
  3. you are confident that none of the concerns apply in your case (e.g. you can guarantee that your input will never contain nested or mismatched <b>/</b> tags, or occurrences of <b> or </b> inside <script>...</script> or comment <!-- ... --> blocks), and
  4. you absolutely and positively want to proceed with regular expression extraction

...then use:

var str = "aa<b>1;2'3</b>hh<b>aaa</b>..\n.<b>bbb</b>\nblabla..";

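// The lazy (.*?) stops at the first </b>; the "g" flag lets exec() step through every match.
var match, result = "", regex = /<b>(.*?)<\/b>/ig;
while ((match = regex.exec(str)) !== null) { result += match[1]; }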

alert(result);

Produces:

1;2'3aaabbb
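
One limitation of the pattern above is that "." does not match line breaks, so a <b>...</b> pair that spans two lines would be skipped. A minimal sketch of a variant that handles this, assuming an environment with String.prototype.matchAll; the function name extractBoldContents and the variable sample are made up for illustration and are not part of the original answer:

```javascript
// Hypothetical helper (not from the answer): concatenates the inner text of
// every non-nested <b>...</b> pair found in `str`.
function extractBoldContents(str) {
  // [\s\S]*? is lazy and, unlike .*?, also matches line breaks.
  const regex = /<b>([\s\S]*?)<\/b>/gi;
  let result = "";
  for (const match of str.matchAll(regex)) {
    result += match[1]; // capture group 1 is the text between the tags
  }
  return result;
}

const sample = "aa<b>1;2'3</b>hh<b>aaa</b>..\n.<b>bbb</b>\nblabla..";
console.log(extractBoldContents(sample)); // 1;2'3aaabbb
```

The same caveats from the list above still apply: this only behaves sensibly when the tags are never nested, mismatched, or embedded in scripts or comments.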
answered Sep 27 '22 21:09 by vladr


You cannot parse HTML using regular expressions.

Instead, you should use the DOM from JavaScript.

For example (using jQuery):

var text = "";
$('<div>' + htmlSource + '</div>')
    .find('b')
    .each(function() { text += $(this).text(); });

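I wrap the HTML in a <div> because .find() only searches descendants; without the wrapper, any <b> elements at the top level of the source would be missed, and only nested ones would be found.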
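
The same idea can be sketched without jQuery, assuming a browser that provides DOMParser and querySelectorAll. The helper name collectBoldText is hypothetical, and htmlSource here is just the sample string from the question:

```javascript
// Hypothetical helper: joins the text of every <b> element under `root`.
// `root` can be any object with querySelectorAll, such as a Document or Element.
function collectBoldText(root) {
  return Array.from(root.querySelectorAll("b"))
    .map(function (el) { return el.textContent; })
    .join("");
}

// Browser-only usage; skipped where DOMParser is unavailable (e.g. plain Node.js).
if (typeof DOMParser !== "undefined") {
  const htmlSource = "aa<b>1;2'3</b>hh<b>aaa</b>..\n.<b>bbb</b>\nblabla..";
  const doc = new DOMParser().parseFromString(htmlSource, "text/html");
  console.log(collectBoldText(doc)); // 1;2'3aaabbb
}
```

As with the jQuery version, a <b> nested inside another <b> is selected separately, so its text would appear twice in the result.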

answered Sep 27 '22 23:09 by SLaks