
What am I doing wrong in parsing this regular expression in JavaScript?

My string is:

<div> (blah blah blah) ---> quite big HTML before coming to this line.<b>Train No. &amp; Name : </b></td><td style="border-bottom:1px solid #ccc;font:12px arial"><span>12672 / SOUTH TRUNK EXP</span></td>

I managed to formulate a regular expression

var trainDetails = new RegExp("<b>Train No. &amp; Name : </b></td><td.*>([0-9][a-z][A-Z]+)</span></td>", "m");

But the result I get from trainDetails is null or empty.

All I am trying to do is to get the train name and the train number within the span element.

Any pointers on what I am doing wrong?

asked Dec 30 '15 10:12 by now he who must not be named.


2 Answers

It worked for me:

Using RegExp

var string = '<div> (blah blah blah) ---> quite big HTML before coming to this line.<b>Train No. &amp; Name : </b></td><td style="border-bottom:1px solid #ccc;font:12px arial"><span>12672 / SOUTH TRUNK EXP</span></td>';

// Keep only the last run of text that is followed by closing tags at the end of the string.
var trainDetail = string.replace(/.*?([^>]+)(?:<\/[A-Za-z]+>)+$/g, '$1');
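For comparison, one likely reason the pattern in the question never matches is its capture group: [0-9][a-z][A-Z]+ requires exactly one digit, then one lowercase letter, then uppercase letters, and allows no spaces or slash, so it cannot match "12672 / SOUTH TRUNK EXP". Below is a minimal sketch of a corrected pattern, assuming the label text and the <span> wrapper stay exactly as shown in the question. The names html, pattern, match, trainNumber and trainName are illustrative only.

```javascript
// Sample input taken from the question, starting at the label.
var html = '<b>Train No. &amp; Name : </b></td>' +
  '<td style="border-bottom:1px solid #ccc;font:12px arial">' +
  '<span>12672 / SOUTH TRUNK EXP</span></td>';

// Anchor on the label, skip the <td ...> attributes with [^>]*, then
// capture the number and the name separately inside the <span>.
var pattern = /<b>Train No\. &amp; Name : <\/b><\/td><td[^>]*><span>\s*(\d+)\s*\/\s*([^<]+?)\s*<\/span>/;

// exec returns null when nothing matches, so guard before reading groups.
var match = pattern.exec(html);
var trainNumber = match ? match[1] : null;
var trainName = match ? match[2] : null;

console.log(trainNumber, trainName);
```

This still breaks as soon as the markup changes (extra attributes on the <span>, different whitespace in the label), so it is a sketch for this exact input, not a general HTML parser.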

Using DOM

var string = '<b>Train No. &amp; Name : </b></td><td style="border-bottom:1px solid #ccc;font:12px arial"><span>12672 / SOUTH TRUNK EXP</span></td>';
// The HTML parser ignores stray <td> tags outside a table, so rename them first.
string = string.replace(/(<\/?)td/g, '$1xmltd');
var tempDoc = document.createElement('xml');
tempDoc.innerHTML = string;
var node = tempDoc.getElementsByTagName('xmltd');
// Read the text of the last renamed cell.
var trainDetails = node[node.length - 1].textContent;

This assumes that the last <td> in the string contains the train details.

answered Nov 03 '22 00:11 by Vegeta


A regular expression is not the ideal tool for this use case. I suggest using your browser's built-in HTML parser to get the inner HTML of the <span>.

var el = document.createElement('html');
el.innerHTML = '<div> (blah blah blah) ---> quite big HTML before coming to this line.<b>Train No. &amp; Name : </b></td><td style="border-bottom:1px solid #ccc;font:12px arial"><span>12672 / SOUTH TRUNK EXP</span></td>';
var output = el.getElementsByTagName('span')[0].innerHTML;

The value of the output variable becomes:

12672 / SOUTH TRUNK EXP
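
Since the question asks for the train number and the train name, the value can then be split on the slash. This is a minimal sketch, assuming the text always has the form "number / name" and always contains a slash. The names slashIndex, trainNumber and trainName are illustrative only.

```javascript
// Value taken from the answer above; in a browser it would come from
// el.getElementsByTagName('span')[0].innerHTML.
var output = '12672 / SOUTH TRUNK EXP';

// Split on the first slash only, in case the name itself contains one.
// Assumption: a slash is always present (indexOf would return -1 otherwise).
var slashIndex = output.indexOf('/');
var trainNumber = output.slice(0, slashIndex).trim();
var trainName = output.slice(slashIndex + 1).trim();

console.log(trainNumber); // 12672
console.log(trainName);   // SOUTH TRUNK EXP
```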

Edit

If you are interested in a specific <span>, I suggest adding a class to its tag or its parent <td> tag, e.g.:

<span class="train-number-and-name">
   12672 / SOUTH TRUNK EXP
</span>

And fetch it like this:

var output = el.querySelector('span.train-number-and-name').innerHTML;
answered Nov 03 '22 00:11 by sina