Extract all links from a string

I have a JavaScript variable containing the HTML source code of a page (not the current page), and I need to extract all links from it. What is the best way to do this?

Is it possible to create a DOM for the HTML in the variable and then walk that?

Hinchy asked Sep 28 '09 15:09


2 Answers

I don't know if this is the recommended way, but it works (plain JavaScript, no library needed):

var rawHTML = '<html><body><a href="foo">bar</a><a href="narf">zort</a></body></html>';

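// Build a detached <html> element and let the browser parse the markup into it.
var doc = document.createElement("html");
doc.innerHTML = rawHTML;

// Collect the href attribute of every <a> element.
var links = doc.getElementsByTagName("a");
var urls = [];

for (var i = 0; i < links.length; i++) {
    urls.push(links[i].getAttribute("href"));
}
alert(urls);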
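
One caveat with assigning to innerHTML: the markup is parsed in the context of the live page, so the browser may start fetching images it references, and inline handlers such as onerror can run. As a possible alternative, here is a minimal sketch using the standard DOMParser API, which parses the string into a separate document. The function name extractLinks is made up for illustration, and the sketch assumes a browser environment where DOMParser is available.

```javascript
// Sketch: collect the href of every <a> element in an HTML string.
// Assumes a browser; DOMParser is not built into Node.js.
function extractLinks(rawHTML) {
    // Parse into a separate document instead of an element owned by the live page.
    var doc = new DOMParser().parseFromString(rawHTML, "text/html");

    // The a[href] selector skips anchors that have no href attribute.
    var anchors = doc.querySelectorAll("a[href]");
    var urls = [];

    for (var i = 0; i < anchors.length; i++) {
        // getAttribute returns the value exactly as written in the markup.
        urls.push(anchors[i].getAttribute("href"));
    }
    return urls;
}
```

Called with the rawHTML string from the example above, this should give the two hrefs, foo and narf, in document order.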
andre-r answered Sep 30 '22 15:09


If you're using jQuery, I believe you can do this quite easily:

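// Wrap the parsed nodes in a <div>; $('a', doc) only searches descendants of doc.
// $.parseHTML needs jQuery 1.8 or later.
var doc = $('<div>').append($.parseHTML(rawHTML));
var links = $('a', doc);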

http://docs.jquery.com/Core/jQuery#htmlownerDocument

brianreavis answered Sep 30 '22 15:09