 

How to extract links and titles from a .html page?

For my website, I'd like to add a new feature.

I would like users to be able to upload their bookmarks backup file (from any browser, if possible) so I can import the bookmarks into their profile and they don't have to enter them all manually.

The only part I'm missing is extracting the title and URL of each bookmark from the uploaded file. Can anyone give me a clue where to start or what to read?

I used the search and found "How to extract data from a raw HTML file?", which is the most closely related question, but it doesn't cover this.

I don't mind whether the solution uses jQuery or PHP.

Thank you very much.

Asked Dec 12 '10 at 18:12 by Toni Michel Caubet




1 Answer

Thank you, everyone. I got it!

The final code:

```php
<?php
$html = file_get_contents('bookmarks.html');

// Create a new DOM document
$dom = new DOMDocument;

// Parse the HTML. The @ suppresses the warnings DOMDocument emits
// when the markup isn't well-formed (bookmark exports usually aren't).
@$dom->loadHTML($html);

// Get all links. You could also use any other tag name here,
// like 'img' or 'table', to extract other tags.
$links = $dom->getElementsByTagName('a');

// Iterate over the extracted links and show each title and URL
foreach ($links as $link) {
    // nodeValue is the anchor text (the bookmark title);
    // the "href" attribute is the bookmark URL.
    echo htmlspecialchars($link->nodeValue), ' - ',
         htmlspecialchars($link->getAttribute('href')), '<br>';
}
```

This prints the anchor text (the bookmark title) and the href (the URL) of every link in the .html file.
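Since the goal is to store the bookmarks in a user's profile rather than echo them, the same DOMDocument approach can collect them into an array first. The following is a minimal sketch, assuming the upload is a browser export in the Netscape bookmark HTML format (what Chrome, Firefox and Edge typically produce). The helper name `extractBookmarks()`, the sample data, and the http(s)-only filter are illustrative choices, not part of the original answer:

```php
<?php
/**
 * Hypothetical helper: extract bookmarks from an exported bookmarks HTML file.
 * Returns a list of ['title' => ..., 'url' => ...] arrays.
 */
function extractBookmarks(string $html): array
{
    // DOMDocument::loadHTML() rejects an empty string, so bail out early.
    if (trim($html) === '') {
        return [];
    }

    $dom = new DOMDocument();

    // Collect libxml parse warnings internally instead of printing them,
    // then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $bookmarks = [];
    foreach ($dom->getElementsByTagName('a') as $link) {
        $url = trim($link->getAttribute('href'));

        // Assumption: only http(s) links are worth importing; this skips
        // e.g. "javascript:" bookmarklets and browser-internal URLs.
        if (!preg_match('#^https?://#i', $url)) {
            continue;
        }

        $bookmarks[] = [
            'title' => trim($link->textContent),
            'url'   => $url,
        ];
    }

    return $bookmarks;
}

// Usage with a small hand-written sample in the Netscape bookmark layout.
$sample = <<<HTML
<!DOCTYPE NETSCAPE-Bookmark-file-1>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">
<TITLE>Bookmarks</TITLE>
<H1>Bookmarks</H1>
<DL><p>
    <DT><H3>Dev</H3>
    <DL><p>
        <DT><A HREF="https://www.php.net/" ADD_DATE="1292176800">PHP</A>
        <DT><A HREF="javascript:alert(1)">Bookmarklet</A>
    </DL><p>
    <DT><A HREF="https://example.com/">Example</A>
</DL><p>
HTML;

print_r(extractBookmarks($sample));
```

Using `libxml_use_internal_errors()` instead of the `@` operator keeps the parse warnings available through `libxml_get_errors()` if you want to log them, and returning an array leaves it up to you how to save each title/URL pair to the profile.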

Again, thanks a lot.

Answered Sep 22 '22 at 15:09 by Toni Michel Caubet
