
What is the safest way to extract <title> from an HTML file using xpath?

Tags: html, xpath

Here is my current XPath expression: "/html/head/title".

However, real-world HTML is often malformed. For example, a missing <html> tag could cause an exception. Is there a safe way to extract the <title> tag (something like getElementsByTagName)?

asked Aug 18 '10 01:08 by silent




2 Answers

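Perhaps "//title"? It matches a <title> element at any depth, so it does not depend on <html> or <head> being present.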
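As a rough illustration of why a descendant search is more forgiving than a fixed path, here is a minimal Python sketch. It assumes the standard library's xml.etree.ElementTree, which supports only a subset of XPath and requires well-formed input, so it demonstrates the expression rather than a robust parser. The function name and sample strings are hypothetical.

```python
import xml.etree.ElementTree as ET

def title_via_descendant_path(markup):
    """Return the text of the first <title> below the root, or None."""
    root = ET.fromstring(markup)
    # ".//title" is ElementTree's relative form of "//title":
    # it searches all descendants of the root element.
    node = root.find(".//title")
    return node.text if node is not None else None

# Hypothetical samples: one complete document, one missing the <html> wrapper.
with_html = "<html><head><title>Hello</title></head><body/></html>"
without_html = "<head><title>Hello</title></head>"

print(title_via_descendant_path(with_html))
print(title_via_descendant_path(without_html))

# A fixed path only works when the expected ancestors are present.
print(ET.fromstring(without_html).find("head/title"))
```

The descendant search finds the title in both samples, while the fixed "head/title" step fails on the second one because its root is already <head>.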

answered Sep 28 '22 03:09 by meder omuraliev


Because HTML markup in the wild is so unruly, you should use an HTML parsing library. You didn't specify a platform or language, but there are a number of open-source libraries available.
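Since no language was specified, the following is only a sketch of the idea in Python, assuming the standard library's html.parser module, which tolerates missing or unclosed tags. The class name TitleExtractor and the helper extract_title are hypothetical names chosen for this example, not part of any library.

```python
from html.parser import HTMLParser

class TitleExtractor(HTMLParser):
    """Collects the text of the first <title> element encountered."""

    def __init__(self):
        super().__init__()
        self._in_title = False   # currently between <title> and </title>
        self._seen = False       # a <title> start tag has been found
        self._parts = []

    def handle_starttag(self, tag, attrs):
        # Only the first <title> is recorded; later ones are ignored.
        if tag == "title" and not self._seen:
            self._in_title = True
            self._seen = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        if not self._seen:
            return None
        # Collapse runs of whitespace, including newlines.
        return " ".join("".join(self._parts).split())

def extract_title(markup):
    parser = TitleExtractor()
    parser.feed(markup)
    parser.close()
    return parser.title

# Hypothetical broken input: no <html> tag, unclosed <p> and <body>.
broken = "<head><title>  Broken\n page </title><body><p>unclosed"
print(extract_title(broken))
print(extract_title("<p>no title here</p>"))
```

A dedicated parsing library would typically also let you run an XPath query such as "//title" against the repaired tree, which combines both answers.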

answered Sep 28 '22 03:09 by Paul Sasik