Python XPath parsing tag with apostrophe

Question

I'm new to XPath, and I'm trying to parse a page with it. I need to get information from a tag, but an escaped apostrophe in the title attribute screws everything up.

I use Grab for parsing.

The tag from the source:

<img src='somelink' border='0' alt='commission:Alfred\'s misadventures' title='commission:Alfred\'s misadventures'>

My current XPath:

g.xpath('.//tr/td/a[3]/img').get('title')

Returns

commission:Alfred\

Is there any way to fix this?

Thanks

Wayne · Accepted Answer

Garbage in, garbage out. Your input is not well-formed, because it escapes the single-quote character improperly. Many programming languages (including Python) use the backslash character to escape quotes in string literals; XML does not. You should either 1) surround the attribute's value with double quotes, or 2) use &apos; to include a single quote.
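
To see where the truncated value in the question comes from, here is a minimal sketch that feeds the tag to Python's standard-library html.parser, used only as a stand-in for a lenient HTML parser (that Grab's own parser splits the tag the same way is an assumption, though it matches the output reported in the question). The single-quoted value ends at the apostrophe right after the backslash, and the leftover "s" and "misadventures'" tokens are read as stray attributes. The ImgTitleCollector class is hypothetical.

```python
from html.parser import HTMLParser


class ImgTitleCollector(HTMLParser):
    """Hypothetical helper: records the title attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.titles = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; stray tokens such as "s"
        # show up as extra attributes whose value is None.
        if tag == "img":
            self.titles.append(dict(attrs).get("title"))


# The tag exactly as in the question (the raw string keeps the backslashes).
source = r"<img src='somelink' border='0' alt='commission:Alfred\'s misadventures' title='commission:Alfred\'s misadventures'>"

collector = ImgTitleCollector()
collector.feed(source)
collector.close()
print(repr(collector.titles[0]))  # 'commission:Alfred\\'
```

The backslash is not treated as an escape at all: it simply becomes the last character of the value, exactly as in the question's output.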

From the XML spec:

To allow attribute values to contain both single and double quotes, the apostrophe or single-quote character (') may be represented as "&apos;", and the double-quote character (") as "&quot;".
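
Both fixes can be checked with a strict XML parser. This sketch uses Python's standard-library xml.etree.ElementTree (an assumption; any conforming XML parser should agree), trims the tag to the src and title attributes, and adds a self-closing /> because XML, unlike HTML, requires every element to be closed. Both spellings yield the same title value.

```python
import xml.etree.ElementTree as ET

# 1) Double-quoted attribute value: the apostrophe needs no escaping.
double_quoted = '<img src="somelink" title="commission:Alfred\'s misadventures"/>'

# 2) Single-quoted attribute value: the apostrophe is written as &apos;.
entity_escaped = "<img src='somelink' title='commission:Alfred&apos;s misadventures'/>"

titles = [ET.fromstring(markup).get("title") for markup in (double_quoted, entity_escaped)]
for title in titles:
    print(title)  # commission:Alfred's misadventures
```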

Dimitre Novatchev · Answer

As the provided "XML" isn't a well-formed document due to nested apostrophes, no XPath expression can be evaluated on it.
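
This is easy to confirm. A minimal sketch, assuming Python's xml.etree.ElementTree as the strict parser: the original tag (with a self-closing /> added, so that a missing end tag is not the problem) is rejected before any XPath query could run.

```python
import xml.etree.ElementTree as ET

# The question's tag with "/>" added; the raw string keeps the backslashes literally.
broken = r"<img src='somelink' border='0' alt='commission:Alfred\'s misadventures' title='commission:Alfred\'s misadventures'/>"

try:
    ET.fromstring(broken)
    well_formed = True
except ET.ParseError as exc:
    # The apostrophe after the backslash closes the alt value, and the "s"
    # that immediately follows is not allowed at that position.
    well_formed = False
    print("not well-formed:", exc)
```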

The provided non-well-formed text can be corrected to:

<img src="somelink"
 border="0"
 alt="commission:Alfred's misadventures"
 title="commission:Alfred's misadventures"/>

In case there is a weird requirement not to use double quotes, one correct conversion is:

<img src='somelink'
 border='0'
 alt='commission:Alfred&apos;s misadventures'
 title='commission:Alfred&apos;s misadventures'/>
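
With the markup corrected, the path from the question works. A sketch assuming the standard library's ElementTree, whose limited XPath subset supports positional predicates such as a[3]; the surrounding table and the first two links are hypothetical, invented only to give the path something to match.

```python
import xml.etree.ElementTree as ET

# Hypothetical page fragment: only the <img> markup comes from the answer above.
page = """<table>
  <tr>
    <td>
      <a href="#1">first</a>
      <a href="#2">second</a>
      <a href="#3"><img src='somelink'
         border='0'
         alt='commission:Alfred&apos;s misadventures'
         title='commission:Alfred&apos;s misadventures'/></a>
    </td>
  </tr>
</table>"""

root = ET.fromstring(page)  # root is the <table> element
# Same path as in the question; a[3] selects the third <a> child of the <td>.
img = root.find(".//tr/td/a[3]/img")
title = img.get("title")
print(title)  # commission:Alfred's misadventures
```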

If you are provided the incorrect input, in a language such as C# one can try to convert it to its correct counterpart using:

// @"..." is a C# verbatim string, so this matches a literal backslash-apostrophe-s
string correctXml = input.Replace(@"\'s", "&apos;s");

Probably there is a similar way to do the same in Python.
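
A minimal Python counterpart to the C# line, assuming (as that line does) that every backslash-apostrophe pair in the input is a mis-escaped quote inside a single-quoted attribute value. It replaces \' with &apos; (slightly more general than the C# version, which only handles \'s) and then parses the result with ElementTree. The function name fix_backslash_apostrophes is hypothetical, and a self-closing /> is added to the tag so that it is XML.

```python
import xml.etree.ElementTree as ET


def fix_backslash_apostrophes(markup: str) -> str:
    """Hypothetical helper: turn the non-XML escape \\' into the entity &apos;.

    Plain text substitution; it assumes every backslash-apostrophe pair is a
    mis-escaped quote inside a single-quoted attribute value.
    """
    return markup.replace("\\'", "&apos;")


raw = r"<img src='somelink' border='0' alt='commission:Alfred\'s misadventures' title='commission:Alfred\'s misadventures'/>"

fixed = fix_backslash_apostrophes(raw)
img = ET.fromstring(fixed)
print(img.get("title"))  # commission:Alfred's misadventures
```

Because it is a blind text replacement, it will also rewrite a backslash-apostrophe that legitimately appears elsewhere (for example inside a script), so it is only safe when the input is known to contain this one defect.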

Tags: python, parsing, xpath, apostrophe

Stanislav Golovanov