
How to properly escape single and double quotes

Tags: python, lxml

I have an lxml etree HTMLParser object that I'm using to build XPath expressions, so that I can assert on the path, the attributes, and the text of a tag. I ran into a problem when the text of the tag contains single quotes (') or double quotes ("), and I've exhausted all my options.

Here's a sample object I created:

from io import StringIO
from lxml import etree
parser = etree.HTMLParser()
tree = etree.parse(StringIO('''<html><body><p align="center">Here is my 'test' "string"</p></body></html>'''), parser)

Here is the snippet of code that builds the XPath, followed by the value of the variable it reads in:

   def getXpath(self):
     # The code that starts the xpath string and sets attrsCount is omitted here.
     xpath += 'starts-with(., \'' + self.text + '\') and '
     xpath += ('count(@*)=' + str(attrsCount) if self.exactMatch else "1=1") + ']'
     return xpath

self.text is the expected text of the tag, in this case: Here is my 'test' "string"

This fails when I call the xpath method of the parsed tree:

tree.xpath(self.getXpath())

The reason is that the XPath it receives is '/html/body/p[starts-with(.,'Here is my 'test' "string"') and 1=1]', where the single quote before test ends the string literal early.

How can I properly escape the single and double quotes in self.text? I've tried triple quoting, wrapping self.text in repr(), and using re.sub or str.replace to turn ' and " into \' and \".

Asked by Bob Evans on Oct 18 '11 at 04:10



1 Answer

According to Wikipedia and W3Schools, you should not have ' and " in node content, even though only < and & are said to be strictly illegal. They should be replaced by the corresponding "predefined entity references", which are &apos; and &quot;.

By the way, the Python parsers I use take care of this transparently: when writing, the characters are replaced; when reading, the entities are converted back.
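
The round trip through the predefined entities can be sketched with the standard library alone. This is only an illustration, assuming xml.sax.saxutils.escape for the writing side and xml.etree.ElementTree for the reading side rather than lxml; the variable names are made up for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

text = "Here is my 'test' \"string\""

# Writing side: replace both quote characters with their predefined entities.
# escape() handles &, < and > by default; the dict adds the two quote characters.
escaped = escape(text, {"'": "&apos;", '"': "&quot;"})

# Reading side: the parser converts the entities back to plain characters.
element = ET.fromstring("<p>" + escaped + "</p>")

print(escaped)
print(element.text == text)
```

Note that this only concerns the markup itself. The string handed to an XPath engine is a separate matter, because an XPath string literal has its own quoting rules.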

After a second reading of your question, I tested a few strings containing ' and " in the Python interpreter, and it escapes everything for you!

>>> 'text {0}'.format('blabla "some" bla')
'text blabla "some" bla'
>>> 'ntsnts {0}'.format("ontsi'tns")
"ntsnts ontsi'tns"
>>> 'ntsnts {0}'.format("ontsi'tn' \"ntsis")
'ntsnts ontsi\'tn\' "ntsis'

So we can see that Python itself handles the quotes correctly (the backslashes above are only how the interpreter displays the string). Could you then copy and paste the error message you get (if any)?
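
For the XPath expression in the question specifically, one common workaround (not taken from the answer above) is to build the string literal with concat(), since XPath 1.0 string literals have no backslash escape. The helper name xpath_literal below is hypothetical, and this is a minimal sketch rather than a definitive implementation.

```python
def xpath_literal(text):
    """Return an XPath 1.0 expression that evaluates to the given text."""
    # No single quote inside: a single-quoted literal is enough.
    if "'" not in text:
        return "'" + text + "'"
    # No double quote inside: a double-quoted literal is enough.
    if '"' not in text:
        return '"' + text + '"'
    # Both kinds present: split on single quotes and glue the pieces back
    # together with concat(), writing each single quote as the literal "'".
    parts = text.split("'")
    return "concat(" + ", \"'\", ".join("'" + part + "'" for part in parts) + ")"


literal = xpath_literal("Here is my 'test' \"string\"")
xpath = "/html/body/p[starts-with(., " + literal + ") and 1=1]"
print(xpath)
```

With lxml, the resulting string could then be passed to tree.xpath(). lxml also accepts XPath variables as keyword arguments, for example tree.xpath("/html/body/p[starts-with(., $text)]", text=self.text), which avoids building the literal by hand.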

Answered by Joël on Nov 04 '22 at 20:11