
Python raw strings and Unicode: how to use Web input as regexp patterns?

EDIT: This question doesn't really make sense once you understand what the "r" flag means. More details here. For people looking for a quick answer, I added one below.

If I enter a regexp manually in a Python script, I can use 4 combinations of prefix flags for my pattern string literals:

  • p1 = "pattern"
  • p2 = u"pattern"
  • p3 = r"pattern"
  • p4 = ur"pattern"
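To make the effect of these prefixes concrete, here is a small sketch, assuming Python 3 (where u"" is accepted as an alias of the plain literal, and the combined raw and Unicode prefix is left out because Python 3 does not accept it). The names cooked and raw are illustrative only and do not come from the question.

```python
# Without backslashes, the prefix makes no difference to the resulting value.
p1 = "pattern"
p2 = u"pattern"
p3 = r"pattern"
same_without_backslash = (p1 == p2 == p3)

# With a backslash, the raw prefix changes what the literal contains.
cooked = "\bword"  # Python turns \b into one backspace character
raw = r"\bword"    # the backslash and the 'b' are kept as two characters

print(same_without_backslash, len(cooked), len(raw))
```

The prefix only matters while Python parses the source file. Once the string exists, nothing records whether it was written with or without the "r".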

I have a bunch of Unicode strings coming from a Web form input and want to use them as regexp patterns.

I want to know what processing I should apply to these strings so that they behave the same way as the manually written literals above. Something like:

import re
assert re.match(p1, some_text) == re.match(someProcess1(web_input), some_text)
assert re.match(p2, some_text) == re.match(someProcess2(web_input), some_text)
assert re.match(p3, some_text) == re.match(someProcess3(web_input), some_text)
assert re.match(p4, some_text) == re.match(someProcess4(web_input), some_text)

What would someProcess1 to someProcessN be, and why?

I suppose that someProcess2 doesn't need to do anything, while someProcess1 should do some Unicode conversion to the local encoding. For the raw string literals, I am clueless.

asked Dec 23 '22 06:12 by e-satis


1 Answer

Apart from possibly having to encode Unicode properly (in Python 2.*), no processing is needed. There is no specific type for "raw strings": the "r" prefix is just a syntax for literals, i.e. for string constants written in source code. The Web input in your code snippet is not a string constant, so there's nothing to "process".
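As a minimal sketch of that point, assuming Python 3 (where no encoding step is needed): a raw literal and an ordinary literal with doubled backslashes are the same str value, and a string received at run time can be handed to re as-is. The helper compile_user_pattern and the variable web_input are hypothetical names used for illustration. The error handling is an addition that the answer does not mention.

```python
import re

# Two spellings of the same string constant: "raw" is not a separate type.
raw_literal = r"\d+\.\d+"
plain_literal = "\\d+\\.\\d+"
assert raw_literal == plain_literal
assert type(raw_literal) is type(plain_literal)


def compile_user_pattern(text):
    """Compile a user-supplied pattern unchanged; return None if it is invalid."""
    try:
        return re.compile(text)
    except re.error:
        return None


# Hypothetical stand-in for the characters a user typed into a Web form.
web_input = plain_literal

pattern = compile_user_pattern(web_input)
match = pattern.match("3.14 is pi")
print(match.group() if match else None)
```

The only work left for user-supplied patterns is validation, since a malformed pattern raises re.error at compile time.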

answered Apr 28 '23 03:04 by Alex Martelli