I'm aware that Python 3 fixes a lot of UTF issues, but I'm not able to use Python 3; I'm on 2.5.1. I'm trying to run a regex over a document that contains Unicode dashes (– rather than -). Python can't match these, and if I put them in the regex it throws an error. How can I force Python to use a Unicode string, or in some other way match a character like that? Thanks for your help.

UTF in Python Regex

1 Answer

You have to escape the character in question (–, the en dash, U+2013) and put a u in front of the string literal to make it a unicode string.

So, for example, this:

re.compile("–")

becomes this:

re.compile(u"\u2013")
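To show how this might fit into a full match, here is a minimal sketch. It assumes the document is UTF-8 encoded and is decoded to a unicode string before matching, since the text being searched generally needs to be unicode as well as the pattern. The sample text and the variable names (raw, text, dash, normalized, ranges) are made up for illustration and are not from the question.

```python
import re

# Hypothetical sample: stands in for bytes read from a UTF-8 encoded file.
raw = u"pages 10\u201320 and 30-40".encode("utf-8")

# Decode first, so the pattern and the text are both unicode strings.
# Assumption: the document really is UTF-8; use its actual encoding otherwise.
text = raw.decode("utf-8")

# Character class matching an en dash (U+2013) or a plain ASCII hyphen.
# The hyphen comes last in the class so it is taken literally.
dash = re.compile(u"[\u2013-]")

# Replace every dash variant with a plain hyphen.
normalized = dash.sub(u"-", text)

# Or match number ranges written with either kind of dash.
ranges = re.findall(u"(\\d+)[\u2013-](\\d+)", text)
```

The backslashes before d are doubled because the pattern is a normal (non-raw) unicode literal, which keeps the \u2013 escape working in the same string.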

answered Oct 17 '22 08:10

Patrick McElhaney

Tags:

python

regex

Teifion
