Python - keep only alphanumeric and space, and ignore non-ASCII

Tags: python, regex

I have this line, which is meant to remove all non-alphanumeric characters except spaces:

re.sub(r'\W+', '', s)

However, it still keeps non-English characters.

For example, if I have

re.sub(r'\W+', '', 'This is a sentence, and here are non-english 托利 苏 !!11')

I want to get as output:

> 'This is a sentence and here are non-english  11'

775

asked Apr 29 '19 11:04

Filipe

2 Answers

re.sub(r'[^A-Za-z0-9 ]+', '', s)

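(Edit) To clarify: The [] defines a character class, and a leading ^ negates it. A-Za-z covers the English alphabet, 0-9 covers the digits, and the last character inside the brackets is a literal space. The trailing + matches one or more such characters in a row, so any run of characters that is not A-Z, a-z, 0-9, or a space is replaced with the empty string.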
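As a quick check, here is a minimal sketch that runs this pattern on the sentence from the question. The function name keep_ascii_alnum_and_space and the whitespace-collapsing step at the end are illustrative additions, not part of the original answer.

```python
import re

# Negated character class: matches runs of anything that is NOT
# an ASCII letter, an ASCII digit, or a space.
NOT_ASCII_ALNUM_OR_SPACE = re.compile(r'[^A-Za-z0-9 ]+')


def keep_ascii_alnum_and_space(s):
    """Hypothetical helper: drop every character outside A-Z, a-z, 0-9, space."""
    return NOT_ASCII_ALNUM_OR_SPACE.sub('', s)


s = 'This is a sentence, and here are non-english 托利 苏 !!11'
cleaned = keep_ascii_alnum_and_space(s)
print(repr(cleaned))

# Optional extra step (an assumption about what you might want):
# collapse the runs of spaces left behind by the removed characters.
collapsed = ' '.join(cleaned.split())
print(repr(collapsed))
```

Note that the hyphen in "non-english" is removed as well, because - is not in the class. If you want to keep it, one option is to add it at the end of the class, as in [^A-Za-z0-9 -]+, where a trailing hyphen is treated literally.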

116

answered Dec 06 '22 03:12

Nir Levy

This might not answer this specific question, but I came across this thread during my research.

I had the same goal as the asker, but I also wanted to keep non-English characters such as ä, ü, ß, ...

The way the asker's code works, spaces are deleted too.

A simple workaround is the following:

re.sub(r'[^ \w]+', '', string)

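The ^ negates the character class, so everything except the listed characters is matched and removed. Here the listed characters are \w, that is, every word character (letters and digits, including non-English ones, plus the underscore), and the space.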
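The following is a small sketch of how \w behaves in Python 3, using a made-up sample string. It also shows two details that are easy to miss: a + placed inside the brackets is a literal plus sign rather than a quantifier, and the re.ASCII flag restricts \w to [a-zA-Z0-9_] if you want the ASCII-only behaviour after all.

```python
import re

# Made-up sample text with umlauts, CJK characters, punctuation, and an underscore.
text = 'Größe: 5+3 (ca.) 托利_苏'

# Default for str patterns in Python 3: \w is Unicode-aware,
# so ö, ß and the CJK characters count as word characters and are kept.
unicode_clean = re.sub(r'[^ \w]+', '', text)

# With the + inside the brackets it is a literal character in the class,
# so plus signs in the input are kept instead of removed.
plus_kept = re.sub(r'[^ \w+]', '', text)

# With re.ASCII, \w only matches [a-zA-Z0-9_], so non-ASCII letters are removed.
ascii_clean = re.sub(r'[^ \w]+', '', text, flags=re.ASCII)

print(repr(unicode_clean))
print(repr(plus_kept))
print(repr(ascii_clean))
```

In all three variants the underscore survives, since it is part of \w.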

I hope this helps someone in the future.

answered Dec 06 '22 01:12

Tilman Böckenförde

