What is the most efficient way in Python to convert a string to all lowercase stripping out all non-ascii alpha characters?

Tags:

string

I have a simple task I need to perform in Python, which is to convert a string to all lowercase and strip out every character that is not an ASCII letter.

For example:

    "This is a Test" -> "thisisatest"
    "A235th@#$&( er Ra{}|?>ndom" -> "atherrandom"

I have a simple function to do this:

    import string
    import sys

    def strip_string_to_lowercase(s):
        tmpStr = s.lower().strip()
        retStrList = []
        for x in tmpStr:
            if x in string.ascii_lowercase:
                retStrList.append(x)

        return ''.join(retStrList)

But I cannot help thinking there is a more efficient, or more elegant, way.

Thanks!

Edit:

Thanks to all those who answered. I learned, and in some cases re-learned, a good deal of Python.

822

asked Mar 12 '09 14:03

1 Answer

Another solution (not that Pythonic, but very fast) is to use string.translate, though note that this will not work for unicode strings. It's also worth noting that you can speed up Dana's code by moving the characters into a set, which looks up by hash rather than performing a linear search each time. The code below targets Python 2 (it relies on string.maketrans, the print statement, and filter returning a str). Here are the timings I get for several of the solutions given:

    import string, re, timeit

    # Precomputed values (for str_join_set and translate)

    letter_set = frozenset(string.ascii_lowercase + string.ascii_uppercase)
    tab = string.maketrans(string.ascii_lowercase + string.ascii_uppercase,
                           string.ascii_lowercase * 2)
    deletions = ''.join(ch for ch in map(chr,range(256)) if ch not in letter_set)

    s="A235th@#$&( er Ra{}|?>ndom"

    # From unwind's filter approach
    def test_filter(s):
        return filter(lambda x: x in string.ascii_lowercase, s.lower())

    # using set instead (and contains)
    def test_filter_set(s):
        return filter(letter_set.__contains__, s).lower()

    # Tomalak's solution
    def test_regex(s):
        return re.sub('[^a-z]', '', s.lower())

    # Dana's
    def test_str_join(s):
        return ''.join(c for c in s.lower() if c in string.ascii_lowercase)

    # Modified to use a set.
    def test_str_join_set(s):
        return ''.join(c for c in s.lower() if c in letter_set)

    # Translate approach.
    def test_translate(s):
        return string.translate(s, tab, deletions)


    for test in sorted(globals()):
        if test.startswith("test_"):
            assert globals()[test](s)=='atherrandom'
            print "%30s : %s" % (test, timeit.Timer("f(s)",
                  "from __main__ import %s as f, s" % test).timeit(200000))

This gives me:

          test_filter : 2.57138351271
      test_filter_set : 0.981806765698
           test_regex : 3.10069885233
        test_str_join : 2.87172979743
    test_str_join_set : 2.43197956381
       test_translate : 0.335367566218
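The translate approach above depends on Python 2's string.translate. As a rough sketch of how the same idea might look on Python 3 (an assumption on my part, not something timed for this answer), one option is to go through bytes, since bytes.translate accepts both a mapping table and a set of bytes to delete. The name translate_py3 is hypothetical, and non-ASCII characters are simply dropped at the encode step.

```python
import string

# Assumption: Python 3. Working on bytes lets bytes.translate map and delete
# in a single pass, much like the Python 2 string.translate call.
_UPPER = string.ascii_uppercase.encode("ascii")
_LOWER = string.ascii_lowercase.encode("ascii")
_LETTERS = _UPPER + _LOWER

# Table that maps A-Z to a-z and leaves every other byte unchanged.
_TABLE = bytes.maketrans(_UPPER, _LOWER)

# Every byte value that is not an ASCII letter gets deleted.
_DELETIONS = bytes(b for b in range(256) if b not in _LETTERS)


def translate_py3(s):
    """Hypothetical helper: lowercase s and keep only ASCII letters."""
    # Characters outside ASCII are discarded here rather than raising.
    raw = s.encode("ascii", errors="ignore")
    return raw.translate(_TABLE, _DELETIONS).decode("ascii")
```

Calling translate_py3 on the sample string s from the benchmark should give the same 'atherrandom' result that the assert in the loop checks for. The relative timings listed here were measured on the Python 2 versions and may not carry over.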

[Edit] Updated with filter solutions as well. (Note that using set.__contains__ makes a big difference here, as it avoids making an extra function call for the lambda.)
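To make the set.__contains__ point concrete, here is a minimal sketch of the two filter variants, written for Python 3 as an assumption (there filter returns an iterator, so the result has to be joined back into a string). The names LETTER_SET, filter_set_py3 and filter_lambda_py3 are hypothetical and not part of the benchmark.

```python
import string

# Precomputed once; membership in a frozenset is a hash lookup.
LETTER_SET = frozenset(string.ascii_lowercase + string.ascii_uppercase)


def filter_set_py3(s):
    # The bound method is passed to filter() directly, so no extra
    # Python-level function call happens per character.
    return "".join(filter(LETTER_SET.__contains__, s)).lower()


def filter_lambda_py3(s):
    # Same result, but every character goes through the lambda first.
    return "".join(filter(lambda c: c in LETTER_SET, s)).lower()
```

Both functions return the same string; the only difference is the per-character call overhead, which is what the note above is about. How large the gap is on a given interpreter would need to be measured with timeit.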

193

answered Oct 12 '22 08:10

Brian


grieve
