Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python - efficient method to remove all non-letters and replace them with underscores

def format_title(title):  
  ''.join(map(lambda x: x if (x.isupper() or x.islower()) else '_', title.strip()))

Anything faster?

like image 338
christangrant Avatar asked Jan 31 '10 08:01

christangrant


1 Answers

The faster way to do it is to use str.translate() This is ~50 times faster than your way

# You only need to do this once
>>> title_trans=''.join(chr(c) if chr(c).isupper() or chr(c).islower() else '_' for c in range(256))

>>> "abcde1234!@%^".translate(title_trans)
'abcde________'

# Using map+lambda
$ python -m timeit '"".join(map(lambda x: x if (x.isupper() or x.islower()) else "_", "abcd1234!@#$".strip()))'
10000 loops, best of 3: 21.9 usec per loop

# Using str.translate
$ python -m timeit -s 'titletrans="".join(chr(c) if chr(c).isupper() or chr(c).islower() else "_" for c in range(256))' '"abcd1234!@#$".translate(titletrans)'
1000000 loops, best of 3: 0.422 usec per loop

# Here is regex for a comparison
$ python -m timeit -s 'import re;transre=re.compile("[\W\d]+")' 'transre.sub("_","abcd1234!@#$")'
100000 loops, best of 3: 3.17 usec per loop

Here is a version for unicode

# coding: UTF-8

def format_title_unicode_translate(title):
    return title.translate(title_unicode_trans)

class TitleUnicodeTranslate(dict):
    def __missing__(self,item):
        uni = unichr(item)
        res = u"_"
        if uni.isupper() or uni.islower():
            res = uni
        self[item] = res
        return res
title_unicode_trans=TitleUnicodeTranslate()

print format_title_unicode_translate(u"Metallica Μεταλλικα")

Note that the Greek letters count as upper and lower, so they are not substituted. If they are to be substituted, simply change the condition to

        if item<256 and (uni.isupper() or uni.islower()):
like image 170
John La Rooy Avatar answered Nov 10 '22 00:11

John La Rooy