I have quite large amount of text which include control charachters like \n \t and \r. I need to replace them with a simple space--> " ". What is the fastest way to do this? Thanks
Python Remove Character from String using replace() We can use string replace() function to replace a character with a new character. If we provide an empty string as the second argument, then the character will get removed from the string.
Depending on your preferences, you'd obtain the Python one-liner ''. join(c for c in s if unicodedata. category(c)[0] != 'C') removes all control characters in the original string s .
\t means tab, if you want to explicitely have a \ character, you'll need to escape it in your string: Or use a raw string: string = r" Hello \t World."
I think the fastest way is to use str.translate()
:
import string
s = "a\nb\rc\td"
print s.translate(string.maketrans("\n\t\r", " "))
prints
a b c d
EDIT: As this once again turned into a discussion about performance, here some numbers. For long strings, translate()
is way faster than using regular expressions:
s = "a\nb\rc\td " * 1250000
regex = re.compile(r'[\n\r\t]')
%timeit t = regex.sub(" ", s)
# 1 loops, best of 3: 1.19 s per loop
table = string.maketrans("\n\t\r", " ")
%timeit s.translate(table)
# 10 loops, best of 3: 29.3 ms per loop
That's about a factor 40.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With