
Deleting specific control characters (\n, \r, \t) from a string

Tags: python, string

I have a fairly large amount of text that includes control characters such as \n, \t and \r. I need to replace each of them with a single space (" "). What is the fastest way to do this? Thanks

Hossein asked Feb 10 '11 09:02




1 Answer

I think the fastest way is to use str.translate() (Python 2 syntax shown):

import string
s = "a\nb\rc\td"
# Map each of \n, \t and \r to a space, then apply the table in one pass
print s.translate(string.maketrans("\n\t\r", "   "))

prints

a b c d
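
In Python 3, string.maketrans is no longer available for text strings; the equivalent is the str.maketrans static method. Here is a minimal sketch of the same approach, assuming Python 3. The helper name replace_controls and the module-level table name are hypothetical, not part of the original answer:

```python
# Build the table once; str.maketrans maps each character in the first
# argument to the character at the same position in the second.
_CONTROL_TO_SPACE = str.maketrans("\n\t\r", "   ")

def replace_controls(text):
    """Return text with each newline, tab and carriage return replaced by a space."""
    return text.translate(_CONTROL_TO_SPACE)

print(replace_controls("a\nb\rc\td"))  # prints: a b c d
```

Building the table outside the function avoids recreating it on every call, which matters when many strings are processed.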

EDIT: As this once again turned into a discussion about performance, here are some numbers. For long strings, translate() is way faster than using regular expressions:

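import re
import string

s = "a\nb\rc\td " * 1250000

# Regular expression: substitute every \n, \r or \t with a space
regex = re.compile(r'[\n\r\t]')
%timeit t = regex.sub(" ", s)
# 1 loops, best of 3: 1.19 s per loop

# Translation table built once, then applied with translate()
table = string.maketrans("\n\t\r", "   ")
%timeit s.translate(table)
# 10 loops, best of 3: 29.3 ms per loop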

That's a speedup of about a factor of 40.
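
The measurement can be repeated on Python 3 with the standard timeit module instead of IPython's %timeit. The following is only a sketch under assumptions: the helper name compare and its default input size are illustrative choices, not taken from the answer. Timings, and therefore the ratio, depend on the Python version and the machine, so no expected figures are given:

```python
import re
import timeit

def compare(repeats=100_000, number=5):
    """Time regex.sub against str.translate on the same input.

    Returns (regex_seconds, translate_seconds), each the best of 3 runs.
    """
    s = "a\nb\rc\td " * repeats
    regex = re.compile(r"[\n\r\t]")
    table = str.maketrans("\n\t\r", "   ")

    # Both approaches must produce identical output for a fair comparison.
    assert regex.sub(" ", s) == s.translate(table)

    t_regex = min(timeit.repeat(lambda: regex.sub(" ", s), number=number, repeat=3))
    t_translate = min(timeit.repeat(lambda: s.translate(table), number=number, repeat=3))
    return t_regex, t_translate

if __name__ == "__main__":
    t_regex, t_translate = compare()
    print(f"regex.sub:     {t_regex:.4f} s")
    print(f"str.translate: {t_translate:.4f} s")
```

Taking the minimum over several runs is the usual way to reduce noise from other processes on the machine.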

Sven Marnach answered Oct 19 '22 04:10
