I want to delete every "^L" character that I find when I read the file. I tried to use this function when I read a line:
def cleanString(self, s):
    if isinstance(s, str):
        s = unicode(s, "iso-8859-1", "replace")
    s = unicodedata.normalize('NFD', s)
    return s.encode('ascii', 'ignore')
But it doesn't delete this character. Does someone know how to do it?
I tried using the replace function as well, but it works no better:
s = line.replace("\^L","")
Thanks for your answers.
You probably don't have the two literal characters ^ and L, but a single character that is displayed as ^L.
This would be the form feed character. So do s = line.replace('\x0C', '').
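To make the difference concrete, here is a minimal sketch. The sample string is made up for illustration; it only assumes that the file contains a real form feed rather than a caret followed by an L.

```python
# Hypothetical sample line containing a form feed (many editors show it as ^L).
line = "page one\x0cpage two"

# '\f' and '\x0c' are two spellings of the same single character, code point 12.
assert "\f" == "\x0c" and ord("\x0c") == 12

# Searching for the two literal characters '^' and 'L' finds nothing to replace.
unchanged = line.replace("^L", "")

# Replacing the actual control character removes it.
cleaned = line.replace("\x0c", "")
```

Here unchanged is identical to line, while cleaned no longer contains the form feed.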
^L (code point 0C) is itself an ASCII character, so encoding to ASCII won't remove it. You could filter out all control characters using a small regex (and, while you're at it, filter out everything non-ASCII as well):
import re
import unicodedata

def cleanString(self, s):
    if isinstance(s, str):
        s = unicode(s, "iso-8859-1", "replace")
    s = unicodedata.normalize('NFD', s)
    # Keep only printable ASCII (space through ~); this also drops tabs and newlines
    s = re.sub(r"[^\x20-\x7e]+", "", s)
    return str(s)  # No encoding necessary, only ASCII is left
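The function above is Python 2 code (it relies on the unicode built-in). On Python 3, where str is already Unicode, the same idea might look like the sketch below. The name clean_string and the encoding parameter are my own choices for illustration, not part of the question, and the character class stops at \x7e so that DEL is dropped as well.

```python
import re
import unicodedata

# Anything outside printable ASCII (space through ~), including ^L, tab and newline.
_NON_PRINTABLE = re.compile(r"[^\x20-\x7e]+")

def clean_string(s, encoding="iso-8859-1"):
    """Return s reduced to printable ASCII (hypothetical helper)."""
    if isinstance(s, bytes):
        # Decode raw bytes first; undecodable bytes become U+FFFD.
        s = s.decode(encoding, "replace")
    # NFD splits accented letters into base letter + combining mark,
    # so the regex removes the mark and keeps the base letter.
    s = unicodedata.normalize("NFD", s)
    return _NON_PRINTABLE.sub("", s)
```

Because newlines and tabs are control characters too, this strips them along with the form feed. If you need to keep line endings, apply it to each line after removing the trailing newline, or fall back to the plain replace('\x0c', '') approach.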