
Delete ^L character in a log file [duplicate]

Tags: python, unicode

I want to delete every "^L" character that I find when I read the file. I tried to apply this function to each line as I read it:

import unicodedata

def cleanString(self, s):
    if isinstance(s, str):
        s = unicode(s, "iso-8859-1", "replace")
        s = unicodedata.normalize('NFD', s)
        return s.encode('ascii', 'ignore')

But it doesn't delete this character. Does someone know how to do it?

I also tried the replace function, but that did not help either:

s = line.replace("\^L","")

Thanks for your answers.

Kvasir asked Jun 18 '14 14:06


2 Answers

You probably don't have the literal characters ^ and L, but a single character that is displayed as ^L.

This would be the form feed character.

So do s = line.replace('\x0C', '').
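
As a minimal sketch of that replacement (written for Python 3, whereas the question uses Python 2; the helper name strip_form_feed and the sample string are made up for illustration):

```python
def strip_form_feed(line):
    """Return line with every form feed character (0x0C, shown as ^L) removed."""
    # "\x0c" and "\f" are two spellings of the same single character.
    return line.replace("\x0c", "")

# Hypothetical log line with a form feed between two pages.
sample = "page one\x0cpage two\n"
cleaned = strip_form_feed(sample)
```

Note that "\^L" in the question's attempt is three ordinary characters (backslash, caret, L), which is why that replace call finds nothing to remove.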

glglgl answered Nov 04 '22 19:11


^L (code point 0x0C) is an ASCII character, so it won't be affected by an encoding to ASCII. You could filter out all control characters using a small regex (and, while you're at it, filter out everything non-ASCII as well). Note that this also strips tabs and newlines:

import re
import unicodedata

def cleanString(self, s):
    if isinstance(s, str):
        s = unicode(s, "iso-8859-1", "replace")
        s = unicodedata.normalize('NFD', s)
        # Keep printable ASCII only; \x7f (DEL) is a control character too.
        s = re.sub(r"[^\x20-\x7e]+", "", s)
        return str(s)  # No encoding necessary
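
If tabs and line endings should survive, a narrower pattern may be preferable. The following is only a sketch for Python 3 under that assumption; the names clean_text, _CONTROL_CHARS and the ascii_only flag are hypothetical and not part of the code above:

```python
import re
import unicodedata

# ASCII control characters (0x00-0x1F) and DEL (0x7F),
# except tab (0x09), newline (0x0A) and carriage return (0x0D).
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")

def clean_text(text, ascii_only=False):
    """Remove control characters such as form feed; optionally drop non-ASCII."""
    text = _CONTROL_CHARS.sub("", text)
    if ascii_only:
        # Decompose accented letters, then discard whatever is not ASCII.
        text = unicodedata.normalize("NFD", text)
        text = text.encode("ascii", "ignore").decode("ascii")
    return text

kept_layout = clean_text("a\x0cb\tc\n")
ascii_form = clean_text("caf\u00e9\x0c", ascii_only=True)
```

Here the form feed is removed while the tab and the newline are kept, and with ascii_only=True an accented letter is reduced to its base letter rather than vanishing entirely.
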

Tim Pietzcker answered Nov 04 '22 20:11