I have a text file from which I have to read a lot of numbers (doubles). It contains ASCII control characters such as DLE and NUL, which are visible in the text file. When I read the lines to extract only the doubles/ints, I get errors like "invalid literals \x10". Shown below are the first 2 lines of my file.
DLE NUL NUL NUL [1, 167, 133, 6]DLE NUL NUL
YS FS NUL[0.0, 4.3025989e-07, 1.5446712e-06, 3.1393029e-06, 5.0430463e-06, 7.1382601e-06
How do I remove all these control characters from a text file at once, using Python? I want this to be done before I parse the file into numbers ...
Any help is appreciated!
Use string.printable. In Python 3, filter returns an iterator, so join the result back into a string.
>>> import string
>>> ''.join(filter(string.printable.__contains__, '\x00\x01XYZ\x00\x10'))
'XYZ'
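A minimal sketch of the same idea as a reusable helper, written for Python 3. The name `keep_printable` and the sample string are hypothetical, not part of the original answer. Keep in mind that `string.printable` also contains whitespace such as tab and newline, so those characters are kept.

```python
import string

# Build the lookup set once; membership tests on a set are fast.
PRINTABLE = set(string.printable)

def keep_printable(text):
    """Return text with every character outside string.printable dropped."""
    return ''.join(ch for ch in text if ch in PRINTABLE)

# Hypothetical sample modelled on the first line of the question's file.
sample = '\x10\x00\x00\x00[1, 167, 133, 6]\x10\x00\x00'
print(keep_printable(sample))  # prints [1, 167, 133, 6]
```

Because this is a whitelist, it also drops any non-ASCII characters, which is fine for a file that should contain only numbers and punctuation.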
I know this is a very old post, but I am answering because I think it could help others.
I did the following. It replaces every ASCII control character in the range 0x00 to 0x1F (which also covers tab and newline) with an empty string.
import re
line = re.sub(r'[\x00-\x1F]+', '', line)
Ref: ASCII (American Standard Code for Information Interchange) Code
Ref: Python re.sub()
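To connect the regex approach back to the original goal of parsing numbers, here is a rough sketch that strips the control characters from a line and then pulls out the numeric tokens. The names `CONTROL_CHARS`, `NUMBER`, and `parse_numbers` are hypothetical, and including DEL (0x7F) in the character class is an assumption that goes beyond the answer above.

```python
import re

# Control characters 0x00-0x1F, plus DEL (0x7F) as an added assumption.
CONTROL_CHARS = re.compile(r'[\x00-\x1f\x7f]+')

# Integers and floats, with optional sign and scientific notation.
NUMBER = re.compile(r'[-+]?(?:\d+\.\d*|\.\d+|\d+)(?:[eE][-+]?\d+)?')

def parse_numbers(line):
    """Strip control characters from line, then return its numbers as floats."""
    cleaned = CONTROL_CHARS.sub('', line)
    return [float(token) for token in NUMBER.findall(cleaned)]

# Hypothetical sample modelled on the first line of the question's file.
sample = '\x10\x00\x00\x00[1, 167, 133, 6]\x10\x00\x00'
print(parse_numbers(sample))  # prints [1.0, 167.0, 133.0, 6.0]
```

To process a whole file, one option is to open it with an explicit encoding such as `latin-1`, which maps every byte to a character and so cannot fail on stray bytes, and call `parse_numbers` on each line.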