Remove ASCII control characters from text file Python

I have a text file from which I need to read a lot of numbers (doubles). The file contains ASCII control characters such as DLE and NUL, which are visible in the text file. When I read the lines to extract only the doubles/ints, I get errors like "invalid literals \x10". The first two lines of my file are shown below.

DLE NUL NUL NUL [1, 167, 133, 6]DLE NUL NUL   
YS FS NUL[0.0, 4.3025989e-07, 1.5446712e-06, 3.1393029e-06, 5.0430463e-06, 7.1382601e-06

How do I remove all of these control characters from a text file at once, using Python? I want this done before I parse the file into numbers.

Any help is appreciated!

atmaere asked May 12 '26 14:05



2 Answers

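Use string.printable to keep only the printable characters. In Python 3, filter() returns an iterator rather than a string, so join the result:

>>> import string
>>> ''.join(filter(string.printable.__contains__, '\x00\x01XYZ\x00\x10'))
'XYZ'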
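
For longer inputs, the same idea can be written as a small helper. This is only a sketch: the name keep_printable is made up for illustration, and note that string.printable itself contains whitespace such as tab, newline, vertical tab and form feed, so those characters are kept.

```python
import string

# Characters to keep: everything in string.printable. This set includes
# whitespace such as '\t', '\n', '\r', '\x0b' and '\x0c'.
PRINTABLE = set(string.printable)


def keep_printable(text):
    """Return text with every character outside string.printable dropped.

    keep_printable is a hypothetical helper name used for illustration.
    """
    return ''.join(ch for ch in text if ch in PRINTABLE)


print(keep_printable('\x00\x01XYZ\x00\x10'))  # XYZ
```

Looking characters up in a set avoids scanning the string.printable string once per character, which may matter for large files.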
falsetru answered May 14 '26 03:05



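I know this is a very old post, but I am answering because I think it could help others.

I did the following. It replaces every ASCII control character in the range \x00-\x1F with an empty string. Note that this range also covers tab (\x09), line feed (\x0A) and carriage return (\x0D), so those are removed too.

import re
line = re.sub(r'[\x00-\x1F]+', '', line)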
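
To connect this with the question, here is a minimal sketch of stripping the control characters from each line and then pulling out the numbers. Several things here are assumptions rather than part of the answer above: the helper names parse_numbers and parse_file are hypothetical, the pattern additionally removes DEL (\x7F), the number regex is one simple way to match ints and floats, and the file is decoded as latin-1 only so that every byte maps to some character.

```python
import re

# ASCII control characters \x00-\x1F, plus DEL (\x7F) as an added assumption.
CONTROL_CHARS = re.compile(r'[\x00-\x1f\x7f]')

# A simple pattern for ints and floats, with an optional exponent.
NUMBER = re.compile(r'[-+]?\d*\.?\d+(?:[eE][-+]?\d+)?')


def parse_numbers(line):
    """Strip control characters from line, then return its numbers as floats."""
    cleaned = CONTROL_CHARS.sub('', line)
    return [float(token) for token in NUMBER.findall(cleaned)]


def parse_file(path):
    """Return one list of floats per line of the file at path.

    latin-1 is assumed so that any byte value decodes without an error.
    """
    with open(path, encoding='latin-1') as handle:
        return [parse_numbers(line) for line in handle]


# Made-up sample resembling the first line shown in the question.
print(parse_numbers('\x10\x00\x00\x00[1, 167, 133, 6]\x10\x00\x00\n'))
# [1.0, 167.0, 133.0, 6.0]
```

Because the numbers are matched with a regex rather than by evaluating the bracketed list, a line that is cut off before its closing bracket still yields the values it contains.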

Ref: ASCII (American Standard Code for Information Interchange) Code

Ref: Python re.sub()

user1012513 answered May 14 '26 05:05



