
How to remove extended ASCII characters using Python?

While trying to fix up a PML (Palm Markup Language) file, I found that my test file appears to contain non-ASCII characters, which cause MakeBook to complain. The solution would be to strip all the non-ASCII characters out of the PML.

So, in attempting to fix this in Python, I have:

import unicodedata, fileinput

for line in fileinput.input():
    print unicodedata.normalize('NFKD', line).encode('ascii','ignore')

However, this results in an error saying that line must be "unicode, not str". Here's a file fragment:

\B1a\B \tintense, disordered and often destructive rage†.†.†.\t

I'm not quite sure how to pass line in properly so that it can be processed at this point.

Jauder Ho asked Nov 06 '09 05:11



1 Answer

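Try print line.decode('iso-8859-1').encode('ascii', 'ignore'), which should be much closer to what you want. In Python 2, fileinput yields byte strings (str), while unicodedata.normalize requires a unicode object, hence the error. Decoding first produces unicode, and encoding to ASCII with 'ignore' then drops every character outside the ASCII range.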
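
For readers on Python 3, where print is a function and text strings are already Unicode, the same decode-then-encode idea might be sketched roughly as below. The function name strip_non_ascii is hypothetical, and the sample bytes assume the daggers in the fragment were stored as the single byte 0x86 (as in Windows-1252); the real file's encoding is not stated in the question.

```python
def strip_non_ascii(raw: bytes) -> str:
    """Return the input with every non-ASCII byte removed (hypothetical helper)."""
    # ISO-8859-1 maps each byte 0x00-0xFF to exactly one code point,
    # so this decode cannot fail, whatever the file's real encoding is.
    text = raw.decode('iso-8859-1')
    # errors='ignore' silently discards characters ASCII cannot represent.
    return text.encode('ascii', 'ignore').decode('ascii')


# Assumed byte form of the fragment from the question (dagger = 0x86).
sample = b'\\B1a\\B \\tintense, disordered and often destructive rage\x86.\x86.\x86.\\t'
cleaned = strip_non_ascii(sample)
print(cleaned)
# prints: \B1a\B \tintense, disordered and often destructive rage...\t
```

Because every byte above 0x7F is dropped, a multi-byte UTF-8 character disappears entirely as well; nothing is transliterated, so accented letters are removed rather than replaced with their base letters.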

Alex Martelli answered Sep 21 '22 07:09
