I have a Python 3 program that reads some strings from a Windows-1252 encoded file:
with open(file, 'r', encoding="cp1252") as file_with_strings:
    # save some strings
I later want to write these strings to stdout. I've tried:
print(some_string)
# => UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 180: ordinal not in range(128)
print(some_string.decode("utf-8"))
# => AttributeError: 'str' object has no attribute 'decode'
sys.stdout.buffer.write(some_string)
# => TypeError: 'str' does not support the buffer interface
print(some_string.encode("cp1252").decode("utf-8"))
# => UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 180: invalid continuation byte
print(some_string.encode("cp1252"))
# => has the unfortunate result of printing b'<my string>' instead of just the string
I'm scratching my head here. I'd like to print the string I got from the file just as it appears there, in cp1252. (In my terminal, when I do more $file, these characters appear as question marks, so my terminal is probably ASCII.)
Would love some clarification! Thanks!
Since Python 3.7, you can change the encoding of all text written to sys.stdout with the reconfigure method:
import sys
sys.stdout.reconfigure(encoding="cp1252")
That could be helpful if you need to change the encoding for all output from your program.
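As a rough illustration of what changing the stream encoding does, the sketch below applies the same reconfigure call to an in-memory text stream instead of sys.stdout, so the bytes that come out can be inspected. The names raw, stream and written are made up for this example, and newline="\n" is only there so the result is the same on every platform.

```python
import io

# In-memory stand-in for the terminal, so the emitted bytes can be inspected.
raw = io.BytesIO()
# Start as ASCII, like the stdout in the question; newline="\n" disables
# platform-specific newline translation.
stream = io.TextIOWrapper(raw, encoding="ascii", newline="\n")

# Same call as sys.stdout.reconfigure(encoding="cp1252"), on the stand-in.
stream.reconfigure(encoding="cp1252")

# "\u00e9" is the character '\xe9' from the question's error message.
print("caf\u00e9", file=stream)
stream.flush()

# Bytes that would have reached the terminal.
written = raw.getvalue()
```

After the reconfigure call, print no longer raises UnicodeEncodeError for this character, because cp1252 has a byte for it (0xE9) where ASCII does not.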
To anybody out there with the same problem, I ended up doing:
to_print = (some_string + "\n").encode("cp1252")
sys.stdout.buffer.write(to_print)
sys.stdout.flush() # I write a ton of these strings, and segfaulted without flushing
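If you do this in many places, the three lines above could be wrapped in a small helper. This is only a sketch: write_encoded and its parameters are hypothetical names, not part of the standard library, and the in-memory buffer at the end is just there to show which bytes get written.

```python
import io
import sys


def write_encoded(text, out=None, encoding="cp1252"):
    """Encode text plus a trailing newline and write the bytes to a binary stream.

    write_encoded is a hypothetical helper; out defaults to sys.stdout.buffer.
    """
    if out is None:
        out = sys.stdout.buffer
    # Raises UnicodeEncodeError if text has characters the encoding lacks.
    out.write((text + "\n").encode(encoding))
    out.flush()


# Demonstration against an in-memory buffer instead of the real stdout.
demo = io.BytesIO()
write_encoded("caf\u00e9", out=demo)
result = demo.getvalue()
```

Calling write_encoded(some_string) with no out argument writes to sys.stdout.buffer, as in the snippet above.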