
Why binary mode when reading/writing TOML in Python?

Tags: python, utf-8, toml

When reading a TOML file in normal read ("r") mode, I get an error:

import tomli

with open("path_to_file/conf.toml", "r") as f: # have to use "rb" !
    toml_dict = tomli.load(f)

TypeError: File must be opened in binary mode, e.g. use open('foo.toml', 'rb')

The same happens when writing a TOML file. Why?

The tomli GitHub README says:

The file must be opened in binary mode (with the "rb" flag). Binary mode will enforce decoding the file as UTF-8 with universal newlines disabled, both of which are required to correctly parse TOML.

I thought the age of typewriters was over, so why is "universal newlines" not allowed? The TOML spec says "Newline means LF (0x0A) or CRLF (0x0D 0x0A)" (poor Mac users), but that doesn't clarify the reason to me either... so, what am I missing?

FObersteiner asked Oct 21 '25 13:10


1 Answer

To wrap this up: the behavior described in the question is a specific case of a more general problem, namely how to enforce a specific decoding when reading a text file with Python's built-in open. Put differently: how to make sure the file is read with one specific encoding.
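To illustrate the general problem, here is a minimal sketch of the difference between reading bytes and decoding them explicitly, and reading in text mode with whatever encoding the caller happens to pass. The helper name read_utf8 and the demo file are made up for this example; they are not part of tomli.

```python
import tempfile
from pathlib import Path


def read_utf8(path):
    """Hypothetical helper: read raw bytes, then decode strictly as UTF-8.

    Raises UnicodeDecodeError if the file is not valid UTF-8.
    """
    with open(path, "rb") as f:
        return f.read().decode("utf-8")


with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.txt"
    # UTF-8 content with a non-ASCII character and a CRLF line ending
    demo_path.write_bytes('name = "Zo\u00eb"\r\n'.encode("utf-8"))

    # Binary read + explicit decode: encoding is fixed, newlines are untouched
    strict_text = read_utf8(demo_path)

    # Text mode: the caller picks the encoding, and the default newline
    # handling (universal newlines) turns "\r\n" into "\n"
    with open(demo_path, "r", encoding="latin-1") as f:
        wrong_text = f.read()

print(repr(strict_text))
print(repr(wrong_text))
```

In text mode nothing stops the caller from choosing the wrong encoding, and the library receiving the file object cannot undo that afterwards; with bytes, the decoding step is under the library's control.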

tomli leaves the file I/O to the user, so in text mode the user could pass an arbitrary encoding via open(path_to_file, "r", encoding=...). However, the TOML specification requires the input to be UTF-8. tomli implements this requirement by forcing the user to open the file in binary mode ("rb") and then decoding the bytes it has read itself (src). Reading bytes also bypasses Python's universal-newline translation, which is the second requirement named in the README quote from the question.
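For completeness, a small sketch of the two ways to stay within that requirement: pass a binary file object to load, or decode the bytes as UTF-8 yourself and pass the resulting string to loads. The file name and contents are invented for the demo, and the import falls back to tomli on Python versions without the standard-library tomllib, which offers the same two functions.

```python
import tempfile
from pathlib import Path

try:
    import tomllib  # standard library since Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # third-party backport with the same load/loads API

with tempfile.TemporaryDirectory() as tmp:
    conf_path = Path(tmp) / "conf.toml"  # hypothetical config file for the demo
    conf_path.write_bytes(b'title = "example"\r\n[server]\r\nport = 8080\r\n')

    # Option 1: hand the parser a binary file object; it decodes as UTF-8 itself
    with open(conf_path, "rb") as f:
        toml_dict = tomllib.load(f)

    # Option 2: decode explicitly as UTF-8, then parse the string
    text = conf_path.read_bytes().decode("utf-8")
    same_dict = tomllib.loads(text)

print(toml_dict)
```

Either way, the decoding never depends on the platform's default text encoding.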

FObersteiner answered Oct 24 '25 05:10