Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Adding BOM (unicode signature) while saving file in python

Tags:

python

How can I add BOM (unicode signature) while saving file in python:

file_old = open('old.txt', mode='r', encoding='utf-8') file_new = open('new.txt', mode='w', encoding='utf-16-le') file_new.write(file_old.read()) 

I need to convert file to utf-16-le + BOM. Now script is working great, except that there is no BOM.

like image 528
Qiao Avatar asked Mar 05 '11 08:03

Qiao


People also ask

What is the UTF-8 BOM?

The UTF-8 BOM is a sequence of bytes at the start of a text stream ( 0xEF, 0xBB, 0xBF ) that allows the reader to more reliably guess a file as being encoded in UTF-8. Normally, the BOM is used to signal the endianness of an encoding, but since endianness is irrelevant to UTF-8, the BOM is unnecessary.

What is SIG utf8?

"sig" in "utf-8-sig" is the abbreviation of "signature" (i.e. signature utf-8 file). Using utf-8-sig to read a file will treat the BOM as metadata that explains how to interpret the file, instead of as part of the file contents.


2 Answers

Write it directly at the beginning of the file:

file_new.write('\ufeff') 
like image 170
Ocaso Protal Avatar answered Sep 17 '22 08:09

Ocaso Protal


It's better to use constants from 'codecs' module.

import codecs f.write(codecs.BOM_UTF16_LE) 
like image 38
kriomant Avatar answered Sep 20 '22 08:09

kriomant