I'm searching (without success) for a script that would work as a batch file and prepend a BOM to a UTF-8 text file if it doesn't already have one.
Neither the language it is written in (Perl, Python, C, Bash) nor the OS it runs on matters to me. I have access to a wide range of computers.
I've found a lot of scripts that do the reverse (strip the BOM), which sounds kind of silly to me, as many Windows programs have trouble reading UTF-8 text files that don't have a BOM.
Did I miss the obvious?
Thanks!
The Unicode Standard permits the BOM in UTF-8, but does not require or recommend its use. Byte order has no meaning in UTF-8, so its only use in UTF-8 is to signal at the start that the text stream is encoded in UTF-8, or that it was converted to UTF-8 from a stream that contained an optional BOM.
Select “Save As” from File menu, go to Save button and open its dropdown menu, select “Save with Encoding…”, choose “Unicode (UTF-8 without signature)”.
The UTF-8 file signature (commonly also called a "BOM") identifies the encoding format rather than the byte order of the document. UTF-8 is a linear sequence of bytes, not a sequence of 2-byte or 4-byte units where the byte order is important.
The UTF-8 BOM is a sequence of bytes at the start of a text stream ( 0xEF, 0xBB, 0xBF ) that allows the reader to more reliably guess a file as being encoded in UTF-8. Normally, the BOM is used to signal the endianness of an encoding, but since endianness is irrelevant to UTF-8, the BOM is unnecessary.
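To make the three-byte signature concrete, here is a minimal sketch of a check for it in plain shell. The function name `has_utf8_bom` is hypothetical, and the sketch assumes a `head` that supports `-c` (common, but not strictly POSIX). It only looks at the first three bytes; it does not verify that the rest of the file is valid UTF-8.

```shell
#!/bin/sh
# has_utf8_bom FILE (hypothetical helper name)
# Succeeds (exit status 0) if FILE starts with the bytes EF BB BF.
has_utf8_bom() {
    # Dump the first three bytes as hex and strip all whitespace,
    # so a file with a BOM yields exactly "efbbbf".
    sig=$(head -c 3 "$1" | od -An -tx1 | tr -d '[:space:]')
    [ "$sig" = "efbbbf" ]
}
```

It could be used as `has_utf8_bom notes.txt && echo "has BOM"`, where `notes.txt` stands for any file of yours.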
I wrote this addbom.sh using the 'file' command and ICU's 'uconv' command.
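    #!/bin/sh

    # Print usage and stop if no files were given.
    if [ $# -eq 0 ]
    then
        echo "usage: $0 files ..."
        exit 1
    fi

    for file in "$@"
    do
        echo "# Processing: $file" 1>&2
        if [ ! -f "$file" ]
        then
            echo "Not a file: $file" 1>&2
            exit 1
        fi
        # Ask 'file' to describe the content; keep only the description part.
        TYPE=$(file - < "$file" | cut -d: -f2)
        if echo "$TYPE" | grep -q '(with BOM)'
        then
            echo "# $file already has BOM, skipping." 1>&2
        else
            # Keep a backup as "$file~", then rewrite the file with a signature.
            # Braces (not parentheses) so that 'exit 1' ends the whole script.
            ( mv "${file}" "${file}~" && uconv -f utf-8 -t utf-8 --add-signature < "${file}~" > "${file}" ) || { echo "Error processing $file" 1>&2 ; exit 1; }
        fi
    done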
Edit: Added quotes around the mv arguments. Thanks @DirkR, and glad this script has been so helpful!
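If `uconv` is not installed, the same idea can be sketched with only standard tools. This is an alternative under stated assumptions, not the script above: the function name `add_bom` is hypothetical, `head -c` and `mktemp` are assumed to be available, and unlike the `file`-based check it does not confirm that the input really is UTF-8. It simply prepends the bytes EF BB BF when they are not already there.

```shell
#!/bin/sh
# add_bom FILE (hypothetical helper name)
# Prepends the UTF-8 signature to FILE unless FILE already starts with it.
# Assumption: FILE is already UTF-8; no validation is done here.
add_bom() {
    f=$1
    # Read the first three bytes as hex, e.g. "efbbbf" for a file with a BOM.
    sig=$(head -c 3 "$f" | od -An -tx1 | tr -d '[:space:]')
    if [ "$sig" = "efbbbf" ]; then
        return 0    # Signature already present, nothing to do.
    fi
    tmp=$(mktemp) || return 1
    # Write signature plus original content to a temp file, then copy it
    # back over FILE so the original permissions are kept.
    { printf '\357\273\277' && cat "$f"; } > "$tmp" && cat "$tmp" > "$f"
    status=$?
    rm -f "$tmp"
    return $status
}
```

A loop such as `for f in "$@"; do add_bom "$f" || echo "Error processing $f" 1>&2; done` would apply it to every file named on the command line. Running it twice on the same file is harmless, since the second call sees the signature and returns early.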