
How can I re-add a unicode byte order marker in linux?

I have a rather large SQL file which starts with the byte order mark FF FE. I have split this file into 100,000-line chunks using the Unicode-aware Linux split tool. But when I pass these back to Windows, it rejects every part except the first, because only the first part still has the FF FE byte order mark.

How can I add this two byte code using echo (or any other bash command)?

Neil Trodden asked Jun 25 '09 15:06


People also ask

How do I add a BOM to a file?

To add a BOM to a UTF-8 file, write the Unicode character U+FEFF, which UTF-8 encodes as the three bytes 0xEF, 0xBB, 0xBF, at the beginning of the file.

Does UTF-8 byte have an order mark?

UTF-8 has the same byte order regardless of platform endianness, so a byte order mark isn't needed. However, it may occur (as the byte sequence EF BB BF) in data that was converted to UTF-8 from UTF-16, or as a "signature" to indicate that the data is UTF-8.


2 Answers

Based on Anonymous's sed solution, sed -i '1s/^/\xef\xbb\xbf/' foo adds the BOM to the UTF-8 encoded file foo. Usefully, it also turns a pure-ASCII file into UTF-8 with a BOM.
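
The \x escapes in that sed command are a GNU sed extension, and the substitution does nothing on an empty file because there is no line 1 to match. A minimal sketch of a more portable variant, using octal escapes in printf, is shown here. The function name add_utf8_bom is made up for illustration, and the sketch assumes head -c and mktemp are available:

```shell
#!/bin/sh
# add_utf8_bom FILE (hypothetical helper name)
# Prepend the UTF-8 BOM (EF BB BF) unless FILE already starts with it.
add_utf8_bom() {
    file=$1
    # Read the first three bytes as a hex string, e.g. "efbbbf".
    head3=$(head -c 3 "$file" | od -An -tx1 | tr -d ' \n')
    if [ "$head3" = "efbbbf" ]; then
        return 0    # BOM already present, nothing to do
    fi
    tmp=$(mktemp) || return 1
    # Octal escapes are understood by every POSIX printf; \x is not.
    if { printf '\357\273\277'; cat "$file"; } > "$tmp"; then
        # Write back in place so the file keeps its permissions.
        cat "$tmp" > "$file"
    fi
    rm -f "$tmp"
}
```

Calling add_utf8_bom foo twice leaves a single BOM, since the second call sees the existing signature and returns early.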

brillout answered Sep 29 '22 11:09


For a general-purpose solution—something that sets the correct byte-order mark regardless of whether the file is UTF-8, UTF-16, or UTF-32—I would use vim’s 'bomb' option:

$ echo 'hello' > foo
$ xxd < foo
0000000: 6865 6c6c 6f0a                           hello.
$ vim -e -s -c ':set bomb' -c ':wq' foo
$ xxd < foo
0000000: efbb bf68 656c 6c6f 0a                   ...hello.

(-e runs vim in ex mode instead of visual mode; -s suppresses status messages; -c executes the given command after the file is loaded)
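
For the split chunks in the question, which start with FF FE and are therefore presumably UTF-16LE, the two bytes can also be prepended directly with printf and cat. This is a sketch under that assumption. The function name add_utf16le_bom and the chunk_* file names in the usage note are made up for illustration:

```shell
#!/bin/sh
# add_utf16le_bom FILE... (hypothetical helper name)
# Prepend the UTF-16LE BOM (FF FE) to each file that lacks it.
add_utf16le_bom() {
    for file in "$@"; do
        # First two bytes as a hex string, e.g. "fffe".
        head2=$(head -c 2 "$file" | od -An -tx1 | tr -d ' \n')
        if [ "$head2" = "fffe" ]; then
            continue    # e.g. the first chunk, which kept its BOM
        fi
        tmp=$(mktemp) || return 1
        # \377\376 is FF FE written as octal escapes.
        if { printf '\377\376'; cat "$file"; } > "$tmp"; then
            cat "$tmp" > "$file"
        fi
        rm -f "$tmp"
    done
}
```

With chunks named chunk_aa, chunk_ab and so on, add_utf16le_bom chunk_* would fix every part in one pass and leave the first part unchanged.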

andrewdotn answered Sep 29 '22 10:09