For debugging purposes, I need to recursively search a directory for all files which start with a UTF-8 byte order mark (BOM). My current solution is a simple shell script:
find -type f |
while read file
do
    if [ "`head -c 3 -- "$file"`" == $'\xef\xbb\xbf' ]
    then
        echo "found BOM in: $file"
    fi
done
Or, if you prefer short, unreadable one-liners:
find -type f|while read file;do [ "`head -c3 -- "$file"`" == $'\xef\xbb\xbf' ] && echo "found BOM in: $file";done
It doesn't work with filenames that contain a line break, but such files are not to be expected anyway.
Is there any shorter or more elegant solution?
Are there any interesting text editors or macros for text editors?
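The line-break limitation mentioned above can probably be avoided by letting find pass the names directly to a small sh script instead of piping them through read. This is only a sketch: the function name find_bom_files is made up for illustration, and it assumes that head -c and od are available on the system.

```shell
# find_bom_files is a hypothetical helper name, not part of the original script.
# It lists regular files under a directory (default: .) that start with EF BB BF.
find_bom_files() {
    # -exec ... {} + passes names as arguments, so newlines in names are harmless.
    find "${1:-.}" -type f -exec sh -c '
        for file do
            # Hex-dump the first three bytes and compare them as text.
            if [ "$(head -c 3 < "$file" | od -An -tx1 | tr -d " \n")" = "efbbbf" ]; then
                printf "found BOM in: %s\n" "$file"
            fi
        done' sh {} +
}
```

Calling find_bom_files some/dir prints one "found BOM in: ..." line per matching file. Comparing a hex dump rather than raw bytes also avoids the bash-only $'...' quoting.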
To check whether a file has a BOM, open it in Notepad++ and look at the bottom-right corner of the window. If it says UTF-8-BOM, the file starts with a BOM.
The Unicode Standard permits the BOM in UTF-8, but does not require or recommend its use. Byte order has no meaning in UTF-8, so its only use in UTF-8 is to signal at the start that the text stream is encoded in UTF-8, or that it was converted to UTF-8 from a stream that contained an optional BOM.
The UTF-8 BOM is a sequence of bytes at the start of a text stream (0xEF, 0xBB, 0xBF) that allows the reader to more reliably guess that a file is encoded in UTF-8. Normally, the BOM is used to signal the endianness of an encoding, but since endianness is irrelevant to UTF-8, the BOM is unnecessary.
To add a BOM to a UTF-8 file, write the character U+FEFF (\ufeff) at the beginning of the file. In UTF-8, U+FEFF is encoded as the three bytes 0xEF, 0xBB, 0xBF, so writing those bytes directly has the same effect.
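As a rough illustration of writing those three bytes at the start of a file, here is a minimal shell sketch. The function name add_bom is hypothetical, and the check that skips files which already start with a BOM is an added assumption, not something the text above requires.

```shell
# add_bom is a hypothetical helper: prepend EF BB BF unless the file already has it.
add_bom() {
    file=$1
    if [ "$(head -c 3 < "$file" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]; then
        return 0    # already starts with a BOM; leave the file alone
    fi
    tmp=$(mktemp) || return 1
    # The octal escapes \357\273\277 are the bytes EF BB BF.
    { printf '\357\273\277'; cat < "$file"; } > "$tmp" &&
        cat < "$tmp" > "$file"    # write back in place, keeping owner and permissions
    rc=$?
    rm -f -- "$tmp"
    return "$rc"
}
```

For example, add_bom notes.txt rewrites notes.txt with the BOM in front. Running it a second time should leave the file unchanged.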
What about this one simple command which not just finds but clears the nasty BOM? :)
find . -type f -exec sed '1s/^\xEF\xBB\xBF//' -i {} \;
I love "find" :)
Warning: the above will also modify binary files that happen to start with those three bytes.
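A more cautious variant might check a file first and only rewrite it if it really starts with the BOM, so that you can choose which files to pass in. The sketch below is one way to do that. The name strip_bom is hypothetical, and it uses tail -c +4 instead of sed as a deliberate swap, because \xEF escapes in sed are a GNU extension.

```shell
# strip_bom is a hypothetical helper: remove a leading EF BB BF from one file.
strip_bom() {
    file=$1
    # Files that do not start with the BOM are not touched at all.
    [ "$(head -c 3 < "$file" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ] || return 0
    tmp=$(mktemp) || return 1
    # tail -c +4 outputs everything from the 4th byte on, i.e. the file minus the BOM.
    tail -c +4 < "$file" > "$tmp" && cat < "$tmp" > "$file"
    rc=$?
    rm -f -- "$tmp"
    return "$rc"
}
```

It could be applied only to files you know are text, for example: for f in ./*.txt; do strip_bom "$f"; done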
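If you only want to list the files that contain a BOM, use this one (note that it matches the byte sequence anywhere in a file, not just at the start):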
grep -rl $'\xEF\xBB\xBF' .