 

Using awk to remove the byte-order mark

What would an awk script (presumably a one-liner) for removing a BOM look like?

Specification:

  • print every line after the first (NR > 1)
  • for the first line: If it starts with #FE #FF or #FF #FE, remove those and print the rest
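For the spec above, a minimal awk sketch could look like the following. Note that #FE #FF and #FF #FE are the UTF-16 byte-order marks (big- and little-endian), while the UTF-8 BOM handled in the answer below is #EF #BB #BF. The function name strip_utf16_bom is hypothetical, and the sketch assumes an awk that accepts POSIX octal escapes in regular expressions (\376 = #FE, \377 = #FF) and treats input as raw bytes under LC_ALL=C.

```sh
# strip_utf16_bom (hypothetical name): reads the named files (or stdin)
# and writes to stdout.
strip_utf16_bom() {
  # LC_ALL=C: treat input as raw bytes, not as multibyte characters.
  # Record 1: drop a leading FE FF or FF FE; every record is then printed,
  # so lines with NR > 1 pass through unchanged.
  LC_ALL=C awk 'NR == 1 { sub(/^(\376\377|\377\376)/, "") } { print }' "$@"
}
```

Usage: strip_utf16_bom in.txt > out.txt. The rest of a real UTF-16 file still contains NUL bytes, which not every awk handles, so it is worth testing with the local awk first; awk also appends a newline if the last line lacked one.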
asked Jul 01 '09 at 11:07 by Boldewyn





1 Answer

Using GNU sed (on Linux or Cygwin):

# Removing BOM from all text files in current directory:
sed -i '1 s/^\xef\xbb\xbf//' *.txt
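To check whether the command actually removed the mark, one can inspect the first three bytes of a file. A minimal sketch, assuming head -c (available in GNU coreutils and the BSDs) and POSIX od; has_utf8_bom is a hypothetical helper name:

```sh
# has_utf8_bom (hypothetical name): exit status 0 if the file starts with EF BB BF.
has_utf8_bom() {
  # Dump the first three bytes as hex (e.g. " ef bb bf"), then drop spaces and newlines.
  [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]
}

# Example: list .txt files that still start with a BOM.
# for f in *.txt; do has_utf8_bom "$f" && echo "$f"; done
```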

On FreeBSD:

sed -i .bak '1 s/^\xef\xbb\xbf//' *.txt 

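The advantage of GNU or FreeBSD sed is the -i option, which means "in place": it updates the files directly, without the need for redirections or other tricks. On FreeBSD, the argument after -i (here .bak) is the suffix for a backup copy of each original file.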

On Mac:

The awk solution from another answer works, but the sed command above does not: at least on macOS Sierra, the sed documentation does not mention support for hexadecimal escapes such as \xef.
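Since the \xef escapes are the sticking point on the Mac, one portable alternative is awk with octal escapes, which POSIX awk defines (\357\273\277 is the same EF BB BF sequence). The sketch below is untested on macOS: it assumes the system awk matches raw bytes in the C locale, and it emulates sed -i with a temporary file. strip_utf8_bom is a hypothetical helper name.

```sh
# strip_utf8_bom (hypothetical name): removes a leading UTF-8 BOM from each
# named file, rewriting the file in place via a temporary file.
strip_utf8_bom() {
  for f in "$@"; do
    tmp=$(mktemp) || return 1
    # LC_ALL=C makes awk treat the input as raw bytes; only record 1 is edited.
    if LC_ALL=C awk 'NR == 1 { sub(/^\357\273\277/, "") } { print }' "$f" > "$tmp"; then
      # Copy back with cat (not mv) so the original file keeps its permissions.
      cat "$tmp" > "$f"
    fi
    rm -f "$tmp"
  done
}
```

Usage: strip_utf8_bom *.txt. Like the sed versions, it only touches the start of the first line; awk will, however, add a final newline if the file lacked one.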

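A similar trick works with any program: pipe its output to the sponge tool from moreutils. sponge soaks up all of its input before opening the output file, whereas a plain > INFILE redirection would truncate INFILE before awk could read it: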

awk '…' INFILE | sponge INFILE 
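If moreutils is not installed, the core of sponge's behavior can be approximated with a small shell function that buffers everything in a temporary file before touching the target. soak is a hypothetical name, and this is a sketch that does not replicate sponge's options; it would be used as awk '…' INFILE | soak INFILE.

```sh
# soak (hypothetical name): a minimal stand-in for moreutils' sponge.
# It reads ALL of stdin into a temporary file and only overwrites the target
# after stdin reaches EOF, so the target may also be an input of the pipeline.
soak() {
  _soak_tmp=$(mktemp) || return 1
  # Copy back with cat (not mv) so the target keeps its inode and permissions.
  cat > "$_soak_tmp" && cat "$_soak_tmp" > "$1"
  _soak_status=$?
  rm -f "$_soak_tmp"
  return "$_soak_status"
}
```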
answered Sep 22 '22 at 16:09 by Denilson Sá Maia
