Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Get ant concat to ignore BOM's'?

I have an ant build that concatenates my javascript into one file and then compresses it. The problem is that Visual Studio's default encoding attaches a BOM to every file. How do I configure ant to strip out BOM's that would otherwise appear in the middle of the resulting concatenated file?

My googl'ing revealed this discussion which is the exact problem I'm having but doesn't provide a solution: http://marc.info/?l=ant-user&m=118598847927096

like image 506
Breck Fresen Avatar asked Apr 30 '10 06:04

Breck Fresen


1 Answers

The Unicode byte order mark codepoint is U+FEFF. This concatenation command will strip out all BOM characters when concatenating two files:

<concat encoding="UTF-8" outputencoding="UTF-8" destfile="nobom-concat.txt">
  <filelist dir="." files="bom1.txt,bom2.txt" />
  <filterchain>
    <deletecharacters chars="&#xFEFF;" />
  </filterchain>
</concat>

This form of the concat command tells the task to decode the files as UTF-8 character data. I'm assuming UTF-8 as this is usually where Java/BOM issues occur.

In UTF-8, the BOM is encoded as the bytes EF BB BF. If you needed it to appear at the start of the resultant file, you could use a subsequent concatenation to prefix the output file with a BOM again.

Encoded values for U+FEFF in other UTF encodings are listed here.

like image 83
McDowell Avatar answered Sep 18 '22 17:09

McDowell