Why is my Bash script adding <feff> to the beginning of files?

Tags: linux, bash, sed, cp

I've written a script that cleans up .csv files using sed, removing some bad commas and bad quotes ("bad" meaning they break an in-house program we use to transform these files):

# remove all commas, and re-insert the good commas using clean.sed
sed -f clean.sed "$1" > "$1.1st"

# remove all quotes
sed 's/"//g' "$1.1st" > "$1.tmp"

# add the good quotes around good commas
sed 's/,/","/g' "$1.tmp" > "$1.tmp1"

# add leading quotes
sed 's/^/"/' "$1.tmp1" > "$1.tmp2"

# add trailing quotes
sed 's/$/"/' "$1.tmp2" > "$1.tmp3"

# remove the <feff> character
sed 's/<feff>//' "$1.tmp3" > "$1.tmp4"

# copy the cleaned version to quotes_<name> (the .tmp files are deleted afterwards)
cp -f "$1.tmp4" "quotes_$1"

Here is clean.sed:

s/\",\"/XXX/g;
:a
s/,//g
ta
s/XXX/\",\"/g;

Then it removes the temp files and voilà, we have a new file whose name starts with "quotes" that we can use for our other processes.

My question is:
Why do I need a sed statement to remove the <feff> character from that temp file? The original file doesn't seem to have it, but it always appears in the replacement. At first I thought cp was causing this, but if I put the sed statement in before the cp, it isn't there.

Maybe I'm just missing something...

asked Dec 29 '09 00:12

SDGuero

2 Answers

U+FEFF is the code point for a byte order mark. Your files most likely contain data saved in UTF-16 and the BOM has been corrupted by your 'cleaning process' which is most likely expecting ASCII. It's probably not a good idea to remove the BOM, but instead to fix your scripts to not corrupt it in the first place.
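One way to check whether a file really starts with a byte order mark is to look at its first bytes. The following is only a sketch: `bom_type` is a hypothetical helper name, and it assumes the usual encodings of U+FEFF (EF BB BF in UTF-8, FF FE or FE FF in UTF-16). It does not try to tell UTF-32 apart from UTF-16.

```shell
# bom_type FILE: report which byte order mark, if any, FILE starts with.
# (hypothetical helper for illustration, not part of any standard tool)
bom_type() {
    # Dump the first three bytes as hex with no separators, e.g. "efbbbf".
    case "$(head -c 3 -- "$1" | od -An -tx1 | tr -d ' \n')" in
        efbbbf) echo "UTF-8 BOM" ;;
        fffe*)  echo "UTF-16LE BOM" ;;
        feff*)  echo "UTF-16BE BOM" ;;
        *)      echo "no BOM" ;;
    esac
}
```

Running `bom_type "$1"` on the original .csv before any sed step should show whether the mark is already present in the input, even if the editor hides it there.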

answered Oct 04 '22 22:10

Mark Byers

To get rid of these in GNU Emacs:

Open Emacs
Do a find-file-literally to open the file
Delete the leading three bytes
Save the file
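
The same edit can be done without an editor. This is a minimal sketch, assuming the three leading bytes are the UTF-8 encoding of U+FEFF (EF BB BF); `strip_utf8_bom` is a hypothetical helper name, and files without that prefix are passed through unchanged.

```shell
# strip_utf8_bom FILE: write FILE to stdout without a leading UTF-8 BOM.
# (hypothetical helper for illustration)
strip_utf8_bom() {
    if [ "$(head -c 3 -- "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]; then
        # Skip the first three bytes and print from byte 4 onward.
        tail -c +4 -- "$1"
    else
        cat -- "$1"
    fi
}
```

For example, `strip_utf8_bom input.csv > input.nobom.csv` writes a copy without the mark and leaves the original untouched.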

There is also a way to convert files with DOS line termination convention to Unix line termination convention.
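
One common approach to that conversion, sketched here as an assumption rather than the method this answer had in mind, is to delete the carriage returns with `tr`. Note that this removes every CR byte in the file, not only those directly before a line feed; `dos_to_unix` is a hypothetical helper name.

```shell
# dos_to_unix FILE: write FILE to stdout with all carriage returns removed.
# (hypothetical helper for illustration; drops every CR, not just CR before LF)
dos_to_unix() {
    tr -d '\r' < "$1"
}
```

Usage would look like `dos_to_unix input.csv > input.unix.csv`.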

answered Oct 04 '22 20:10

stinkoid
