I have the following code to join multiple files together. It works fine, but I want to replace the empty values with 0, so I used -e "0". That has no effect. Any ideas?
for k in `ls file?`
do
if [ -a final.results ]
then
join -a1 -a2 -e "0" final.results $k > tmp.res
mv tmp.res final.results
else
cp $k final.results
fi
done
example:
file1:
a 1
b 2
file2:
a 1
c 2
file3:
b 1
d 2
Results:
a 1 0 1 0
b 2 1 0
c 2
d 2
expected:
a 1 1 0
b 2 0 1
c 0 2 0
d 0 0 2
As an aside, the GNU version of join supports -o auto. The -e and -o options cause enough frustration to turn people to learning awk. (See also How to get all fields in outer join with Unix join?). As cmh said: it's [not] documented, but when using join the -e option only works in conjunction with the -o option.
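For the two-file case, here is a minimal sketch of what -o auto does. It assumes GNU coreutils join (-o auto is a GNU extension and will likely fail elsewhere); the temporary directory and the variable names workdir and joined are only for illustration.

```shell
# Assumption: GNU join; -o auto is not available in every join implementation.
workdir=$(mktemp -d)
printf 'a 1\nb 2\n' > "$workdir/file1"
printf 'a 1\nc 2\n' > "$workdir/file2"

# -a1 -a2 keeps unpaired lines from both files, -e 0 fills missing fields,
# and -o auto takes the field count from the first line of each file.
joined=$(join -a1 -a2 -e 0 -o auto "$workdir/file1" "$workdir/file2")
printf '%s\n' "$joined"

rm -r "$workdir"
```

With these two sample files, every key should come out with one column per file, and 0 where a file has no entry for that key.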
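General solution:
# Build a sorted index of every key that appears in any of the files.
cut -d ' ' -f1 file? | sort -u > tmp.index
# For each file, emit its second column aligned to the index, with 0 for missing keys.
for k in file?; do join -a1 -e '0' -o '2.2' tmp.index "$k" > "tmp.file.$k"; done
paste -d " " tmp.index tmp.file.* > final.results
rm tmp.index tmp.file.*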
Bonus: how do I compare multiple branches in git?
for k in pmt atc rush; do git ls-tree -r $k | cut -c13- > ~/tmp-branch-$k; done
cut -f2 ~/tmp-branch-* | sort -u > ~/tmp-allfiles
for k in pmt atc rush; do join -a1 -e '0' -t$'\t' -11 -22 -o '2.2' ~/tmp-allfiles ~/tmp-branch-$k > ~/tmp-sha-$k; done
paste -d " " ~/tmp-allfiles ~/tmp-sha-* > final.results
grep -Ev '(.{40}).\1.\1' final.results # these files are not the same everywhere
It's poorly documented, but when using join the -e option only works in conjunction with the -o option. The order string needs to be amended each time around the loop. The following code should generate your desired output.
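i=3
orderl='0,1.2'   # join key plus the columns already in final.results
orderr=',2.2'    # value column of the file being added
for k in file?
do
    if [ -e final.results ]
    then
        join -a1 -a2 -e "0" -o "$orderl$orderr" final.results "$k" > tmp.res
        # final.results gained a column, so include it next time around
        orderl="$orderl,1.$i"
        i=$((i+1))
        mv tmp.res final.results
    else
        cp "$k" final.results
    fi
done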
As you can see, it starts to become messy. If you need to extend this much further, it might be worth deferring to a beefier tool such as awk or Python.
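For comparison, here is a rough sketch of how the awk route might look for the three sample files from the question. It is one possible approach rather than a drop-in replacement: it assumes two space-separated columns per file and no empty input files (the FNR == 1 check would miscount those), and the names workdir and table are made up for the example.

```shell
workdir=$(mktemp -d)
printf 'a 1\nb 2\n' > "$workdir/file1"
printf 'a 1\nc 2\n' > "$workdir/file2"
printf 'b 1\nd 2\n' > "$workdir/file3"

table=$(awk '
    FNR == 1 { nfiles++ }                      # first line of a new input file
    { keys[$1] = 1; val[$1, nfiles] = $2 }     # remember the value per (key, file)
    END {
        for (k in keys) {
            line = k
            for (i = 1; i <= nfiles; i++)      # one column per file, 0 if the key is absent
                line = line " " (((k, i) in val) ? val[k, i] : 0)
            print line
        }
    }' "$workdir/file1" "$workdir/file2" "$workdir/file3" | sort)
printf '%s\n' "$table"

rm -r "$workdir"
```

The trailing sort is there because awk's for (k in keys) does not iterate in any guaranteed order. Unlike join, this does not need the inputs to be sorted, and there is no order string to maintain as files are added.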