 

Bash: join multiple files, replacing empty fields with 0 (join -e option)

Tags:

linux

bash

join

I have the following code to join multiple files together. It works fine, but I want to replace empty values with 0, so I added -e "0". However, it has no effect. Any ideas?

for k in file?
do
    if [ -e final.results ]
    then
            join -a1 -a2 -e "0" final.results "$k" > tmp.res
            mv tmp.res final.results
    else
            cp "$k" final.results
    fi
done

Example:

file1: 
a 1 
b 2
file2:
a 1 
c 2
file3:
b 1 
d 2

Results:
a 1 0 1 0
b 2 1 0
c 2
d 2

Expected:
a 1 1 0
b 2 0 1
c 0 2 0
d 0 0 2
asked Dec 19 '12 at 23:12 by Amir




2 Answers

As an aside, the GNU version of join supports -o auto. The -e and -o options cause enough frustration to drive people to learn awk (see also "How to get all fields in outer join with Unix join?"). As cmh said: it's [not] documented, but when using join, the -e option only works in conjunction with the -o option.
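A minimal sketch of the -o auto aside applied to the question's own loop, assuming GNU coreutils join (other implementations, such as the BSD join on macOS, may not accept -o auto) and input files already sorted on the key. With -o auto, join infers the output format from the first line of each input, so the format grows automatically as final.results gains columns, and -e 0 fills in the missing values. The sample files are recreated in a scratch directory from mktemp so the script runs on its own.

```bash
#!/usr/bin/env bash
# Sketch: the question's loop plus GNU join's "-o auto", which lets -e take effect.
# Assumes GNU coreutils join and key-sorted, whitespace-separated input files.
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"
# Sample data from the question
printf 'a 1\nb 2\n' > file1
printf 'a 1\nc 2\n' > file2
printf 'b 1\nd 2\n' > file3

rm -f final.results
for k in file?; do
    if [ -e final.results ]; then
        # -o auto: join field, then the remaining fields of each file (counted
        # from that file's first line); -e 0 fills the fields that are missing.
        join -a1 -a2 -e 0 -o auto final.results "$k" > tmp.res
        mv tmp.res final.results
    else
        cp "$k" final.results
    fi
done
cat final.results
```

Because -o auto counts fields from each file's first line, it assumes every line of a file has the same number of fields; extra fields on later lines are discarded.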

General solution:

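# Build the sorted list of every key that appears in any file
cut -d ' ' -f1 file? | sort -u > tmp.index
# For each file, emit one column: its value for each key, or 0 if the key is missing
for k in file?; do join -a1 -e '0' -o '2.2' tmp.index "$k" > "tmp.file.$k"; done
# Put the key column and the per-file columns side by side
paste -d " " tmp.index tmp.file.* > final.results
rm tmp.index tmp.file.*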

Bonus: how do I compare multiple branches in git?

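export LC_ALL=C   # sort and join must use the same byte ordering as git's path listing
for k in pmt atc rush; do git ls-tree -r "$k" | cut -c13- > ~/tmp-branch-$k; done   # lines are "<sha>\t<path>"
cut -f2 ~/tmp-branch-* | sort -u > ~/tmp-allfiles
# Join on the path (field 1 of allfiles, field 2 of each listing) and output the SHA (2.1), or 0 if absent
for k in pmt atc rush; do join -a1 -e '0' -t$'\t' -11 -22 -o '2.1' ~/tmp-allfiles ~/tmp-branch-$k > ~/tmp-sha-$k; done
paste -d " " ~/tmp-allfiles ~/tmp-sha-* > final.results
grep -Ev '(.{40}).\1.\1' final.results # these files are not the same everywhere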
answered Nov 08 '22 at 14:11 by William Entriken


It's poorly documented, but when using join, the -e option only works in conjunction with the -o option. Because final.results gains one column per file, the -o format string has to be extended each time around the loop. The following code should generate your desired output.

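rm -f final.results   # start fresh; a leftover file would throw off the column count
i=3
orderl='0,1.2'
orderr=',2.2'
for k in file?
do
    if [ -e final.results ]
    then
            join -a1 -a2 -e "0" -o "$orderl$orderr" final.results "$k" > tmp.res
            orderl="$orderl,1.$i"   # final.results now has one more value column
            i=$((i+1))
            mv tmp.res final.results
    else
            cp "$k" final.results
    fi
done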

As you can see, this gets messy quickly. If you need to extend it much further, it may be worth switching to a more capable tool such as awk or Python.
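Following the suggestion to move to awk, here is a minimal sketch (an alternative technique, not code from either answer) that performs the same full outer join over any number of files in a single pass. It assumes two whitespace-separated columns per line (key and value) and fills missing values with 0. Unlike join, it does not need sorted input: it stores each value in an array indexed by (key, file number) and sorts the finished rows at the end. The sample files are recreated in a mktemp scratch directory so the script is self-contained.

```bash
#!/usr/bin/env bash
# Sketch: full outer join of N "key value" files with awk, filling gaps with 0.
# Assumes two whitespace-separated columns per line; input need not be sorted.
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"
# Sample data from the question
printf 'a 1\nb 2\n' > file1
printf 'a 1\nc 2\n' > file2
printf 'b 1\nd 2\n' > file3

awk '
    FNR == 1 { nfiles++ }              # first line of a new file (empty files get no column)
    { val[$1, nfiles] = $2; keys[$1] = 1 }
    END {
        for (k in keys) {
            line = k
            for (i = 1; i <= nfiles; i++)
                line = line " " (((k, i) in val) ? val[k, i] : 0)
            print line
        }
    }
' file? | sort > final.results          # for-in order is unspecified, so sort the rows
cat final.results
```

Adding a column is just a matter of listing another file; no output-format string has to be maintained.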

answered Nov 08 '22 at 13:11 by cmh