I have a directory with 100 files of the same format:
> S43.txt
Gene S43-A1 S43-A10 S43-A11 S43-A12
DDX11L1 0 0 0 0
WASH7P 0 0 0 0
C1orf86 0 15 0 1
> S44.txt
Gene S44-A1 S44-A10 S44-A11 S44-A12
DDX11L1 0 0 0 0
WASH7P 0 0 0 0
C1orf86 0 15 0 1
I want to make one large table containing all the columns from all the files. However, when I do this:
paste S88.txt S89.txt | column -d '\t' >test.merge
Naturally, the file contains two 'Gene' columns.
How can I paste ALL the files in the directory at once?
How can I exclude the first column from all the files after the first one?
Thank you.
If you're using bash, you can use process substitution in paste:
paste S43.txt <(cut -d ' ' -f2- S44.txt) | column -t
Gene S43-A1 S43-A10 S43-A11 S43-A12 S44-A1 S44-A10 S44-A11 S44-A12
DDX11L1 0 0 0 0 0 0 0 0
WASH7P 0 0 0 0 0 0 0 0
C1orf86 0 15 0 1 0 15 0 1
Here, cut -f2- S44.txt prints every column except the first from S44.txt. Set -d to match the delimiter of your files: -d ' ' for spaces as above, or -d$'\t' for tabs.
To do this for all the files matching S*.txt, use this snippet:
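arr=(S*.txt)
# Start from the first file; bash arrays are zero-indexed
file="${arr[0]}"
# Append the data columns (everything after the Gene column) of each remaining file
for f in "${arr[@]:1}"; do
    paste "$file" <(cut -d$'\t' -f2- "$f") > _file.tmp && mv _file.tmp file.tmp
    file=file.tmp
done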
# Clean up final output:
column -t file.tmp
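As an alternative to the loop above, the same merge can be sketched in a single awk pass without temporary files. This is only a sketch under assumptions: merge_columns is a hypothetical helper name, every file is assumed to list the same genes in the same order, and fields are assumed to be separated by spaces or tabs. All rows are held in memory, which should be fine for 100 small tables.

```shell
# Hypothetical helper: print the first file whole, then append every
# column except the first from each remaining file, row by row.
merge_columns() {
  awk '
    NR == FNR { row[FNR] = $0; n = FNR; next }   # first file: keep every column
    {
      line = $0
      sub(/^[^ \t]+[ \t]+/, "", line)            # later files: drop the first column
      row[FNR] = row[FNR] "\t" line              # append to the matching row
    }
    END { for (i = 1; i <= n; i++) print row[i] }
  ' "$@"
}
```

It could be called as merge_columns S*.txt | column -t >test.merge, with column -t used only to align the output.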
Use join with the --nocheck-order option:
join --nocheck-order S43.txt S44.txt | column -t
(The column -t command is only there to make the output pretty.)
However, as you say you want to join all the files, and join only takes 2 at a time, you should be able to do this (assuming your shell is bash):
tmp=$(mktemp)
files=(*.txt)
cp "${files[0]}" result.file
for file in "${files[@]:1}"; do
join --nocheck-order result.file "$file" | column -t > "$tmp" && mv "$tmp" result.file
done
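Both the paste loop and join --nocheck-order rely on every file listing the same genes in the same order, so it may be worth checking that before merging. Here is a minimal sketch of such a check. check_same_genes is a hypothetical helper name, and fields are assumed to be separated by spaces or tabs.

```shell
# Hypothetical helper: compare the first column of every file against the
# first file, line by line. Returns non-zero if any line differs.
# Note: a later file that is shorter than the first is not detected.
check_same_genes() {
  awk '
    NR == FNR { key[FNR] = $1; next }            # remember the genes of the first file
    key[FNR] != $1 {                             # mismatch, or extra line in a later file
      printf "%s: line %d has \"%s\", expected \"%s\"\n", FILENAME, FNR, $1, key[FNR]
      bad = 1
    }
    END { exit bad }
  ' "$@"
}
```

For example, check_same_genes *.txt && echo "gene columns match" would print the message only when no mismatch was reported.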