
Paste multiple files while excluding first column

Tags: bash, shell, unix

I have a directory with 100 files of the same format:

> S43.txt

Gene    S43-A1   S43-A10  S43-A11  S43-A12
DDX11L1 0       0       0       0 
WASH7P  0       0       0       0
C1orf86 0       15      0       1 



> S44.txt

Gene    S44-A1   S44-A10  S44-A11  S44-A12
DDX11L1 0       0       0       0 
WASH7P  0       0       0       0
C1orf86 0       15      0       1 

I want to build one large table containing all the columns from all the files. However, when I paste two of them together like this:

paste S88.txt S89.txt | column -d '\t' >test.merge

the resulting file naturally contains two 'Gene' columns.

  1. How can I paste ALL the files in the directory at once?

  2. How can I exclude the first column from all the files after the first one?

Thank you.

asked Feb 03 '16 19:02 by gaelgarcia


2 Answers

If you're using bash, you can use process substitution in paste:

paste S43.txt <(cut -d$'\t' -f2- S44.txt) | column -t
Gene     S43-A1  S43-A10  S43-A11  S43-A12  S44-A1  S44-A10  S44-A11  S44-A12
DDX11L1  0       0        0        0        0       0        0        0
WASH7P   0       0        0        0        0       0        0        0
C1orf86  0       15       0        1        0       15       0        1

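Here, cut -d$'\t' -f2- S44.txt prints every column of S44.txt except the first, so the second 'Gene' column never reaches paste. (Tab is cut's default delimiter, so -f2- alone would also work on tab-separated files.)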
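The same pattern extends to more than two files by adding one process substitution per extra file. The following is a minimal sketch, assuming bash and tab-separated input; the S45 table, the sample values, and the scratch directory are made up for the demonstration:

```shell
#!/usr/bin/env bash
# Hypothetical demo data: three small tab-separated tables in a scratch directory.
demo_dir=$(mktemp -d)
for s in S43 S44 S45; do
    {
        printf 'Gene\t%s-A1\t%s-A10\n' "$s" "$s"
        printf 'DDX11L1\t0\t0\n'
        printf 'C1orf86\t0\t15\n'
    } > "$demo_dir/$s.txt"
done

# The first file is passed whole; each later file goes through cut first,
# so its Gene column is dropped before paste sees it.
paste "$demo_dir/S43.txt" \
      <(cut -d$'\t' -f2- "$demo_dir/S44.txt") \
      <(cut -d$'\t' -f2- "$demo_dir/S45.txt") > "$demo_dir/merged.tsv"

cat "$demo_dir/merged.tsv"
```

Typing one substitution per file is only practical for a handful of files, which is why a loop is needed for a whole directory.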

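To do this for all the files matching S*.txt, use this snippet:

arr=(S*.txt)
# Bash arrays are zero-indexed: start from the first file, which keeps its Gene column
file="${arr[0]}"

for f in "${arr[@]:1}"; do
   # Append all but the first column of the next file to the running result
   paste "$file" <(cut -d$'\t' -f2- "$f") > _file.tmp && mv _file.tmp file.tmp
   file=file.tmp
done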

# Clean up final output:
column -t file.tmp
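
As a possible alternative to the temporary-file loop, all files can be pasted in a single call and the repeated 'Gene' columns filtered out afterwards with awk. This is only a sketch: merge_tables is a hypothetical helper name, and it assumes tab-separated files that all have the same number of columns and the same row order.

```shell
#!/usr/bin/env bash
# merge_tables FILE...  (hypothetical helper name)
# Prints every column of the first file, then columns 2..N of each later file.
# Assumes tab-separated input, equal column counts, and identical row order.
merge_tables() {
    paste "$@" | awk -F'\t' -v OFS='\t' -v nfiles="$#" '
        {
            width = NF / nfiles            # columns contributed by each file
            out = $1                       # keep the first Gene column only
            for (i = 2; i <= NF; i++)
                if ((i - 1) % width != 0)  # skip the Gene column of later files
                    out = out OFS $i
            print out
        }'
}

# Example call (commented out so the definition can be sourced safely):
# merge_tables S*.txt | column -t
```

The shell expands S*.txt in lexical order, so a file such as S100.txt would sort before S43.txt; if the column order matters, the file list should be built explicitly.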
answered Oct 15 '22 22:10 by anubhava


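Use join with the --nocheck-order option. join matches rows on the first field (the gene name), so the 'Gene' column appears only once:

join --nocheck-order S43.txt S44.txt | column -t

(The column -t command is only there to align the output.)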

Since you want to join all the files and join takes only two at a time, you can fold them into a running result one file at a time (assuming your shell is bash):

tmp=$(mktemp)
files=(*.txt)

cp "${files[0]}" result.file
for file in "${files[@]:1}"; do
    join --nocheck-order result.file "$file" | column -t > "$tmp" && mv "$tmp" result.file
done
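
join expects both inputs to be sorted on the join field, and --nocheck-order only silences the check, so rows may go unmatched if the files list the genes in different orders. A hedged sketch of one way to guard against that, assuming GNU join (for its --header option), bash, and tab-separated files; sort_body and join_tables are hypothetical helper names:

```shell
#!/usr/bin/env bash
# sort_body FILE (hypothetical helper): print the header line unchanged,
# then the remaining rows sorted on the first (gene) column.
sort_body() {
    head -n 1 "$1"
    tail -n +2 "$1" | LC_ALL=C sort -t $'\t' -k1,1
}

# join_tables A B (hypothetical helper): join two tab-separated tables on the
# gene column. GNU join's --header joins the two header lines and prints them
# first without checking their order; LC_ALL=C keeps sort and join consistent.
join_tables() {
    LC_ALL=C join --header -t $'\t' <(sort_body "$1") <(sort_body "$2")
}
```

join_tables could stand in for the plain join call inside the loop above. Because the output stays tab-separated, column -t would then only be needed once, on the final result.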
answered Oct 15 '22 22:10 by glenn jackman