
How to add the filename as a column in the output with awk?

Tags: grep, sed, awk

I am trying to grep the contents of several files in a directory and append the matches to a single file. In the output I also want a column containing the filename, so I can tell which file each entry came from. I tried to use awk for this, but it did not work.

for i in *_2.5kb.txt; do more $i | grep "NM_001080771" | echo `basename $i` | awk -F'[_.]' '{print $1"_"$2}' | head >> prom_genes_2.5kb.txt; done

The file names look like this (I have around 50 files):

    48hrs_CT_merged_peaks_2.5kb.txt
    48hrs_TAMO_merged_peaks_2.5kb.txt
    72hrs_TAMO_merged_peaks_2.5kb.txt
    72hrs_CT_merged_peaks_2.5kb.txt
    5D_CT_merged_peaks_2.5kb.txt
    5D_TAMO_merged_peaks_2.5kb.txt

Each file contains several lines:

chr1    3663275 3663483 14  2.55788 2.99631 1.40767 NM_001011874    -
chr1    4481687 4488063 264 7.85098 28.25170    26.41094    NM_011441   -
chr1    5008006 5013929 243 8.20677 26.17854    24.37907    NM_021374   -
chr1    5578362 5579949 65  3.48568 7.83501 6.57570 NM_011011   +
chr1    5905702 5908002 148 5.84647 16.53171    14.88463    NM_010342   -
chr1    9288507 9290352 77  4.04459 9.12442 7.77642 NM_027671   -
chr1    9291742 9292528 142 5.74749 16.21792    14.28185    NM_027671   -
chr1    9535689 9536176 72  4.45286 8.82567 7.29563 NM_021511   +
chr1    9535689 9536176 72  4.45286 8.82567 7.29563 NM_175236   +
chr1    9535689 9536176 72  4.45286 8.82567 7.29563 NR_027664   +

When a line matches "NM_001080771", I print the entire line to a new file. This is done for each input file, and the matches are appended to one output file. I also want to add a column with the filename (from the list above) to the final output, so that I know which file each entry came from.

Desired output:

chr4    21610972    21618492    193 7.28409 21.01724    19.35525    NM_001080771    -   48hrs_CT
chr4    21605096    21618696    76  4.22442 9.32981 7.68131 NM_001080771    -   48hrs_TAMO
chr4    21604864    21618713    12  1.78194 2.36793 1.25883 NM_001080771    -   72hrs_CT
chr4    21610305    21615717    26  2.90579 4.47333 2.65353 NM_001080771    -   72hrs_TAMO
chr4    21609924    21618600    23  2.63778 4.0642  2.33685 NM_001080771    -   5D_CT
chr4    21609936    21618680    30  5.63778 3.0642  8.33685 NM_001080771    -   5D_TAMO

This is not working. Basically, I want to append a column containing the filename, as either the first or the last column. How can I do that?

ivivek_ngs Avatar asked Nov 01 '25 13:11



1 Answer

Or you can do it all in awk:

 awk '/NM_001080771/ {print $0, FILENAME}' *_2.5kb.txt
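To see what `FILENAME` contributes, here is a small self-contained sketch. The scratch directory and the two sample files are made up for illustration (the lines are borrowed from the question); only the awk command itself comes from the answer.

```shell
# Hypothetical demo data: two small tab-separated files in a scratch directory.
tmp=$(mktemp -d)
printf 'chr1\t3663275\t3663483\t14\t2.55788\t2.99631\t1.40767\tNM_001011874\t-\n' \
    > "$tmp/48hrs_CT_merged_peaks_2.5kb.txt"
printf 'chr4\t21610972\t21618492\t193\t7.28409\t21.01724\t19.35525\tNM_001080771\t-\n' \
    >> "$tmp/48hrs_CT_merged_peaks_2.5kb.txt"
printf 'chr4\t21609936\t21618680\t30\t5.63778\t3.0642\t8.33685\tNM_001080771\t-\n' \
    > "$tmp/5D_TAMO_merged_peaks_2.5kb.txt"

# FILENAME is the name of the file awk is currently reading, so each
# matching line is printed followed by the file it came from.
out=$(cd "$tmp" && awk '/NM_001080771/ {print $0, FILENAME}' *_2.5kb.txt)
printf '%s\n' "$out"

rm -rf "$tmp"
```

Only the two lines containing `NM_001080771` are printed, each followed by a space and the full file name. Because awk reads all the files in one pass, no shell loop is needed.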

This trims the filename to the desired format:

awk '/NM_001080771/{f=FILENAME; sub(/_merged_peaks_2\.5kb\.txt$/,"",f);
                    print $0, f}' *_2.5kb.txt
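The desired output in the question is tab-separated and uses short labels such as `48hrs_CT`. The following is a minimal sketch of one way to get that, under two assumptions taken from the samples rather than stated by the asker: the input is tab-delimited with the accession in column 8, and the label is always the first two underscore-separated parts of the file name. The scratch directory, sample files, and the variable names `id`, `part`, and `label` are made up for illustration.

```shell
# Hypothetical demo data in a scratch directory (lines borrowed from the question).
tmp=$(mktemp -d)
printf 'chr4\t21610972\t21618492\t193\t7.28409\t21.01724\t19.35525\tNM_001080771\t-\n' \
    > "$tmp/48hrs_CT_merged_peaks_2.5kb.txt"
printf 'chr4\t21609936\t21618680\t30\t5.63778\t3.0642\t8.33685\tNM_001080771\t-\n' \
    > "$tmp/5D_TAMO_merged_peaks_2.5kb.txt"

# FNR == 1 is true on the first line of each file, so the label is rebuilt
# once per file from the first two "_"-separated parts of FILENAME.
# $8 == id is an exact match on column 8 (assumed to hold the accession).
# OFS is a tab, so the label is appended as a tab-separated column.
result=$(cd "$tmp" && awk -F'\t' -v OFS='\t' -v id='NM_001080771' '
    FNR == 1 { split(FILENAME, part, "_"); label = part[1] "_" part[2] }
    $8 == id { print $0, label }
' *_2.5kb.txt)
printf '%s\n' "$result"

rm -rf "$tmp"
```

When redirecting to a file, it is probably safer to pick a name that the glob does not match, for example a hypothetical `prom_genes.tsv`. The name `prom_genes_2.5kb.txt` from the question itself matches `*_2.5kb.txt`, so once that file exists it is picked up as input as well.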
karakfa Avatar answered Nov 03 '25 08:11
