
Reformatting text file using awk and cut as a one liner

Tags: text, bash, awk, cut

Data:

CHR SNP BP A1 TEST NMISS BETA SE L95 U95 STAT P 
1   chr1:1243:A:T 1243 T ADD 16283 -6.124 0.543 -1.431 0.3534 -1.123 0.14

Desired output:

MarkerName P-Value 
  chr1:1243  0.14

The actual file is 1.2G worth of lines like the above

I need to truncate the 2nd column after its 2nd colon, pair the result with the final (12th) column, and give both columns new headers.

I have tried:

awk '{print $2, $12}' | cut -d: -f1-2

but cut then discards everything after the 2nd colon, including the "P" column, which I want to keep.

I wrote this output to a new file and then pasted it onto the P-value column using awk, but is there a one-liner method of doing this?

Many thanks

tacrolimus asked Nov 13 '20 12:11


4 Answers

My comment in more understandable form:

$ awk '
BEGIN {
    print "MarkerName P-Value"          # output header
}
NR>1 {                                  # skip the funky first record
    split($2,a,/:/)                     # split the 2nd column on :
    printf "%s:%s %s\n",a[1],a[2],$12   # printf allows easier output formatting
}' file

Output:

MarkerName P-Value
chr1:1243 0.14
James Brown answered Nov 03 '22 17:11


EDIT: Adding one more solution, since the OP mentioned that my first solution did not work for them, although it works fine for me. Here is an alternative:

awk '
BEGIN{
  print "MarkerName P-Value"
}
FNR>1{
  match($2,/([^:]*:){2}/)
  print OFS substr($2,RSTART,RLENGTH-1),$NF
}
' Input_file
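
To see what match() leaves in RSTART and RLENGTH, here is a minimal sketch run on the SNP value from the question. It is an illustration rather than part of the answer: the pattern is written out as [^:]*:[^:]*: instead of ([^:]*:){2}, on the assumption that some awk builds lack interval support, and the variable names snp and match_info are made up for the example.

```shell
# SNP value taken from the sample row in the question.
snp='chr1:1243:A:T'

# Print RSTART, RLENGTH, and the marker with the trailing colon trimmed off.
match_info=$(printf '%s\n' "$snp" | awk '{
    if (match($0, /[^:]*:[^:]*:/))      # matches the same span as ([^:]*:){2}
        print RSTART, RLENGTH, substr($0, RSTART, RLENGTH - 1)
}')
printf '%s\n' "$match_info"
```

For the sample value this prints 1 10 chr1:1243: the match starts at character 1, spans the 10 characters "chr1:1243:", and RLENGTH-1 drops the final colon.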


With the samples shown, please try the following. You do not need to combine cut with awk; awk can handle everything by itself.

awk -F' +|:' '
BEGIN{
  print "MarkerName P-Value"
}
FNR>1{
  print OFS $2":"$3,$NF
}
' Input_file

Explanation: A detailed, line-by-line explanation of the above.

awk -F' +|:' '                 ##Start the awk program and set the field separator to one or more spaces, or a colon, for all lines.
BEGIN{                         ##Start the BEGIN section of the program.
  print "MarkerName P-Value"   ##Print the headers.
}
FNR>1{                         ##If the line number is greater than 1, do the following.
  print OFS $2":"$3,$NF        ##Print a space (OFS), the 2nd field, a colon, the 3rd field, and the last field, as per the OP's request.
}
' Input_file                   ##Name of the Input_file.
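
The reason $2 and $3 hold the two marker pieces is easier to see by printing the fields. The following is a small sketch that applies the same separator to the sample row from the question; the variable names row and fields are only for illustration.

```shell
# Sample data row from the question.
row='1   chr1:1243:A:T 1243 T ADD 16283 -6.124 0.543 -1.431 0.3534 -1.123 0.14'

# Show the first four fields and the last one under the " +|:" separator.
fields=$(printf '%s\n' "$row" | awk -F' +|:' '{
    for (i = 1; i <= 4; i++) printf "$%d=%s\n", i, $i
    printf "$NF=%s\n", $NF
}')
printf '%s\n' "$fields"
```

Because the colon is also a separator, the SNP column is spread over $2 to $5, so $2":"$3 rebuilds chr1:1243 and $NF is still the P value.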
RavinderSingh13 answered Nov 03 '22 19:11


$ awk -F'[: ]+' '{print (NR==1 ? "MarkerName P-Value" : $2":"$3" "$NF)}' file
MarkerName P-Value
chr1:1243 0.14
Ed Morton answered Nov 03 '22 17:11


Sed alternative:

sed -En '1{s/^.*$/MarkerName\tP-Value/p};s/([[:digit:]]+[[:space:]]+)([[:alnum:]]+:[[:digit:]]+)(.*[[:space:]])([[:digit:]]+\.[[:digit:]]+$)/\2\t\4/p' file

On the first line, replace the whole line with the new headers. On the remaining lines, split the line into 4 sections using regular-expression groups, then print the 2nd section followed by a tab and then the 4th section. The 3rd section ends in whitespace so that the greedy .* cannot absorb the leading digits of the P value.
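
The capture-group idea can be tried on the sample row from the question. This is a sketch, not the command above: it assumes the P value is simply the last whitespace-delimited token on the line, it handles a single row with no header, and it writes a plain space in the replacement because \t there is not understood by every sed implementation. The names row and marker_and_p are illustrative.

```shell
# Sample data row from the question.
row='1   chr1:1243:A:T 1243 T ADD 16283 -6.124 0.543 -1.431 0.3534 -1.123 0.14'

# Group 1: column 2 up to its 2nd colon. Group 2: the last token on the line.
marker_and_p=$(printf '%s\n' "$row" |
  sed -E 's/^[^[:space:]]+[[:space:]]+([^:]+:[^:]+):.*[[:space:]]([^[:space:]]+)[[:space:]]*$/\1 \2/')
printf '%s\n' "$marker_and_p"
```

Keying on whitespace rather than digit classes means a P value without a decimal point, or one in exponent form, is carried through unchanged.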

Raman Sailopal answered Nov 03 '22 17:11