 

Transpose columns and rows using gawk

I am trying to transpose a really long file and I am concerned that it will not be transposed entirely.

My data looks something like this:

Thisisalongstring12345678   1   AB  abc 937 4.320194
Thisisalongstring12345678   1   AB  efg 549 0.767828
Thisisalongstring12345678   1   AB  hi  346 -4.903441
Thisisalongstring12345678   1   AB  jk  193 7.317946

I want my data to look like this:

Thisisalongstring12345678 Thisisalongstring12345678 Thisisalongstring12345678 Thisisalongstring12345678
1                         1                         1                         1
AB                        AB                        AB                        AB
abc                       efg                       hi                        jk
937                       549                       346                       193
4.320194                  0.767828                  -4.903441                 7.317946

Would the length of the first string be an issue? My real file is much longer than this, approximately 2000 lines. Also, is it possible to rename the first string to Thisis234 and then transpose?

user1269741 asked Apr 04 '12 00:04




3 Answers

I don't see why it wouldn't be, unless you run out of memory: the script below keeps every field in an array until the END block. Try it and see if you run into problems.

Input:

$ cat inf.txt 
a b c d
1 2 3 4
. , + -
A B C D

Awk program:

$ cat mkt.sh
awk '
{
  for(c = 1; c <= NF; c++) {
    a[c, NR] = $c
  }
  if(max_nf < NF) {
    max_nf = NF
  }
}
END {
  for(r = 1; r <= NR; r++) {
    for(c = 1; c <= max_nf; c++) {
      printf("%s ", a[r, c])
    }
    print ""
  }
}
' inf.txt

Run:

$ ./mkt.sh 
a 1 . A 
b 2 , B 
c 3 + C 
d 4 - D 

Credits:

  • http://www.chemie.fu-berlin.de/chemnet/use/info/gawk/gawk_12.html#SEC121

Hope this helps.

icyrock.com answered Oct 14 '22 16:10


This can be done with the BSD rs command:

http://www.unix.com/man-page/freebsd/1/rs/

Check out the -T option.

Kaz answered Oct 14 '22 14:10


I tried icyrock.com's answer, but found that I had to change:

for(r = 1; r <= NR; r++) {
  for(c = 1; c <= max_nf; c++) {

to

for(r = 1; r <= max_nf; r++) {
  for(c = 1; c <= NR; c++) {

to get the NR columns and max_nf rows. So icyrock's code becomes:

$ cat mkt.sh
awk '
{
  for(c = 1; c <= NF; c++) {
    a[c, NR] = $c
  }
  if(max_nf < NF) {
    max_nf = NF
  }
}
END {
  for(r = 1; r <= max_nf; r++) {
    for(c = 1; c <= NR; c++) {
      printf("%s ", a[r, c])
    }
    print ""
  }
}
' inf.txt

If you don't do that and use an asymmetrical input, like:

a b c d
1 2 3 4
. , + -

You get:

a 1 .
b 2 ,
c 3 +

i.e. still 3 rows and 4 columns (the last of which is blank), so the "d 4 -" row is lost.
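
For the renaming part of the question, here is a minimal sketch of how the corrected loop could be wrapped in a shell function that optionally overwrites the first field before transposing. The function name transpose and the awk variable newname are made up for this example, and it assumes whitespace-separated input on stdin:

```shell
# Sketch only: "transpose" and "newname" are hypothetical names for this example.
# Reads whitespace-separated records on stdin and prints the transpose.
# If an argument is given, field 1 of every record is replaced by it first.
transpose() {
  awk -v newname="$1" '
  {
    if (newname != "") $1 = newname        # rename the first field before storing
    for (c = 1; c <= NF; c++) cell[c, NR] = $c
    if (NF > max_nf) max_nf = NF           # track the widest record
  }
  END {
    for (r = 1; r <= max_nf; r++) {        # one output row per input field
      line = ""
      for (c = 1; c <= NR; c++)            # one output column per input record
        line = line (c > 1 ? " " : "") cell[r, c]
      print line
    }
  }'
}

# Example: rename the long first string, then transpose.
printf '%s\n' \
  'Thisisalongstring12345678 1 AB abc 937 4.320194' \
  'Thisisalongstring12345678 1 AB efg 549 0.767828' | transpose Thisis234
```

Called without an argument, it only transposes. The length of the first string should not matter here, since awk just stores each field as a string. If you want the output lined up in padded columns as in the question, piping the result through column -t (where that utility is installed) is one option.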

ScubaFish answered Oct 14 '22 14:10