 

Sort text columns by number of lines in bash

Assume a text file that contains x columns of text strings.

$ cat file    # where x=3
foo  foo  foo
bar  bar  bar
     baz  baz
     qux

Is there a way in bash to sort these columns by the number of text strings (i.e., filled rows) they contain, while maintaining the internal order of rows in each column?

$ sought_command file
foo  foo  foo
bar  bar  bar
baz  baz
qux

Essentially, the column with the most rows should come first, the column with the second-most rows second, and so on.

(This task would be easy to implement via R, but I am wondering about a solution via bash.)

EDIT 1:

Here are some additional details: every column contains at least one text string (i.e., one filled row). The text strings may be any alphanumeric combination of any length (but obviously contain no spaces). The output columns must not have blank rows inserted. There is no a priori limitation on the column delimiter, as long as it is consistent across the table.

All that is needed for this task is to shift the columns around as-is such that they are sorted by column length. (I know that implementing this in bash sounds easier than it actually is.)
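
For concreteness, here is a rough sketch (assuming tab-separated columns, as in the answer below) that just tallies the filled rows per column; for the example file it confirms the counts 2, 4, and 3, which determine the intended output order:

$ awk -F'\t' '{ for (i=1; i<=NF; i++) if ($i ~ /[^[:space:]]/) cnt[i]++ }
              END { for (i=1; i in cnt; i++) print "column " i ": " cnt[i] " filled rows" }' file
column 1: 2 filled rows
column 2: 4 filled rows
column 3: 3 filled rows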

Asked Nov 28 '16 by Michael G

1 Answer

With GNU awk for sorted_in and assuming your columns are tab-separated:

$ cat tst.awk
BEGIN{ FS=OFS="\t" }
{
    # store every non-blank cell keyed by (row, column) and count filled rows per column
    for (i=1; i<=NF; i++) {
        if ($i ~ /[^[:space:]]/) {
            cell[NR,i] = $i
            cnt[i]++
        }
    }
    next
}
END {
    # traverse cnt[] by value, numerically descending: columns with the most filled rows first
    PROCINFO["sorted_in"] = "@val_num_desc"
    for (row=1; row<=NR; row++) {
        c=0
        for (col in cnt) {
            printf "%s%s", (c++?OFS:""), cell[row,col]
        }
        print ""
    }
}

$ awk -f tst.awk file
foo     foo     foo
bar     bar     bar
baz     baz
qux
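
As a side note, the reordering hinges on gawk's sorted_in traversal control. The following standalone snippet is a minimal sketch (gawk 4.0+ assumed, with the counts hard-coded from the example file) showing how @val_num_desc visits the columns largest-count first:

$ gawk 'BEGIN {
    cnt[1]=2; cnt[2]=4; cnt[3]=3               # filled-row counts per column in the example
    PROCINFO["sorted_in"] = "@val_num_desc"    # traverse by value, numerically descending
    for (col in cnt) printf "%s ", col
    print ""
}'
2 3 1

If the file uses a delimiter other than a tab (the question only requires it to be consistent), the same script should work after adjusting the BEGIN line, e.g. BEGIN{ FS=OFS="," } for comma-separated input.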
Answered Oct 13 '22 by Ed Morton