Assume a text file that contains x columns of text strings.
$ cat file   # where x=3
foo foo foo
bar bar bar
baz baz
qux
Is there a way in bash to sort these columns by the number of text strings (i.e., filled rows) they contain, while maintaining the internal order of rows in each column?
$ sought_command file
foo foo foo
bar bar bar
baz baz
qux
Essentially, the column with the most rows is to be first, the column with the second-most rows is to be second, etc.
(This task would be easy to implement in R, but I am wondering about a solution in bash.)
EDIT 1:
Here are some additional details: Every column contains at least one text string (i.e., one filled row). The text strings may be any alphanumeric combination of any length (but obviously do not contain spaces). The output columns must not have blank rows inserted. There is no a priori restriction on the column delimiter, as long as it is consistent across the table.
All that is needed for this task is to shift the columns around as-is such that they are sorted by column length. (I know that implementing this in bash sounds easier than it actually is.)
With GNU awk for sorted_in and assuming your columns are tab-separated:
$ cat tst.awk
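BEGIN { FS = OFS = "\t" }
{
    # Record every non-blank cell and count the filled rows per column.
    for (i=1; i<=NF; i++) {
        if ($i ~ /[^[:space:]]/) {
            cell[NR,i] = $i
            cnt[i]++
        }
    }
    next
}
END {
    # Make "for (col in cnt)" visit columns by descending count (GNU awk only).
    PROCINFO["sorted_in"] = "@val_num_desc"
    for (row=1; row<=NR; row++) {
        c=0
        for (col in cnt) {
            printf "%s%s", (c++?OFS:""), cell[row,col]
        }
        print ""
    }
}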
$ awk -f tst.awk file
foo foo foo
bar bar bar
baz baz
qux
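If GNU awk is not available, the same idea can probably be expressed in any POSIX awk by sorting the column indices by hand instead of relying on sorted_in. The following is only a sketch under a few assumptions: the input is tab-separated, each column is filled from the top down, and the names sample.tsv and sort_cols are hypothetical and used purely for illustration. It swaps sorted_in for a small insertion sort, so columns with equal counts keep their original left-to-right order.

```shell
# Hypothetical sample: three tab-separated columns with 1, 4 and 2 filled rows.
printf 'a1\tb1\tc1\n\tb2\tc2\n\tb3\t\n\tb4\t\n' > sample.tsv

# sort_cols FILE: print FILE with its columns ordered by descending fill count.
sort_cols() {
    awk '
    BEGIN { FS = OFS = "\t" }
    {
        if (NF > ncols) ncols = NF
        for (i = 1; i <= NF; i++)
            if ($i ~ /[^ ]/) {          # cell contains a non-space character
                cell[NR, i] = $i
                cnt[i]++
            }
    }
    END {
        # order[] holds column indices; insertion sort by descending count.
        for (i = 1; i <= ncols; i++) order[i] = i
        for (i = 2; i <= ncols; i++) {
            k = order[i]
            for (j = i - 1; j >= 1 && cnt[order[j]] < cnt[k]; j--)
                order[j + 1] = order[j]
            order[j + 1] = k
        }
        # Print each row with the columns in their new order.
        for (row = 1; row <= NR; row++) {
            for (c = 1; c <= ncols; c++)
                printf "%s%s", (c > 1 ? OFS : ""), cell[row, order[c]]
            print ""
        }
    }' "$1"
}

sort_cols sample.tsv
```

With this sample, the four-row column (b1..b4) should come out first, the two-row column (c1, c2) second, and the single-row column (a1) last. As in the GNU awk version, empty cells are still emitted as empty fields, so shorter rows end in trailing tabs.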