I am trying to sort lines between patterns in Bash or in Python. I would like to sort the lines based on the second field with "," as delimiter.
Given the following text input file:
Sample1
T1,64,0.65 MEDIUM
T2,60,0.45 LOW
T3,301,0.68 MEDIUM
T4,65,0.75 HIGH
T5,59,0.72 MEDIUM
T6,51,0.82 HIGH
Sample2
T1,153,0.77 HIGH
T2,152,0.61 MEDIUM
T3,154,0.67 MEDIUM
T4,283,0.66 MEDIUM
T5,161,0.65 MEDIUM
Sample3
T1,147,0.71 MEDIUM
T2,154,0.63 MEDIUM
T3,45,0.63 MEDIUM
T4,259,0.77 HIGH
I expect as output:
Sample1
T6,51,0.82 HIGH
T5,59,0.72 MEDIUM
T2,60,0.45 LOW
T1,64,0.65 MEDIUM
T4,65,0.75 HIGH
T3,301,0.68 MEDIUM
Sample2
T2,152,0.61 MEDIUM
T1,153,0.77 HIGH
T3,154,0.67 MEDIUM
T5,161,0.65 MEDIUM
T4,283,0.66 MEDIUM
Sample3
T3,45,0.63 MEDIUM
T1,147,0.71 MEDIUM
T2,154,0.63 MEDIUM
T4,259,0.77 HIGH
I have tried to adapt this suggestion by glenn jackman, found in another post, but as far as I have tested it only works for two patterns:
> gawk -v cmd="sort -k2" p=1 '
> /^PATTERN2/ { # when we see the 2nd marker:
> close("cmd", "to");
> while (("cmd" |& getline line) >0) print line
> p=1
> }
> p {print} # if p is true, print the line
> !p {print |& "cmd"} # if p is false, send the line to `sort`
> /^PATTERN1/ {p=0} # when we see the first marker, turn off printing ' FILE
You can do this with GNU awk in the following way:
$ awk 'BEGIN{PROCINFO["sorted_in"]="@val_num_asc"; FS=","}
/PATTERN/{
for(i in a) print i
delete a
print; next
}
{ a[$0]=$2 }
END{ for(i in a) print i }' file
With PROCINFO["sorted_in"]="@val_num_asc", we tell GNU awk to traverse arrays so that the element values appear in ascending numerical order. The idea is to build an array whose key is the full line and whose value is the second field. Here PATTERN stands for a regex that matches the header lines, e.g. ^Sample for the input above. We don't use the second field as the key because there might be duplicates. That can still be made to work, however, in the following way:
$ awk 'BEGIN{PROCINFO["sorted_in"]="@ind_num_asc"; FS=","}  # the key is field 2 here, so order by index, not by value
/PATTERN/{
for(i in a) print a[i]
delete a
print; next
}
($2 in a){ a[$2]=a[$2] ORS $0; next }
{ a[$2] = $0 }
END{ for(i in a) print a[i] }' file
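For comparison, the same buffering idea can be sketched in Python: collect rows under their numeric second field, and flush them in ascending key order whenever a header line appears. This is only an illustration; the function name `sort_blocks_by_field` and the `header_prefix` parameter are made up for this example, and it assumes headers start with "Sample" as in the question.

```python
from collections import defaultdict

def sort_blocks_by_field(lines, header_prefix="Sample"):
    """Yield lines with each block's rows ordered by the numeric second field."""
    # Hypothetical helper: maps second field (as int) -> rows sharing that value,
    # so rows with duplicate keys are kept rather than overwritten.
    rows = defaultdict(list)

    def flush():
        # Visit the keys in ascending numeric order and emit their rows.
        for key in sorted(rows):
            yield from rows[key]
        rows.clear()

    for line in lines:
        if line.startswith(header_prefix):
            yield from flush()  # emit the previous block, sorted
            yield line          # then the header itself
        else:
            rows[int(line.split(",")[1])].append(line)
    yield from flush()          # emit the final block
```

Rows that share the same second field come out in their original relative order, because each key holds a list that is appended to in input order.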
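Please see the function below.
def sort_lines_by_second_field(source_filename: str, destination_filename: str):
    with open(source_filename) as source:
        lines = source.read().splitlines()
    # Indices of the header lines (those without a comma), plus an end marker.
    headers = [i for i, line in enumerate(lines) if "," not in line] + [len(lines)]
    for start, end in zip(headers, headers[1:]):
        # Sort only the rows between two consecutive headers.
        lines[start + 1:end] = sorted(lines[start + 1:end], key=lambda row: int(row.split(",")[1]))
    with open(destination_filename, "w") as destination:
        destination.writelines(line + "\n" for line in lines)
It reads all lines, finds the header lines (those without a comma), sorts the rows between consecutive headers by the second field cast to an integer, and then saves the result to the target file. Sorting the whole file in one go would not work here: the header lines have no second field, and the blocks would be mixed together.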
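If the data is already in memory, a variant built on `itertools.groupby` may be handy. The sketch below is not part of the answer above; the name `sort_between_headers` is invented for illustration, and it assumes that header lines never contain a comma while data rows always do.

```python
from itertools import groupby

def sort_between_headers(text: str) -> str:
    """Sort comma-separated rows numerically by field 2 within each block."""
    out = []
    # groupby splits the lines into alternating runs of headers and data rows
    # (assumption: a line is a data row exactly when it contains a comma).
    for is_data, run in groupby(text.splitlines(), key=lambda line: "," in line):
        run = list(run)
        if is_data:
            run.sort(key=lambda row: int(row.split(",")[1]))
        out.extend(run)
    return "\n".join(out) + "\n"
```

Applied to the Sample3 block from the question, this should put T3 (45) first and T4 (259) last, matching the expected output.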