Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

AWK: Derive fields (create new columns) based on existing columns

Tags:

awk

I have a | delimited file where each name in $1 is assigned to a group in $6. The file is sorted based on $5(ascending order).

name_1|2018-09-28|1801-01-01|22|2018-11-19|group2117
name_1|2018-11-28|2018-11-28|81|2018-11-28|group1179
name_1|2018-09-28|2018-12-18|22|2018-12-14|group2117
name_4|2019-10-09|1801-01-01|22|2019-10-14|group3090
name_4|2019-10-09|2019-10-18|22|2019-10-15|group3090
name_4|2019-10-20|1801-01-01|21|2019-10-20|group3147
name_4|2019-10-20|2019-10-22|21|2019-10-21|group3147
name_11|2020-05-05|1801-01-01|21|2020-05-08|group4457
name_11|2020-05-05|2020-05-18|21|2020-05-18|group4457

I am trying to add some extra columns to this file based on the existing columns.

For the first appearance of each group in $6, I would like to take the corresponding values from $2 and $4 and add it to $7 and $8, respectively. And for the last appearance of each group in $6, take the corresponding values from $3 and add it to $9. so the output looks like this

name_1|2018-09-28|1801-01-01|22|2018-11-19|group2117|2018-09-28|22|2018-12-18
name_1|2018-11-28|2018-11-28|81|2018-11-28|group1179|2018-11-28|81|2018-11-28
name_1|2018-09-28|2018-12-18|22|2018-12-14|group2117|
name_4|2019-10-09|1801-01-01|22|2019-10-14|group3090|2019-10-09|22|2019-10-18
name_4|2019-10-09|2019-10-18|22|2019-10-15|group3090|
name_4|2019-10-20|1801-01-01|21|2019-10-20|group3147|2019-10-20|21|2019-10-22
name_4|2019-10-20|2019-10-22|21|2019-10-21|group3147|
name_11|2020-05-05|1801-01-01|21|2020-05-08|group4457|2020-05-05|21|2020-05-18
name_11|2020-05-05|2020-05-18|21|2020-05-18|group4457|

For just one group in $6 I thought I can apply the below code but I am not sure how to adopt it for the outcome that I want

 awk -F"|" 'NR==1 {if($6==group1179); print $0,$7=$2,$8=$4,$9=$3}' OFS="|" file

output

name_1|2018-11-28|2018-11-28|81|2018-11-28|group1179|2018-11-28|81|2018-11-28

Please accept my apologies for the complicated sample input. Any lead on this will be highly appreciated.

like image 707
DSTO Avatar asked Dec 04 '22 17:12

DSTO


2 Answers

$ cat tst.sh
#!/usr/bin/env bash

sort -t'|' -k6,6 -k5,5 "${@:--}" |
awk '
    BEGIN { FS=OFS="|" }
    $6 != prev {
        if ( NR > 1 ) {
            prt()
        }
        prev = $6
    }
    { lines[++numLines] = $0 }
    END { prt() }

    function prt(       first,last,i) {
        split(lines[1],first)
        split(lines[numLines],last)

        print lines[1], first[2], first[4], last[3]

        for (i=2; i<=numLines; i++) {
            print lines[i]
        }

        numLines = 0
    }
' |
sort -t'|' -k5,5

$ ./tst.sh file
name_1|2018-09-28|1801-01-01|22|2018-11-19|group2117|2018-09-28|22|2018-12-18
name_1|2018-11-28|2018-11-28|81|2018-11-28|group1179|2018-11-28|81|2018-11-28
name_1|2018-09-28|2018-12-18|22|2018-12-14|group2117
name_4|2019-10-09|1801-01-01|22|2019-10-14|group3090|2019-10-09|22|2019-10-18
name_4|2019-10-09|2019-10-18|22|2019-10-15|group3090
name_4|2019-10-20|1801-01-01|21|2019-10-20|group3147|2019-10-20|21|2019-10-22
name_4|2019-10-20|2019-10-22|21|2019-10-21|group3147
name_11|2020-05-05|1801-01-01|21|2020-05-08|group4457|2020-05-05|21|2020-05-18
name_11|2020-05-05|2020-05-18|21|2020-05-18|group4457
like image 117
Ed Morton Avatar answered May 10 '23 22:05

Ed Morton


Another option is a two-pass approach where you capture the last instance of each group on the first pass and write the data out on the second, e.g.

awk -F"|" -v OFS="|" '
    NR==FNR { last[$6] = $3; next }
    $6 in seen { print; next }
    { print $0, $2, $4, last[$6]; seen[$6]++ }
' file file

Example Use/Output

name_1|2018-09-28|1801-01-01|22|2018-11-19|group2117|2018-09-28|22|2018-12-18
name_1|2018-11-28|2018-11-28|81|2018-11-28|group1179|2018-11-28|81|2018-11-28
name_1|2018-09-28|2018-12-18|22|2018-12-14|group2117
name_4|2019-10-09|1801-01-01|22|2019-10-14|group3090|2019-10-09|22|2019-10-18
name_4|2019-10-09|2019-10-18|22|2019-10-15|group3090
name_4|2019-10-20|1801-01-01|21|2019-10-20|group3147|2019-10-20|21|2019-10-22
name_4|2019-10-20|2019-10-22|21|2019-10-21|group3147
name_11|2020-05-05|1801-01-01|21|2020-05-08|group4457|2020-05-05|21|2020-05-18
name_11|2020-05-05|2020-05-18|21|2020-05-18|group4457
like image 44
David C. Rankin Avatar answered May 10 '23 23:05

David C. Rankin