
AWK: Maintain field spacing like input file

Tags: bash, awk, perl

I have reproduced my issue with the test file below:

# cat out 
2014-01-10 18:23:25          0 Andy/ADPTER/
2014-01-10 18:23:36        503 Sandy/ADPTER/ACCOUNTTYPE MAP.csv
2014-01-10 18:23:38        516 John/ADPTER/CITY MAP.csv
2014-01-10 18:23:38        398 Wendy/ADPTER/COUNTRY MAP.csv
2014-01-10 18:23:38      11117 Andy/ADPTER/CURRENCY MAP.csv
2014-01-10 18:23:38        260 Sandy/ADPTER/GENDER MAP.csv
2014-01-10 18:23:39        466 John/ADPTER/STATE MAP.csv
2014-01-10 18:23:40        373 Jim/ADPTER/UNITS MAP.csv

This is my Bash variable:

# echo $bucket
bucket_name

In the file above, I want the value of the Bash variable to be prefixed to the 4th field.

This is my desired output:

2014-01-10 18:23:25          0 bucket_name/Andy/ADPTER/
2014-01-10 18:23:36        503 bucket_name/Sandy/ADPTER/ACCOUNTTYPE MAP.csv
2014-01-10 18:23:38        516 bucket_name/John/ADPTER/CITY MAP.csv
2014-01-10 18:23:38        398 bucket_name/Wendy/ADPTER/COUNTRY MAP.csv
2014-01-10 18:23:38      11117 bucket_name/Andy/ADPTER/CURRENCY MAP.csv
2014-01-10 18:23:38        260 bucket_name/Sandy/ADPTER/GENDER MAP.csv
2014-01-10 18:23:39        466 bucket_name/John/ADPTER/STATE MAP.csv
2014-01-10 18:23:40        373 bucket_name/Jim/ADPTER/UNITS MAP.csv

This is what I have tried:

# awk -v var=$bucket '{$4=var"/"$4; print}' out 
2014-01-10 18:23:25 0 bucket_name/Andy/ADPTER/
2014-01-10 18:23:36 503 bucket_name/Sandy/ADPTER/ACCOUNTTYPE MAP.csv
2014-01-10 18:23:38 516 bucket_name/John/ADPTER/CITY MAP.csv
2014-01-10 18:23:38 398 bucket_name/Wendy/ADPTER/COUNTRY MAP.csv
2014-01-10 18:23:38 11117 bucket_name/Andy/ADPTER/CURRENCY MAP.csv
2014-01-10 18:23:38 260 bucket_name/Sandy/ADPTER/GENDER MAP.csv
2014-01-10 18:23:39 466 bucket_name/John/ADPTER/STATE MAP.csv
2014-01-10 18:23:40 373 bucket_name/Jim/ADPTER/UNITS MAP.csv

Question:

My awk command does what I need; however, it changes the output field spacing (the separator?). My intention is just to prefix bucket_name/ to the 4th field and keep whatever spacing the input file has, including right- or left-justified fields.

This is my other attempt:

# awk -v var=$bucket 'BEGIN{OFS="\t"}{$4=var"/"$4; print}' out 
2014-01-10  18:23:25    0   bucket_name/Andy/ADPTER/
2014-01-10  18:23:36    503 bucket_name/Sandy/ADPTER/ACCOUNTTYPE    MAP.csv
2014-01-10  18:23:38    516 bucket_name/John/ADPTER/CITY    MAP.csv
2014-01-10  18:23:38    398 bucket_name/Wendy/ADPTER/COUNTRY    MAP.csv
2014-01-10  18:23:38    11117   bucket_name/Andy/ADPTER/CURRENCY    MAP.csv
2014-01-10  18:23:38    260 bucket_name/Sandy/ADPTER/GENDER MAP.csv
2014-01-10  18:23:39    466 bucket_name/John/ADPTER/STATE   MAP.csv
2014-01-10  18:23:40    373 bucket_name/Jim/ADPTER/UNITS    MAP.csv

But it's not helping either.

Thanks.

slayedbylucifer asked Feb 21 '26 06:02


2 Answers

You tagged the question with Perl, so here is a Perl solution:

perl -pe'BEGIN{$var=shift}s,(?:.*?\s+){3}\K,$var/,' "$bucket" out

Technically this is the same approach as a sed substitution, with the benefit that it avoids escaping problems: the value is passed as a command-line argument and interpolated as a plain string, so the shell variable $bucket can contain anything.

Hynek -Pichi- Vychodil answered Feb 22 '26 22:02


You can use this awk:

bucket="bucket_name"
awk --re-interval -v b="$bucket" '{sub(/([^[:blank:]]+[[:blank:]]+){3}/, 
     "&" b "/")} 1' file
2014-01-10 18:23:25          0 bucket_name/Andy/ADPTER/
2014-01-10 18:23:36        503 bucket_name/Sandy/ADPTER/ACCOUNTTYPE MAP.csv
2014-01-10 18:23:38        516 bucket_name/John/ADPTER/CITY MAP.csv
2014-01-10 18:23:38        398 bucket_name/Wendy/ADPTER/COUNTRY MAP.csv
2014-01-10 18:23:38      11117 bucket_name/Andy/ADPTER/CURRENCY MAP.csv
2014-01-10 18:23:38        260 bucket_name/Sandy/ADPTER/GENDER MAP.csv
2014-01-10 18:23:39        466 bucket_name/John/ADPTER/STATE MAP.csv
2014-01-10 18:23:40        373 bucket_name/Jim/ADPTER/UNITS MAP.csv


-v b="$bucket"                 # pass the shell value to awk as variable b
--re-interval                  # gawk option: enable interval expressions such as {3}
                               # in regex matching (on by default since gawk 4.0)
sub                            # match the input against the regex and substitute
                               # the given string
([^[:blank:]]+[[:blank:]]+){3} # match the first 3 fields, each followed by its run of spaces/tabs
"&" b "/"                      # replace with the matched text + variable b + /
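For context, the attempts in the question lose the spacing because assigning to a field makes awk rebuild $0 from its fields joined by OFS, whereas sub() on the whole record only touches the matched part. A minimal sketch of the difference on one sample line follows. The regex spells out the three fields instead of using {3}; that is my assumption to keep it runnable on awks without interval support, not part of the command above.

```shell
# One line copied from the question's sample input (right-justified size column).
line='2014-01-10 18:23:38      11117 Andy/ADPTER/CURRENCY MAP.csv'

# Assigning to $4 makes awk rebuild $0 from its fields joined by OFS,
# so every run of blanks collapses to a single space.
rebuilt=$(printf '%s\n' "$line" |
  awk -v b="bucket_name" '{ $4 = b "/" $4; print }')

# Editing $0 with sub() leaves every other character untouched.
kept=$(printf '%s\n' "$line" | awk -v b="bucket_name" '
  { sub(/^[^ \t]+[ \t]+[^ \t]+[ \t]+[^ \t]+[ \t]+/, "&" b "/") } 1')

printf '%s\n%s\n' "$rebuilt" "$kept"
```

The first printed line has single spaces throughout; the second keeps the original column alignment.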

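EDIT (thanks to @EdMorton): the sub() version misbehaves when the value contains &, because & in a replacement string stands for the matched text (try both solutions with bucket="&"). To insert the value literally, use: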

awk --re-interval -v b="$bucket" 'match($0, /([^[:blank:]]+[[:blank:]]+){3}/) {
    $0 = substr($0, 1, RLENGTH) b "/" substr($0, RLENGTH+1) } 1' file
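A small sketch of the difference, under the assumption that the value is the single character &. The regex again spells out the three fields so that no interval support is needed, and the split point uses RSTART + RLENGTH so that it would also hold if a line began with blanks; both choices are mine, not part of the commands above.

```shell
# One line copied from the question's sample input.
line='2014-01-10 18:23:40        373 Jim/ADPTER/UNITS MAP.csv'
bucket='&'   # hypothetical hostile value: "&" is special in sub() replacements

# sub(): the "&" inside b is expanded to the matched text a second time.
with_sub=$(printf '%s\n' "$line" | awk -v b="$bucket" '
  { sub(/[^ \t]+[ \t]+[^ \t]+[ \t]+[^ \t]+[ \t]+/, "&" b "/") } 1')

# match() + substr(): b is only concatenated, never parsed as a replacement.
with_match=$(printf '%s\n' "$line" | awk -v b="$bucket" '
  match($0, /[^ \t]+[ \t]+[^ \t]+[ \t]+[^ \t]+[ \t]+/) {
    n = RSTART + RLENGTH            # first character after the third field gap
    $0 = substr($0, 1, n - 1) b "/" substr($0, n)
  } 1')

printf '%s\n%s\n' "$with_sub" "$with_match"
```

The sub() result repeats the first three fields instead of inserting &, while the match() result contains a literal &/ before the 4th field. Note that -v still processes backslash escape sequences in the value, so a value containing backslashes may need to be passed another way, for example through the environment and read via ENVIRON.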
anubhava answered Feb 22 '26 22:02