
How to right pad a field with spaces using AWK

Tags: awk

I have a file that I am attempting to strip customer names from using AWK. The file is a fixed width file, and every column has meaning.

The file consists of many lines, all the same format, very similar to the below:

1234-123   123456 12345678901234CUSTOMER NAME TO REMOVE12345-1234 TRN   123-123   12345678901-1234  TRN 12345678        
1234-123   123456 12345678901234CUSTOMER NAME TO REMOVE12345-1234 TRN   123-123   12345678901-1234  TRN 12345678        
1234-123   123456 12345678901234CUSTOMER NAME TO REMOVE12345-1234 TRN   123-123   12345678901-1234  TRN 12345678        
1234-123   123456 12345678901234CUSTOMER NAME TO REMOVE12345-1234 TRN   123-123   12345678901-1234  TRN 12345678

It is the customer name that I need to swap with an imaginary name so that the desired output is:

1234-123   123456 12345678901234SENTINEL PRIME         12345-1234 TRN   123-123   12345678901-1234  TRN 12345678        
1234-123   123456 12345678901234OPTIMUS PRIME          12345-1234 TRN   123-123   12345678901-1234  TRN 12345678        
1234-123   123456 12345678901234BUMBLEBEE              12345-1234 TRN   123-123   12345678901-1234  TRN 12345678        
1234-123   123456 12345678901234IRONHIDE               12345-1234 TRN   123-123   12345678901-1234  TRN 12345678

I have a list of transformer names that I would like to use for this, stored in a file called transformer.names.

SENTINEL PRIME
OPTIMUS PRIME
BUMBLEBEE
IRONHIDE

However, to keep each line of the original file the same width, I need to right pad the transformer names with spaces as the transformer names I have are all different lengths.

It seems to be possible to right pad these names to a fixed length using AWK, but I have not managed to figure it out, or to find an answer clear enough for me to understand.

Below is my current AWK script.

#!/usr/bin/awk -f
BEGIN {
}
{
  getline line < "transformer.names"
  print substr($0, 0, 30) line substr($0, 62, 120)
}

I run it with this command:

my_program.awk my-file.txt

I think I can include a line something like this in place of the print line above, however I have not managed to get it working yet.

printf "-%32s|", substr($0, 0, 30) line substr($0, 62, 120)

Any tips would be fantastic!

John Deverall asked Jun 22 '18 02:06


2 Answers

You need to apply the %Ns to the specific field you want to pad, not to the whole line, and you need to make the minus (which left-justifies the value, i.e. pads it on the right) part of the specifier. Also, printf does not automatically add a line/record separator as print does, so you need to add that yourself:

 printf "%s%-32s%s\n", substr($0, 1, 30), newname, substr($0, 62, 120)
 # note the commas: this is one format string containing three specifiers,
 # followed by three separate data values, one for each specifier
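
To see what the specifier does in isolation, here is a minimal sketch. pad_right is a hypothetical helper name, not something from the question, and the brackets are only there to make the trailing spaces visible. It also adds a precision (the part after the dot), on the assumption that names longer than the field should be cut off rather than allowed to push the rest of the line to the right:

```shell
# pad_right TEXT WIDTH: hypothetical helper, not part of the original script.
pad_right() {
  awk -v text="$1" -v width="$2" 'BEGIN {
    # Build a format such as "[%-23.23s]\n":
    #   "-"           left-justify, so the padding goes on the right
    #   first number  minimum field width
    #   after the dot maximum number of characters taken from the string
    fmt = "[%-" width "." width "s]\n"
    printf fmt, text
  }'
}

pad_right "BUMBLEBEE" 23
pad_right "A NAME THAT IS MUCH TOO LONG FOR THE FIELD" 23
```

Both calls print a bracketed field that is exactly 23 characters wide. Without the precision, the second name would be printed in full and the record would grow.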

Alternatively you could pad the field and then concatenate:

 print substr($0,1,30) sprintf("%-32s", newname) substr($0,62,120) 
 # no commas except within the sprintf (and the substr's) 

If your data file has more lines than your transformer.names file, then you need to buffer the names and cycle through them repeatedly, as Ravinder shows.
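
If you would rather keep the getline approach from the question than buffer the names in an array, one alternative is to close and reopen the names file whenever it runs out. The following is only a sketch under assumptions: the temporary files, the next_name function and the 32/23/56 offsets are illustrative (the offsets match the sample data as posted, not necessarily the real file).

```shell
tmp=$(mktemp -d)
# Illustrative inputs: two names, three fixed-width records (name in columns 33-55).
printf '%s\n' 'SENTINEL PRIME' 'OPTIMUS PRIME' > "$tmp/names"
for i in 1 2 3; do
  printf '%s\n' '1234-123   123456 12345678901234CUSTOMER NAME TO REMOVE12345-1234 TRN'
done > "$tmp/data"

result=$(awk -v names="$tmp/names" '
# Return the next name, reopening the list once when it runs out.
function next_name(   name) {
  if ((getline name < names) <= 0) {
    close(names)                                  # rewind to the first name
    if ((getline name < names) <= 0) name = ""    # empty or unreadable list
  }
  return name
}
{ printf "%s%-23s%s\n", substr($0, 1, 32), next_name(), substr($0, 56) }
' "$tmp/data")

printf '%s\n' "$result"
rm -r "$tmp"
```

Checking the return value of getline matters here: in the original script, once the names file is exhausted, getline fails silently and the variable keeps its last value, so every remaining record gets the same name.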

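Also, substr positions in awk start at 1. A start of 0 or a negative number is not an error, but what you get back depends on the implementation: some awks simply treat it as 1, while others (gawk, for example) count the nonexistent positions against the length, so substr($0, 0, 30) can return only 29 characters. I think it's clearer to actually say what you mean, so I changed it to 1. 62 is not the correct starting position for the part after the customer name in the example data you posted, but you said that data is only 'very similar' to the real data, so I don't know whether 56 or 62 or something else is correct.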
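
If you are unsure of the offsets, you can let awk report them for a line whose name you know. This is a small sketch using the sample line from the question (shortened on the right); the needle variable name is just for illustration.

```shell
line='1234-123   123456 12345678901234CUSTOMER NAME TO REMOVE12345-1234 TRN   123-123'

positions=$(printf '%s\n' "$line" | awk -v needle="CUSTOMER NAME TO REMOVE" '{
  start = index($0, needle)              # 1-based column where the name begins
  print start, start + length(needle)    # first column of the name, first column after it
}')

echo "$positions"
```

For the sample line this prints 33 56: the name occupies columns 33 to 55 (23 characters), so the prefix is substr($0, 1, 32) and the remainder starts at column 56.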

dave_thompson_085 answered Nov 15 '22 09:11


Could you please try the following and let me know if it helps? It stores all the transformer names in an array; if there are fewer names than lines in Input_file, it starts again from the first name.

awk '
FNR==NR{
  a[FNR]=$0;
  count=FNR;
  next}
{
  val=(val==count)?1:val+1;
  printf "%s%-23s%s\n", substr($0,1,32), a[val], substr($0,56)
}' transformer.names  Input_file

Explanation: the same code with comments.

awk '
FNR==NR{                                          ##FNR==NR is true only while the first file (transformer.names) is being read.
  a[FNR]=$0;                                      ##Store the current name in array a, indexed by its line number.
  count=FNR;                                      ##Remember how many names have been read so far.
  next}                                           ##Skip the remaining statements for lines of the first file.
{                                                 ##This block runs for each line of the second file (Input_file).
  val=(val==count)?1:val+1;                       ##Advance val to the next name, wrapping back to 1 after the last one.
  printf "%s%-23s%s\n", substr($0,1,32), a[val], substr($0,56)
                                                  ##Print columns 1-32, the name right-padded with spaces to 23 columns, then column 56 to the end of the line.
}' transformer.names  Input_file                  ##The names file must come first, then the data file.
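
Since the point of the padding is to keep every record the same width, it can be worth checking the output before using it. A minimal sketch, where widths is a hypothetical helper name: it tallies line lengths, so a file that kept its fixed width yields a single row.

```shell
# widths [FILE...]: hypothetical helper that prints each distinct line length
# followed by the number of lines that have it. Reads stdin if no file is given.
widths() {
  awk '{ count[length($0)]++ } END { for (w in count) print w, count[w] }' "$@"
}

# Two 10-character lines give one row: "10 2"
printf '%s\n' 'AAAA  BBBB' 'CC    DDDD' | widths
```

Run it on the original file and on the rewritten one, for example `widths Input_file` and `widths new_file` (new_file being wherever you redirected the output). More than one row, or a different width from the original, means some name was not padded or truncated to the field size. Note that trailing spaces count towards the length.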
RavinderSingh13 answered Nov 15 '22 10:11