
Replace a column in file with a column from a different file while retaining the format

Tags: bash, sed, awk

I am stuck with an issue which might not seem too difficult to advanced shell users. Here is the problem.

I have 2 files:

File1 with a format like this:

ALPH      1  M   GIF M   1      11.111  23.123  -4.412  1.00  0.00           A
ALPH      2  BA  GIF M   1      22.222  78.251  -6.215  2.00  0.00           B
ALPH      3  C   GIF M   1      22.223  46.321  -6.124  3.00  0.00           C
ALPH      4  D   GIF M   1      23.333  15.214  -6.125  4.00  0.00           D
ALPH      5  AB  GIF M   1      24.111  61.458  -8.214  5.00  0.00           E
ALPH      6  LM  GIF M   1      25.333  78.214  -9.321  6.00  0.00           F
ALPH      7  BA  GIF M   1      17.645  87.256  -9.365  7.00  0.00           G
ALPH      8  BA2 GIF M   1      14.125  19.365  -1.258  8.00  0.00           H
-
-

File2 with a single column:

-0.14
-0.47
-0.58
-0.69
-0.25
-0.69
-0.12
-0.44

I want to replace the 11th column in File1 with the only column in File2. I do the following:

paste File1 File2 | awk '{$11=$13;$13=""}1' > output

Although it replaces the column just fine, it messes up the original format of File1, which I would like to retain. As you can see, there are different numbers of spaces between the fields of File1, and I would like to keep that spacing even after replacing $11.

I have tried several approaches including column and printf but none seem to be working. Maybe I am doing something wrong.

Does anyone know how I can achieve the desired result preferably with awk or sed?

Thanks!

Rohit

rohit asked Oct 28 '25 10:10


1 Answer

When you assign a value to a field in awk, it rebuilds the current record, using the current value of OFS to separate the fields. To retain the original spacing, then, you cannot assign a new value to a field. Instead, you have to operate on the whole record with a regular expression that describes how many runs of non-space characters, each followed by a run of spaces, to skip before the field you want to replace. For example, to replace the letter "c" (the 3rd field, hence the number 2 below for the number of leading fields to skip) with the word "BOB" using GNU awk:

$ echo "a   b c    d e" |
gawk '{print gensub(/(([^[:space:]]+[[:space:]]+){2})[^[:space:]]+/,"\\1BOB","")}'
a   b BOB    d e

This preserves spacing because you are working on the whole record as a string, not assigning to one field, and so awk won't rebuild the record.
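To see the difference side by side, here is a small sketch that runs both kinds of edit on the same input using only POSIX awk features. The sub() pattern `/ c /` is specific to this sample line and is only there for illustration.

```shell
# Assigning to a field makes awk rebuild $0 with OFS (a single space by default),
# so the original runs of spaces are lost.
rebuilt=$(echo "a   b c    d e" | awk '{ $3 = "BOB" } 1')
echo "$rebuilt"    # a b BOB d e

# Editing the record as a string leaves all untouched spacing alone.
kept=$(echo "a   b c    d e" | awk '{ sub(/ c /, " BOB ") } 1')
echo "$kept"       # a   b BOB    d e
```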

So for your case it'd be:

$ cat file1
ALPH      1  M   GIF M   1      11.111  23.123  -4.412  1.00  0.00           A
ALPH      2  BA  GIF M   1      22.222  78.251  -6.215  2.00  0.00           B
ALPH      3  C   GIF M   1      22.223  46.321  -6.124  3.00  0.00           C
ALPH      4  D   GIF M   1      23.333  15.214  -6.125  4.00  0.00           D
ALPH      5  AB  GIF M   1      24.111  61.458  -8.214  5.00  0.00           E
ALPH      6  LM  GIF M   1      25.333  78.214  -9.321  6.00  0.00           F
ALPH      7  BA  GIF M   1      17.645  87.256  -9.365  7.00  0.00           G
ALPH      8  BA2 GIF M   1      14.125  19.365  -1.258  8.00  0.00           H
$          
$ cat file2
-0.14
-0.47
-0.58
-0.69
-0.25
-0.69
-0.12
-0.44
$ 
$ gawk 'NR==FNR{map[FNR]=$0; next} {print gensub(/(([^[:space:]]+[[:space:]]+){10})[^[:space:]]+/,"\\1" map[FNR],"")}' file2 file1
ALPH      1  M   GIF M   1      11.111  23.123  -4.412  1.00  -0.14           A
ALPH      2  BA  GIF M   1      22.222  78.251  -6.215  2.00  -0.47           B
ALPH      3  C   GIF M   1      22.223  46.321  -6.124  3.00  -0.58           C
ALPH      4  D   GIF M   1      23.333  15.214  -6.125  4.00  -0.69           D
ALPH      5  AB  GIF M   1      24.111  61.458  -8.214  5.00  -0.25           E
ALPH      6  LM  GIF M   1      25.333  78.214  -9.321  6.00  -0.69           F
ALPH      7  BA  GIF M   1      17.645  87.256  -9.365  7.00  -0.12           G
ALPH      8  BA2 GIF M   1      14.125  19.365  -1.258  8.00  -0.44           H

If you don't have gawk (for gensub()), you can use match() to find where the field you care about starts, a second match() for where it ends, and judicious substr()s to replace it with the new value.
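As a rough sketch of that match()/substr() idea for awks without gensub(): the helper name `replace_field` and its argument order are made up for this example, and it steps over the leading fields with one match() per field rather than a single call, which avoids relying on regex interval expressions. Treat it as one possible way to do it, not the only one.

```shell
# replace_field N VALUES FILE
# Hypothetical helper: swap field N of each line in FILE for the matching line of VALUES.
replace_field() {
    awk -v n="$1" '
        NR == FNR { map[FNR] = $0; next }            # first file: store new values by line number
        NF < n || !(FNR in map) { print; next }      # nothing to replace: pass the line through
        {
            pos = 1
            if (match($0, /^[ \t]+/))                # skip any leading blanks
                pos += RLENGTH
            for (i = 1; i < n; i++) {                # step over the n-1 fields before the target
                match(substr($0, pos), /^[^ \t]+[ \t]+/)
                pos += RLENGTH
            }
            match(substr($0, pos), /^[^ \t]+/)       # RLENGTH is now the width of the target field
            print substr($0, 1, pos - 1) map[FNR] substr($0, pos + RLENGTH)
        }
    ' "$2" "$3"
}

# Small demo on throwaway files
tmp=$(mktemp -d)
printf '%s\n' 'a   b c    d e' 'f g   h  i j' > "$tmp/data"
printf '%s\n' 'BOB' 'X' > "$tmp/vals"
replace_field 3 "$tmp/vals" "$tmp/data"
rm -r "$tmp"
```

For the files in the question, the call would be `replace_field 11 file2 file1 > output`.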

@GlennJackman mentioned fixed-width fields in his solution. If that's what you have, you can use GNU awk's FIELDWIDTHS variable to specify the width of each field and just work with that. See the gawk manual for details.
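FIELDWIDTHS itself is gawk-only, so the sketch below is a portable stand-in for the fixed-width idea that uses substr() instead; it is not FIELDWIDTHS. The helper name `overwrite_cols` is hypothetical, and the column range is an assumption: in the sample above, the 11th field appears to occupy columns 61-66 when right-justified, but check that against your real file.

```shell
# overwrite_cols START WIDTH VALUES FILE
# Hypothetical helper: overwrite WIDTH characters starting at column START with the
# matching line of VALUES, right-justified, leaving every other column where it was.
overwrite_cols() {
    awk -v start="$1" -v width="$2" '
        NR == FNR { map[FNR] = $0; next }            # first file: store new values by line number
        !(FNR in map) || length($0) < start + width - 1 { print; next }   # no value, or line too short
        {
            print substr($0, 1, start - 1) sprintf("%" width "s", map[FNR]) substr($0, start + width)
        }
    ' "$3" "$4"
}

# Small demo on throwaway files: columns 11-16 hold the third value
tmp=$(mktemp -d)
printf '%s\n' 'ALPH  1.00  0.00      A' > "$tmp/data"
printf '%s\n' '-0.14' > "$tmp/vals"
overwrite_cols 11 6 "$tmp/vals" "$tmp/data"    # ALPH  1.00 -0.14      A
rm -r "$tmp"
```

Under the column assumption above, the call for the question's files would be `overwrite_cols 61 6 file2 file1`. Unlike the regex approach, this keeps every line the same length, so the last column does not shift. A value longer than WIDTH is not truncated and would widen the line.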

Ed Morton answered Oct 29 '25 23:10