I am stuck on an issue that probably won't seem difficult to advanced shell users. Here is the problem.
I have 2 files:
File1 with a format like this:
ALPH 1 M GIF M 1 11.111 23.123 -4.412 1.00 0.00 A
ALPH 2 BA GIF M 1 22.222 78.251 -6.215 2.00 0.00 B
ALPH 3 C GIF M 1 22.223 46.321 -6.124 3.00 0.00 C
ALPH 4 D GIF M 1 23.333 15.214 -6.125 4.00 0.00 D
ALPH 5 AB GIF M 1 24.111 61.458 -8.214 5.00 0.00 E
ALPH 6 LM GIF M 1 25.333 78.214 -9.321 6.00 0.00 F
ALPH 7 BA GIF M 1 17.645 87.256 -9.365 7.00 0.00 G
ALPH 8 BA2 GIF M 1 14.125 19.365 -1.258 8.00 0.00 H
-
-
File2 with a single column:
-0.14
-0.47
-0.58
-0.69
-0.25
-0.69
-0.12
-0.44
I want to replace the 11th column in File1 with the only column in File2. I do the following:
paste File1 File2 | awk '{$11=$13;$13=""}1' > output
Although it replaces the column just fine, it messes up the original format of File1, which I would like to retain. As you can see, there are different numbers of spaces between the fields of File1, and I would like to keep them even after replacing $11.
I have tried several approaches, including column and printf, but none of them seems to work. Maybe I am doing something wrong.
Does anyone know how I can achieve the desired result preferably with awk or sed?
Thanks!
Rohit
When you assign a value to a field in awk, it rebuilds the current record using the current value of OFS to separate fields. To retain the original spacing, then, you cannot assign a new value to a field. Instead you have to use a regular expression that describes how many runs of non-space characters followed by spaces to skip before the text you replace. For example, this replaces the letter "c" (the 3rd field, hence the number "2" below for the number of leading fields to skip) with the word "BOB" using GNU awk:
$ echo "a b c d e" |
gawk '{print gensub(/(([^[:space:]]+[[:space:]]+){2})[^[:space:]]+/,"\\1BOB",1)}'
a b BOB d e
This preserves spacing because you are working on the whole record, not on one field, so awk won't rebuild the record.
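To see the difference between the two behaviours side by side, here is a small sketch that works in any POSIX awk. The sample line and the variable names `rebuilt` and `kept` are made up for illustration.

```shell
# Assigning to a field makes awk rebuild $0, joining the fields with OFS
# (a single space by default), so the original spacing is lost.
rebuilt=$(printf 'a  b   c    d\n' | awk '{ $3 = "BOB" } 1')
echo "$rebuilt"    # a b BOB d

# Editing $0 as a string with sub() changes only the matched text,
# so the spacing around it survives.
kept=$(printf 'a  b   c    d\n' | awk '{ sub(/c/, "BOB") } 1')
echo "$kept"       # a  b   BOB    d
```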
So for your case it'd be:
$ cat file1
ALPH 1 M GIF M 1 11.111 23.123 -4.412 1.00 0.00 A
ALPH 2 BA GIF M 1 22.222 78.251 -6.215 2.00 0.00 B
ALPH 3 C GIF M 1 22.223 46.321 -6.124 3.00 0.00 C
ALPH 4 D GIF M 1 23.333 15.214 -6.125 4.00 0.00 D
ALPH 5 AB GIF M 1 24.111 61.458 -8.214 5.00 0.00 E
ALPH 6 LM GIF M 1 25.333 78.214 -9.321 6.00 0.00 F
ALPH 7 BA GIF M 1 17.645 87.256 -9.365 7.00 0.00 G
ALPH 8 BA2 GIF M 1 14.125 19.365 -1.258 8.00 0.00 H
$
$ cat file2
-0.14
-0.47
-0.58
-0.69
-0.25
-0.69
-0.12
-0.44
$
$ gawk 'NR==FNR{map[FNR]=$0; next} {print gensub(/(([^[:space:]]+[[:space:]]+){10})[^[:space:]]+/,"\\1" map[FNR],1)}' file2 file1
ALPH 1 M GIF M 1 11.111 23.123 -4.412 1.00 -0.14 A
ALPH 2 BA GIF M 1 22.222 78.251 -6.215 2.00 -0.47 B
ALPH 3 C GIF M 1 22.223 46.321 -6.124 3.00 -0.58 C
ALPH 4 D GIF M 1 23.333 15.214 -6.125 4.00 -0.69 D
ALPH 5 AB GIF M 1 24.111 61.458 -8.214 5.00 -0.25 E
ALPH 6 LM GIF M 1 25.333 78.214 -9.321 6.00 -0.69 F
ALPH 7 BA GIF M 1 17.645 87.256 -9.365 7.00 -0.12 G
ALPH 8 BA2 GIF M 1 14.125 19.365 -1.258 8.00 -0.44 H
If you don't have gawk (for gensub()), you can use match() to find where the field you care about starts, a second match() for where it ends, and judicious substr()s to replace it with the new value.
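One possible way to write that match()/substr() approach is sketched below. It is an illustration rather than a tested drop-in: `replace_field` is a hypothetical helper name, fields are assumed to be separated by spaces or tabs, and the values file is assumed to be non-empty (otherwise the `NR == FNR` test would also be true for the data file).

```shell
# replace_field is a hypothetical helper name, not part of awk.
# Usage: replace_field N VALUES_FILE DATA_FILE
replace_field() {
    awk -v n="$1" '
        NR == FNR { map[FNR] = $0; next }   # first file: one replacement value per line
        {
            head = ""; rest = $0; ok = (FNR in map)
            # Move the first n-1 fields, plus the blanks after each, from rest to head.
            for (i = 1; ok && i < n; i++) {
                if (match(rest, /[^ \t]+[ \t]+/)) {
                    head = head substr(rest, 1, RSTART + RLENGTH - 1)
                    rest = substr(rest, RSTART + RLENGTH)
                } else {
                    ok = 0                   # fewer than n fields on this line
                }
            }
            # rest now starts at field n: splice the new value in its place.
            if (ok && match(rest, /[^ \t]+/))
                print head substr(rest, 1, RSTART - 1) map[FNR] substr(rest, RSTART + RLENGTH)
            else
                print $0                     # leave short or unmatched lines untouched
        }
    ' "$2" "$3"
}
```

For the files in the question this would be called as `replace_field 11 file2 file1 > output`.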
@GlennJackman mentioned fixed-width fields in his solution. If that's what you have, you can use GNU awk's FIELDWIDTHS variable to specify the width of each field and just work with that. See the gawk manual for details.
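As a rough sketch of the FIELDWIDTHS idea: each field then includes its own padding, so setting OFS to the empty string lets gawk rebuild the record without changing the layout. The widths "5 6 3", the three-column layout and the helper name `replace_fixed_width` are all assumptions for illustration. The real widths have to be measured from your file and should add up to the full line length.

```shell
# replace_fixed_width is a hypothetical helper name; it requires GNU awk.
# Usage: replace_fixed_width VALUES_FILE DATA_FILE
replace_fixed_width() {
    gawk '
        BEGIN { FIELDWIDTHS = "5 6 3"; OFS = "" }   # made-up column widths
        NR == FNR { map[FNR] = $0; next }           # first file: replacement values
        (FNR in map) { $2 = sprintf("%6s", map[FNR]) }  # pad the value to the column width
        { print }
    ' "$1" "$2"
}
```

With a data line such as `ALPH   0.00  A` and a value of `-0.14`, this should print `ALPH  -0.14  A`, with the value right-aligned in its 6-character column.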