
Replace multiple strings in file using a mapping file

How can I replace multiple strings in one big file (500K+ lines) using a mapping file (50K+ lines)? The mapping file is structured like this:

A1  B1
A2  B2
A3  B3
..  ..

and the big file is structured like this:

A1  A2
A1  A3
A1  A8
A2  A1
A2  A3
A3  A10
A3  A13

Every string in the big file has to be replaced using the mapping file.

Desired result:

B1  B2
B1  B3
B1  B8
B2  B1
B2  B3
B3  B10
B3  B13

I tried running awk once for every line of the mapping file, but it takes a very long time. I wrote a loop that launches one awk command per mapping line, saves the result to a temporary file, and feeds that file to the next awk call with the next mapping line (not very efficient, I know). Here is the awk command for a single mapping line:

cat inputBigFile.txt | awk '{ gsub( "A1","B1" );}1' > out.txt

Thanks in advance

Nicolas Rosewick asked May 06 '26 20:05



1 Answer

$ awk 'NR==FNR{map[$1]=$2;next} {if($1 in map)$1=map[$1]; if($2 in map)$2=map[$2]}1' mappings file
B1 B2
B1 B3
B1 A8
B2 B1
B2 B3
B3 A10
B3 A13

I assume that explicitly checking and replacing the two columns is faster than looping over NF and/or using gsub.
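For readers unfamiliar with the two-file idiom, the one-liner can be unpacked as in the sketch below. The file names `mappings.txt` and `big.txt` and their contents are made up for illustration and only mirror the sample data from the question.

```shell
# Hypothetical sample files mirroring the data in the question.
printf 'A1 B1\nA2 B2\nA3 B3\n' > mappings.txt
printf 'A1 A2\nA1 A8\nA3 A10\n' > big.txt

awk '
    # While reading the first file, NR equals FNR:
    # store each old/new pair in the map and skip to the next line.
    NR == FNR { map[$1] = $2; next }

    # For the second file, replace a column only when the whole
    # field is a key in the map (exact match, not substring match).
    {
        if ($1 in map) $1 = map[$1]
        if ($2 in map) $2 = map[$2]
        print
    }
' mappings.txt big.txt > out.txt

cat out.txt
```

Because `in` tests the whole field, `A10` is left alone here even though `A1` has a mapping. A plain `gsub("A1","B1")` as in the question would match the `A1` inside `A10` and turn it into `B10`.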

EDIT: Checking the two columns explicitly is indeed significantly faster:

$ wc -l file
8388608 file

Checking the two columns explicitly:

$ time awk 'NR==FNR{map[$1]=$2;next} {if($1 in map)$1=map[$1]; if ($2 in map)$2=map[$2]}1' mappings file >/dev/null
real    0m6.941s
user    0m6.904s
sys     0m0.016s

Looping over all fields:

$ time awk 'NR==FNR{map[$1]=$2;next} {for(i=1;i<=NF;i++)$i=($i in map)?map[$i]:$i}1' mappings file >/dev/null
real    0m10.311s
user    0m10.249s
sys     0m0.036s

Tested with:

$ awk --version | head -n 1
GNU Awk 3.1.8
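To try the comparison on another machine or awk version, something like the following sketch could be used. The file names and the synthetic data (1000 mappings, 100000 lines) are assumptions for illustration and are not the files used for the timings in this answer. Prefix either awk call with `time` to measure it.

```shell
# Hypothetical test data: mappings A1..A1000 -> B1..B1000,
# plus a two-column file in which some keys have no mapping.
awk 'BEGIN { for (i = 1; i <= 1000; i++) print "A" i, "B" i }' > bench_map.txt
awk 'BEGIN { for (i = 1; i <= 100000; i++) print "A" (i % 1500 + 1), "A" (i % 1200 + 1) }' > bench_big.txt

# Variant 1: check the two columns explicitly.
awk 'NR==FNR{map[$1]=$2;next} {if($1 in map)$1=map[$1]; if($2 in map)$2=map[$2]}1' \
    bench_map.txt bench_big.txt > out_cols.txt

# Variant 2: loop over every field.
awk 'NR==FNR{map[$1]=$2;next} {for(i=1;i<=NF;i++)$i=($i in map)?map[$i]:$i}1' \
    bench_map.txt bench_big.txt > out_loop.txt

# On single-space-separated input both variants should produce the same file.
cmp out_cols.txt out_loop.txt && echo "outputs match"
```

One difference to keep in mind: the loop variant assigns to every field, so awk rebuilds each line with single-space separators, while the two-column variant leaves lines without any mapped key exactly as they were. With input separated by tabs or multiple spaces, the two outputs can therefore differ in whitespace.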
Adrian Frühwirth answered May 09 '26 16:05



