I am trying to replace values in a large space-delimited text-file and could not find a suitable answer for this specific problem:
Say I have a file "OLD_FILE", containing a header and approximately 2 million rows:
COL1 COL2 COL3 COL4 COL5
rs10 7 92221824 C A
rs1000000 12 125456933 G A
rs10000010 4 21227772 T C
rs10000012 4 1347325 G C
rs10000013 4 36901464 C A
rs10000017 4 84997149 T C
rs1000002 3 185118462 T C
rs10000023 4 95952929 T G
...
I want to replace the first value of each row with a corresponding value, using a large (2.8M rows) conversion table. In this conversion table, the first column lists the value I want to have replaced, and the second column lists the corresponding new values:
COL1_b36 COL2_b37
rs10 7_92383888
rs1000000 12_126890980
rs10000010 4_21618674
rs10000012 4_1357325
rs10000013 4_37225069
rs10000017 4_84778125
rs1000002 3_183635768
rs10000023 4_95733906
...
The desired output would be a file where all values in the first column have been changed according to the conversion table:
COL1 COL2 COL3 COL4 COL5
7_92383888 7 92221824 C A
12_126890980 12 125456933 G A
4_21618674 4 21227772 T C
4_1357325 4 1347325 G C
4_37225069 4 36901464 C A
4_84778125 4 84997149 T C
3_183635768 3 185118462 T C
4_95733906 4 95952929 T G
...
Additional info:
Performance is an issue (the following command takes approximately a year:
while read a b; do sed -i "s/\b$a\b/$b/g" OLD_FILE ; done < CONVERSION_TABLE
Any help is very much appreciated.
Right-click on a column -> Select Replace Values 2. Go to the Transform tab -> click on Replace Values 3. You can right-click a value within a column and click on Replace Values After any of the 3 steps, the Replace Values pop-up screen appears. 1.2. Arguments Table.ReplaceValue Now let’s have a look at how replacing a value works.
In its most basic form, replacing values in Power Query is easy. When you want to replace values in a column, you can either: 1. Right-click on a column -> Select Replace Values
To do that, you can select multiple columns by holding the Ctrl-key and clicking on each column you need. With your columns selected, right click on the column name -> press Replace Values. If your selected columns are all text, the code generated for the replacements looks like:
Also converting values within an internal table in one go could reduce the number of code lines a developer needs to write. The example given below will give you a clear understanding on how we could implement a common method to do value conversions for any given data type.
Here's one way using awk
:
awk 'NR==1 { next } FNR==NR { a[$1]=$2; next } $1 in a { $1=a[$1] }1' TABLE OLD_FILE
Results:
COL1 COL2 COL3 COL4 COL5
7_92383888 7 92221824 C A
12_126890980 12 125456933 G A
4_21618674 4 21227772 T C
4_1357325 4 1347325 G C
4_37225069 4 36901464 C A
4_84778125 4 84997149 T C
3_183635768 3 185118462 T C
4_95733906 4 95952929 T G
Explanation, in order of appearance:
NR==1 { next } # simply skip processing the first line (header) of
# the first file in the arguments list (TABLE)
FNR==NR { ... } # This is a construct that only returns true for the
# first file in the arguments list (TABLE)
a[$1]=$2 # So when we loop through the TABLE file, we add the
# column one to an associative array, and we assign
# this key the value of column two
next # This simply skips processing the remainder of the
# code by forcing awk to read the next line of input
$1 in a { ... } # Now when awk has finished processing the TABLE file,
# it will begin reading the second file in the
# arguments list which is OLD_FILE. So this construct
# is a condition that returns true literally if column
# one exists in the array
$1=a[$1] # re-assign column one's value to be the value held
# in the array
1 # The 1 on the end simply enables default printing. It
# would be like saying: $1 in a { $1=a[$1]; print $0 }'
This might work for you (GNU sed):
sed -r '1d;s|(\S+)\s*(\S+).*|/^\1\\>/s//\2/;t|' table | sed -f - file
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With