I have a dataset with many misnamed entries. I created a two column .csv that includes the old (incorrect) names in one column and the corresponding new (correct) names in the second column. Now I need to tell R to replace every old name in the data with the correct name.
testData = data.table(oldName = c("Nu York", "Was DC", "Buston", "Nu York"))
replacements = data.table(oldName = c("Buston", "Nu York", "Was DC"),
newName = c("Boston", "New York", "Washington DC"))
# The next line fails.
holder = replace(testData, testData[, oldName]==replacements[, oldName],
replacements[, newName]
You can use the if else statements to replace multiple categories in a single power query replace value formula. For example, If you have a status column with the values A,I,T and wish to replace them with Active, Inactive, and Terminated use the formula provided below.
To do mass replace in your worksheet, head over to the Ablebits Data tab and click Substring Tools > Replace Substrings. The Replace Substrings dialog box will appear asking you to define the Source range and Substrings range.
Replace text values To open a query, locate one previously loaded from the Power Query Editor, select a cell in the data, and then select Query > Edit. For more information see Create, load, or edit a query in Excel. Select a column with a text data type. Select Home or Transform > Replace Value.
This is how I'd do that replacement:
setkey(testData, oldName)
setkey(replacements, oldName)
testData[replacements, oldName := newName]
testData
# oldName
#1: Boston
#2: New York
#3: New York
#4: Washington DC
You can add an index if you like the original order and put it back in original order at the end.
I reached here looking for a solution and managed to tweak it to my requirement. If original order needs to be maintained then don't use setkey
. I've added mutually exclusive rows on both tables for a better test.
library(data.table)
testData = data.table(
city = c("Nu York", "Was DC", "Buston", "Nu York", "Alabama")
)
If the join by column name in lookup table is same:
replacements = data.table(
city = c("Buston", "Nu York", "Was DC", "tstDummy"),
city_newName = c("Boston", "New York", "Washington DC", "Test Dummy")
)
testData[replacements, city := city_newName, on=.(city)][]
If the join by column name in lookup table is different:
replacements = data.table(
city_oldName = c("Buston", "Nu York", "Was DC", "tstDummy"),
city_newName = c("Boston", "New York", "Washington DC", "Test Dummy")
)
testData[replacements, city := city_newName, on=.(city = city_oldName)][]
Either way, testData
will be changed to:
city
1: New York
2: Washington DC
3: Boston
4: New York
5: Alabama
No keys are made and original order is retained.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With