Match and replace many values in data.table

I have a dataset with many misnamed entries. I created a two-column .csv with the old (incorrect) names in one column and the corresponding new (correct) names in the other. Now I need to tell R to replace every old name in the data with the correct name.

library(data.table)

testData = data.table(oldName = c("Nu York", "Was DC", "Buston", "Nu York"))
replacements = data.table(oldName = c("Buston", "Nu York", "Was DC"),
    newName = c("Boston", "New York", "Washington DC"))

# The next line fails.
holder = replace(testData, testData[, oldName] == replacements[, oldName],
    replacements[, newName])
Dr. Beeblebrox asked Mar 12 '14 14:03


2 Answers

This is how I'd do that replacement:

setkey(testData, oldName)
setkey(replacements, oldName)

testData[replacements, oldName := newName]
testData
#         oldName
#1:        Boston
#2:      New York
#3:      New York
#4: Washington DC

Note that setkey sorts testData by oldName. If you want to keep the original order, add an index column before setting the key and sort by it again at the end.
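The index suggestion above could look something like the following. This is only a sketch: the column name rowId is a hypothetical helper, not something from the original answer, and it is dropped again once the order has been restored.

```r
library(data.table)

testData = data.table(oldName = c("Nu York", "Was DC", "Buston", "Nu York"))
replacements = data.table(oldName = c("Buston", "Nu York", "Was DC"),
    newName = c("Boston", "New York", "Washington DC"))

# rowId is a hypothetical helper column that records the original row order
testData[, rowId := .I]

setkey(testData, oldName)        # sorts testData by oldName
setkey(replacements, oldName)

# keyed join: overwrite oldName wherever a replacement matches
testData[replacements, oldName := newName]

# restore the original order, then drop the helper column
setorder(testData, rowId)
testData[, rowId := NULL]
testData
#          oldName
# 1:      New York
# 2: Washington DC
# 3:        Boston
# 4:      New York
```

setorder reorders by reference, so no copy of testData is made at any step.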

eddi answered Oct 31 '22 15:10


I arrived here looking for a solution and adapted the answer above to my needs. If the original order needs to be maintained, don't use setkey; join with the on= argument instead. For a better test, I've added a row to each table that has no match in the other.

library(data.table)

testData = data.table(
  city = c("Nu York", "Was DC", "Buston",  "Nu York", "Alabama")
)

If the join column has the same name in the lookup table:

replacements = data.table(
  city = c("Buston", "Nu York", "Was DC", "tstDummy"), 
  city_newName = c("Boston", "New York", "Washington DC", "Test Dummy")
)

testData[replacements, city := city_newName, on=.(city)][]

If the join column has a different name in the lookup table:

replacements = data.table(
  city_oldName = c("Buston", "Nu York", "Was DC", "tstDummy"), 
  city_newName = c("Boston", "New York", "Washington DC", "Test Dummy")
)

testData[replacements, city := city_newName, on=.(city = city_oldName)][]

Either way, testData will be changed to:

            city
1:      New York
2: Washington DC
3:        Boston
4:      New York
5:       Alabama

No keys are set and the original order is retained.
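If you want to confirm those two claims on your own data, a quick check along these lines should do. It is a sketch that reuses the tables from this answer and only calls key() and nrow(); the unmatched lookup row ("tstDummy") should not add a row to testData, and the unmatched data row ("Alabama") should be left alone.

```r
library(data.table)

testData = data.table(
  city = c("Nu York", "Was DC", "Buston", "Nu York", "Alabama")
)
replacements = data.table(
  city_oldName = c("Buston", "Nu York", "Was DC", "tstDummy"),
  city_newName = c("Boston", "New York", "Washington DC", "Test Dummy")
)

# update join without keys; only matching rows of testData are modified
testData[replacements, city := city_newName, on = .(city = city_oldName)]

key(testData)    # NULL, since no key was ever set
nrow(testData)   # 5, the unmatched lookup row is not appended
```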

San answered Oct 31 '22 15:10