First, let's start with DataFrame 1 (DF1):
DF1 <- data.frame(c("06/19/2016", "06/20/2016", "06/21/2016", "06/22/2016",
                    "06/23/2016", "06/19/2016", "06/20/2016", "06/21/2016",
                    "06/22/2016", "06/23/2016"),
                  c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
                  c(149, 150, 151, 152, 155, 84, 83, 80, 81, 97),
                  c(101, 102, 104, 107, 99, 55, 55, 56, 57, 58),
                  c("MTL", "MTL", "MTL", "MTL", "MTL", "NY", "NY",
                    "NY", "NY", "NY"),
                  stringsAsFactors = FALSE)  # keep strings as characters, not factors
colnames(DF1) <- c("date", "id", "sales", "cost", "city")
I also have DataFrame 2 (DF2):
DF2 <- data.frame(c("06/19/2016", "06/27/2016", "06/22/2016", "06/23/2016"),
                  c(1, 1, 2, 2),
                  c(9999, 8888, 777, 555),
                  c("LON", "LON", "QC", "QC"),
                  stringsAsFactors = FALSE)
colnames(DF2) <- c("date", "id", "sales", "city")
For every row in DF1, I have to check whether there is a row in DF2 with the same date and id. If there is, I have to replace the values in DF1 with the values from DF2.
DF2 will always have fewer columns than DF1. If a column is not in DF2, I must keep the original DF1 value for that column.
The final output would look like this:
results <- data.frame(c("06/19/2016", "06/20/2016", "06/21/2016", "06/22/2016",
                        "06/23/2016", "06/19/2016", "06/20/2016", "06/21/2016",
                        "06/22/2016", "06/23/2016"),
                      c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
                      c(9999, 150, 151, 152, 155, 84, 83, 80, 777, 555),
                      c(101, 102, 104, 107, 99, 55, 55, 56, 57, 58),
                      c("LON", "MTL", "MTL", "MTL", "MTL", "NY", "NY",
                        "NY", "QC", "QC"),
                      stringsAsFactors = FALSE)
colnames(results) <- c("date", "id", "sales", "cost", "city")
Do you have any suggestions?
You could use the join functionality of the data.table package for this:
library(data.table)
setDT(DF1)
setDT(DF2)

# update join: for the DF1 rows that match a DF2 row on date and id,
# overwrite city and sales with DF2's values (the i. prefix refers to DF2)
DF1[DF2, on = .(date, id), `:=` (city = i.city, sales = i.sales)]
which gives:
> DF1
          date id sales cost city
 1: 06/19/2016  1  9999  101  LON
 2: 06/20/2016  1   150  102  MTL
 3: 06/21/2016  1   151  104  MTL
 4: 06/22/2016  1   152  107  MTL
 5: 06/23/2016  1   155   99  MTL
 6: 06/19/2016  2    84   55   NY
 7: 06/20/2016  2    83   55   NY
 8: 06/21/2016  2    80   56   NY
 9: 06/22/2016  2   777   57   QC
10: 06/23/2016  2   555   58   QC
Note that the DF2 row with date 06/27/2016 has no match in DF1, so the update join simply ignores it.
When you have many columns in both datasets, it is easier to use mget instead of typing all the column names. For the data in the question, that would look like:
DF1[DF2, on = .(date, id), names(DF2)[3:4] := mget(paste0("i.", names(DF2)[3:4]))]
When you want to construct the vector of column names to be updated beforehand, you can do it as follows:
cols <- names(DF2)[3:4]
DF1[DF2, on = .(date, id), (cols) := mget(paste0("i.", cols))]
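If you prefer the tidyverse, dplyr has rows_update(), which does exactly this kind of "patch" operation. A minimal sketch, assuming plain data frames with character (not factor) columns and dplyr >= 1.1.0 (needed for the unmatched argument, since DF2 contains a date/id pair that does not occur in DF1):

library(dplyr)

# update matching (date, id) rows of DF1 with the sales/city values from DF2;
# columns DF2 lacks (cost) are left untouched, and unmatched = "ignore"
# skips DF2 rows (e.g. 06/27/2016) that have no partner in DF1
results <- rows_update(DF1, DF2, by = c("date", "id"), unmatched = "ignore")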
Alternatively, you can do this with a plain merge() and some column bookkeeping:

# left join: DF1 rows without a match in DF2 get NA in sales.y and city.y
df <- merge(DF1, DF2, by = c("date", "id"), all.x = TRUE)

# rows without a match: keep the original sales/city from DF1
tmp1 <- df[is.na(df$sales.y) & is.na(df$city.y), ]
tmp1$sales.y <- NULL
tmp1$city.y <- NULL
names(tmp1)[names(tmp1) == "sales.x"] <- "sales"
names(tmp1)[names(tmp1) == "city.x"] <- "city"

# rows with a match: take sales/city from DF2
tmp2 <- df[!is.na(df$sales.y) & !is.na(df$city.y), ]
tmp2$sales.x <- NULL
tmp2$city.x <- NULL
names(tmp2)[names(tmp2) == "sales.y"] <- "sales"
names(tmp2)[names(tmp2) == "city.y"] <- "city"

# stack both parts again (rbindlist comes from data.table, loaded above)
results <- rbindlist(list(tmp1, tmp2), use.names = TRUE, fill = TRUE)

Note that this assumes DF2 never contains genuine NA values in sales or city, and that the row and column order of results will differ from DF1.
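For completeness, here is a minimal base-R sketch using match(), which updates DF1 in place and keeps its original row and column order (assuming the date/id combinations in DF2 are unique):

# index of the matching DF2 row for each DF1 row (NA where there is no match)
idx <- match(paste(DF1$date, DF1$id), paste(DF2$date, DF2$id))
hit <- !is.na(idx)

# overwrite only the matched rows, and only the columns that DF2 provides
DF1$sales[hit] <- DF2$sales[idx[hit]]
DF1$city[hit]  <- DF2$city[idx[hit]]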