I have a much larger existing data frame. In this smaller example, I would like to replace some values of state in df1 with newstate from df2, matched on the column "first". My issue is that NA is returned for the rows whose names have no match in the new data frame (df2).
Existing dataframe:
state = c("CA","WA","OR","AZ")
first = c("Jim","Mick","Paul","Ron")
df1 <- data.frame(first, state)
  first state
1   Jim    CA
2  Mick    WA
3  Paul    OR
4   Ron    AZ
New dataframe to match to existing dataframe
state = c("CA","WA")
newstate = c("TX", "LA")
first =c("Jim","Mick")
df2 <- data.frame(first, state, newstate)
  first state newstate
1   Jim    CA       TX
2  Mick    WA       LA
I tried to use match(), but it returns NA for "state" wherever a "first" value in the original data frame (df1) has no match in df2.
df1$state <- df2$newstate[match(df1$first, df2$first)]
  first state
1   Jim    TX
2  Mick    LA
3  Paul  <NA>
4   Ron  <NA>
Is there a way to ignore nomatch, or to have nomatch return the existing value as-is? This is the desired result: Jim's and Mick's states are updated, while Paul's and Ron's states do not change.
  first state
1   Jim    TX
2  Mick    LA
3  Paul    OR
4   Ron    AZ
The match() function in R returns, for each element of its first argument, the index of its first match in the second argument. Elements with no match yield NA by default; the nomatch argument changes that value.
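To make that behaviour concrete, here is a minimal sketch using the names from the question. The variable names x, lookup, idx and idx0 are only illustrative.

```r
# Names to look up, and the (shorter) vector to look them up in
x      <- c("Jim", "Mick", "Paul", "Ron")
lookup <- c("Jim", "Mick")

# Default: unmatched elements come back as NA
idx <- match(x, lookup)

# With nomatch = 0, unmatched elements come back as 0 instead
idx0 <- match(x, lookup, nomatch = 0)

print(idx)
print(idx0)
```

Indexing a vector with NA produces NA, which is where the <NA> values in the question come from; indexing with 0 selects nothing.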
Using match() and %in% to compare vectors: match() returns the positions in the second vector at which elements of the first vector are found, while the %in% operator returns a logical vector (TRUE/FALSE) indicating whether each value in the first vector is present in the second.
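As a rough sketch of how %in% could be applied to the question's data (this is one possible variant, not the approach taken in the answer), the logical vector can restrict the assignment to the rows that have a match. The name hit is illustrative, and character columns are assumed rather than factors.

```r
df1 <- data.frame(first = c("Jim", "Mick", "Paul", "Ron"),
                  state = c("CA", "WA", "OR", "AZ"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(first    = c("Jim", "Mick"),
                  newstate = c("TX", "LA"),
                  stringsAsFactors = FALSE)

# TRUE for rows of df1 whose name also appears in df2
hit <- df1$first %in% df2$first

# Update only those rows; match() is called on the matched names only,
# so it cannot return NA here
df1$state[hit] <- df2$newstate[match(df1$first[hit], df2$first)]

print(df1)
```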
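Is this what you want? BTW, unless you really want to work with factors, use stringsAsFactors = FALSE in your data.frame call (this is the default since R 4.0.0). Notice the use of nomatch = 0 in the match call: unmatched names get index 0, and indexing with 0 selects nothing, so only the matched rows are assigned.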
> state = c("CA","WA","OR","AZ")
> first = c("Jim","Mick","Paul","Ron")
> df1 <- data.frame(first, state, stringsAsFactors = FALSE)
> state = c("CA","WA")
> newstate = c("TX", "LA")
> first =c("Jim","Mick")
> df2 <- data.frame(first, state, newstate, stringsAsFactors = FALSE)
> df1
  first state
1   Jim    CA
2  Mick    WA
3  Paul    OR
4   Ron    AZ
> df2
  first state newstate
1   Jim    CA       TX
2  Mick    WA       LA
>
> # create an index for the matches
> indx <- match(df1$first, df2$first, nomatch = 0)
> df1$state[indx != 0] <- df2$newstate[indx]
> df1
  first state
1   Jim    TX
2  Mick    LA
3  Paul    OR
4   Ron    AZ
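Another option, sketched here as an alternative rather than as part of the answer above, is to keep the original match() call from the question and fall back to the existing value wherever it produced NA. The name new_state is illustrative, and the columns are assumed to be character vectors; with factors, ifelse() would return integer codes.

```r
df1 <- data.frame(first = c("Jim", "Mick", "Paul", "Ron"),
                  state = c("CA", "WA", "OR", "AZ"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(first    = c("Jim", "Mick"),
                  newstate = c("TX", "LA"),
                  stringsAsFactors = FALSE)

# Same lookup as in the question: NA where there is no match
new_state <- df2$newstate[match(df1$first, df2$first)]

# Keep the existing state wherever the lookup returned NA
df1$state <- ifelse(is.na(new_state), df1$state, new_state)

print(df1)
```

One caveat with this variant: if newstate itself contains NA for a matched name, the old state is kept for that row as well.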