Can I define a "fill" value for NA in dplyr join? For example in the join define that all NA values should be 1?
require(dplyr)
lookup <- data.frame(cbind(c("USD","MYR"),c(0.9,1.1)))
names(lookup) <- c("rate","value")
fx <- data.frame(c("USD","MYR","USD","MYR","XXX","YYY"))
names(fx)[1] <- "rate"
left_join(x=fx,y=lookup,by=c("rate"))
The code above produces NA for the values "XXX" and "YYY". In my case I am joining a large number of columns and there will be a lot of non-matches. All non-matches should have the same value. I know I can do it in several steps, but can it all be done in one? Thanks!
The beauty of dplyr is that it handles four types of joins similar to SQL:
left_join() – keeps all rows from the first (left) table, x.
right_join() – keeps all rows from the second (right) table, y.
inner_join() – keeps only the rows that have a match in both tables.
full_join() – keeps all rows from both tables.
You can replace NA values with zero (0) in numeric columns of an R data frame by using is.na(), replace(), imputeTS::na_replace(), dplyr::coalesce(), dplyr::mutate_at(), dplyr::mutate_if(), or tidyr::replace_na().
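Applied to the question above, two of these functions can fill the non-matches directly after the join. The following is only a sketch using the sample data from the question, with 1 as the fill value (rather than 0) because that is what the question asks for. The result names joined_coalesce and joined_replace are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Sample data from the question, built without the cbind() detour
lookup <- data.frame(rate = c("USD", "MYR"), value = c(0.9, 1.1),
                     stringsAsFactors = FALSE)
fx <- data.frame(rate = c("USD", "MYR", "USD", "MYR", "XXX", "YYY"),
                 stringsAsFactors = FALSE)

# Option 1: coalesce() takes the first non-missing value element by element
joined_coalesce <- left_join(fx, lookup, by = "rate") %>%
  mutate(value = coalesce(value, 1))

# Option 2: replace_na() takes a named list of per-column fill values
joined_replace <- left_join(fx, lookup, by = "rate") %>%
  replace_na(list(value = 1))
```

Both variants name the column to fill explicitly, so with many joined columns each one has to be listed (or selected programmatically).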
To join by different variables on x and y , use a named vector. For example, by = c("a" = "b") will match x$a to y$b . To join by multiple variables, use a vector with length > 1. For example, by = c("a", "b") will match x$a to y$a and x$b to y$b .
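As a small illustration of the named-vector form, assume a hypothetical lookup table lookup_alt whose key column is called currency instead of rate (this table is not part of the original question):

```r
library(dplyr)

fx <- data.frame(rate = c("USD", "MYR", "XXX"), stringsAsFactors = FALSE)

# Hypothetical lookup table: the key column has a different name than in fx
lookup_alt <- data.frame(currency = c("USD", "MYR"), value = c(0.9, 1.1),
                         stringsAsFactors = FALSE)

# Match fx$rate against lookup_alt$currency; the key keeps the name from x
joined <- left_join(fx, lookup_alt, by = c("rate" = "currency"))
```

The unmatched row ("XXX") still gets NA in value, exactly as in the question.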
Mean imputation means that if a column has some missing values, they are replaced with the mean of the remaining values. In R, we can do this by computing the mean of that column with the na.rm = TRUE argument and assigning it to the missing entries.
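A minimal base R sketch of that idea, using a made-up numeric vector named scores:

```r
# Made-up example vector with two missing entries
scores <- c(2, NA, 4, NA, 6)

# mean(..., na.rm = TRUE) ignores the NAs, so the mean here is (2 + 4 + 6) / 3
scores[is.na(scores)] <- mean(scores, na.rm = TRUE)
```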
First off, I would recommend not using the combination data.frame(cbind(...)). Here's why: cbind creates a matrix by default if you only pass atomic vectors to it. And matrices in R can only hold one type of data (think of a matrix as a vector with a dimension attribute, i.e. number of rows and columns). Therefore, your code
cbind(c("USD","MYR"),c(0.9,1.1))
creates a character matrix:
str(cbind(c("USD","MYR"),c(0.9,1.1)))
# chr [1:2, 1:2] "USD" "MYR" "0.9" "1.1"
although you probably expected a final data frame with a character or factor column (rate) and a numeric column (value). But what you get is:
str(data.frame(cbind(c("USD","MYR"),c(0.9,1.1))))
#'data.frame': 2 obs. of 2 variables:
# $ X1: Factor w/ 2 levels "MYR","USD": 2 1
# $ X2: Factor w/ 2 levels "0.9","1.1": 1 2
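because, in R versions before 4.0.0, strings (characters) are converted to factors by default when using data.frame. You can circumvent this by specifying stringsAsFactors = FALSE in the data.frame() call. Since R 4.0.0 that is the default, so both columns come out as character instead, and value is still not numeric.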
I suggest the following alternative approach to create the sample data (also note that you can easily specify the column names in the same call):
lookup <- data.frame(rate = c("USD","MYR"), value = c(0.9,1.1))
fx <- data.frame(rate = c("USD","MYR","USD","MYR","XXX","YYY"))
Now, for your actual question: if I understand correctly, you want to replace all NAs with a 1 in the joined data. If that's correct, here's a custom function using left_join and mutate_each to do that:
library(dplyr)
left_join_NA <- function(x, y, ...) {
  # pass by = ... (and any other join arguments) straight through to left_join()
  left_join(x = x, y = y, ...) %>%
    # replace every NA in every column with 1
    mutate_each(funs(replace(., which(is.na(.)), 1)))
}
Now you can apply it to your data like this:
> left_join_NA(x = fx, y = lookup, by = "rate")
#  rate value
#1  USD   0.9
#2  MYR   1.1
#3  USD   0.9
#4  MYR   1.1
#5  XXX   1.0
#6  YYY   1.0
#Warning message:
#joining factors with different levels, coercing to character vector
Note that you end up with a character column (rate) and a numeric column (value) and all NAs are replaced by 1.
str(left_join_NA(x = fx, y = lookup, by = "rate"))
#'data.frame': 6 obs. of 2 variables:
# $ rate : chr "USD" "MYR" "USD" "MYR" ...
# $ value: num 0.9 1.1 0.9 1.1 1 1
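mutate_each() and funs() have since been deprecated in dplyr. On dplyr 1.0.0 or later, the same idea can be sketched with across(). The helper name left_join_fill and its fill argument are made up for this example and are not part of dplyr.

```r
library(dplyr)

lookup <- data.frame(rate = c("USD", "MYR"), value = c(0.9, 1.1),
                     stringsAsFactors = FALSE)
fx <- data.frame(rate = c("USD", "MYR", "USD", "MYR", "XXX", "YYY"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: left join, then replace every NA with `fill`.
# Arguments in ... (such as by) are passed on to left_join().
left_join_fill <- function(x, y, ..., fill = 1) {
  left_join(x = x, y = y, ...) %>%
    mutate(across(everything(), ~ replace(.x, is.na(.x), fill)))
}

result <- left_join_fill(fx, lookup, by = "rate")
```

Like the original function, this fills NAs in every column, including any that were already present in x before the join. If only some columns should be filled, the everything() selection can be narrowed.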