I have a dataframe that looks something like:
date        minutes_since_midnight  value
2015-01-01  50                      2
2015-01-01  60                      1.5
2015-01-02  45                      3.3
2015-01-03  99                      5.5
and another data frame that looks something like this:
date        minutes_since_midnight  other_value
2015-01-01  55                      12
2015-01-01  80                      33
2015-01-02  45                      88
What I want to do is add another column to the first data frame: a boolean indicating whether the second data frame contains a row with the same date and a minutes_since_midnight that is less than or equal to the minutes_since_midnight in the first data frame. For the example data above, we'd get:
date        minutes_since_midnight  value  has_other_value
2015-01-01  50                      2      False
2015-01-01  60                      1.5    True
2015-01-02  45                      3.3    True
2015-01-03  99                      5.5    False
How can I do this?
Hope this makes sense,
Thanks in advance
I would probably join the data frames along the lines of the other answer, then create the variable and drop the unneeded columns. But here's an option using the dplyr package to perform the steps as you describe them:
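library(dplyr)

# Earliest minutes_since_midnight in df2 for each date
mins <- df2 %>%
  group_by(date) %>%
  summarise(minMins = min(minutes_since_midnight))

# left_join() keeps df1's row order, so the comparison lines up row by row
df1$has_other_value <-
  left_join(df1, mins, by = "date")$minMins <= df1$minutes_since_midnight

# Dates with no match in df2 come back as NA; treat them as FALSE
df1$has_other_value[is.na(df1$has_other_value)] <- FALSE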
Result:
date minutes_since_midnight value has_other_value
1 2015-01-01 50 2.0 FALSE
2 2015-01-01 60 1.5 TRUE
3 2015-01-02 45 3.3 TRUE
4 2015-01-03 99 5.5 FALSE
Could you not rename the minutes_since_midnight variables to minutes_since_midnight1 and minutes_since_midnight2, merge the two data frames together, then create the required has_other_value variable with an ifelse() statement?
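For what it's worth, here is a rough base R sketch of that rename-and-merge idea, using the sample data from the question. A few things here are my own assumptions rather than part of the suggestion: the helper column row_id, the intermediate names merged, hit and flags, and letting merge() do the renaming through its suffixes argument. Because the merge produces one row per matching pair, the sketch collapses the pairs back to one flag per original row with aggregate().

```r
# Sample data reconstructed from the question
df1 <- data.frame(
  date = c("2015-01-01", "2015-01-01", "2015-01-02", "2015-01-03"),
  minutes_since_midnight = c(50, 60, 45, 99),
  value = c(2, 1.5, 3.3, 5.5)
)
df2 <- data.frame(
  date = c("2015-01-01", "2015-01-01", "2015-01-02"),
  minutes_since_midnight = c(55, 80, 45),
  other_value = c(12, 33, 88)
)

# Hypothetical helper column so each df1 row can be identified after the merge
df1$row_id <- seq_len(nrow(df1))

# Left join on date; suffixes renames the clashing columns to
# minutes_since_midnight1 (from df1) and minutes_since_midnight2 (from df2)
merged <- merge(df1, df2[, c("date", "minutes_since_midnight")],
                by = "date", all.x = TRUE, suffixes = c("1", "2"))

# Flag each (df1 row, df2 row) pair; dates with no match in df2 count as FALSE
merged$hit <- ifelse(is.na(merged$minutes_since_midnight2), FALSE,
                     merged$minutes_since_midnight2 <= merged$minutes_since_midnight1)

# A df1 row gets TRUE if any of its pairs is a hit
flags <- aggregate(hit ~ row_id, data = merged, FUN = any)
df1$has_other_value <- flags$hit[match(df1$row_id, flags$row_id)]
df1$row_id <- NULL

print(df1)
```

One thing to keep in mind: the merge creates a row for every df1/df2 pair that shares a date, so with many rows per date the intermediate data frame can get large. The dplyr answer avoids that by reducing df2 to one row per date first.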