I am trying to merge two data.frames
together, based on a common column name in each of them called series_id
. Here is my merge statement:
merge(test_growth_series_LUT, test_growth_series, by = intersect(series_id, series_id))
The error I'm getting is
Error in as.vector(y) : object 'series_id' not found
The help gives this description, but I can't see why it can't find the series_id
. Example data is below.
### S3 method for class 'data.frame':
#merge(x, y, by = intersect(names(x), names(y)),
# by.x = by, by.y = by, all = FALSE, all.x = all, all.y = all,
# sort = TRUE, suffixes = c(".x",".y"), ...)
# Create a long data.frame to store data...
test_growth_series = data.frame ("read_day" = c(0, 3, 9, 0, 3, 9, 0, 2, 8),
"series_id" = c("p1s1", "p1s1", "p1s1", "p1s2", "p1s2", "p1s2", "p3s4", "p3s4", "p3s4"),
"mean_od" = c(0.6, 0.9, 1.3, 0.3, 0.6, 1.0, 0.2, 0.5, 1.2),
"sd_od" = c(0.1, 0.2, 0.2, 0.1, 0.1, 0.3, 0.04, 0.1, 0.3),
"n_in_stat" = c(8, 8, 8, 8, 7, 5, 8, 7, 2)
)
# Create a name LUT
test_growth_series_LUT = data.frame ("series_id" = c("p1s1", "p1s2", "p3s4", "p4s2", "p5s2", "p6s2", "p7s4", "p8s4", "p9s4"),"description" = c("blah1", "blah2", "blah3", "blah4", "blah5", "blah6", "blah7", "blah8", "blah9")
)
> test_growth_series
read_day series_id mean_od sd_od n_in_stat
1 0 p1s1 0.6 0.10 8
2 3 p1s1 0.9 0.20 8
3 9 p1s1 1.3 0.20 8
4 0 p1s2 0.3 0.10 8
5 3 p1s2 0.6 0.10 7
6 9 p1s2 1.0 0.30 5
7 0 p3s4 0.2 0.04 8
8 2 p3s4 0.5 0.10 7
9 8 p3s4 1.2 0.30 2
> test_growth_series_LUT
series_id description
1 p1s1 blah1
2 p1s2 blah2
3 p3s4 blah3
4 p4s2 blah4
5 p5s2 blah5
6 p6s2 blah6
7 p7s4 blah7
8 p8s4 blah8
9 p9s4 blah9
>
this is what I'm trying to achieve:
> new_test_growth_series
read_day series_id mean_od sd_od n_in_stat description
1 0 p1s1 0.6 0.10 8 blah1
2 3 p1s1 0.9 0.20 8 blah1
3 9 p1s1 1.3 0.20 8 blah1
4 0 p1s2 0.3 0.10 8 blah2
5 3 p1s2 0.6 0.10 7 blah2
6 9 p1s2 1.0 0.30 5 blah2
7 0 p3s4 0.2 0.04 8 blah3
8 2 p3s4 0.5 0.10 7 blah3
9 8 p3s4 1.2 0.30 2 blah3
In R we use merge() function to merge two dataframes in R. This function is present inside join() function of dplyr package. The most important condition for joining two dataframes is that the column type should be the same on which the merging happens.
The merge() function in base R can be used to merge input dataframes by common columns or row names. The merge() function retains all the row names of the dataframes, behaving similarly to the inner join. The dataframes are combined in order of the appearance in the input function call.
The simplest way to group together values is with the function c() . Feel free to refer to this function however you like, but the words concatenate, combine, and collect are all good options.
You can just do this:
merge(test_growth_series_LUT, test_growth_series)
It will automatically match the names. If you need to specify the column, you do it like this:
merge(test_growth_series_LUT, test_growth_series, by = "series_id")
Or this way if you need to specify on both sides (only needed if they have different names that you want to match on):
merge(test_growth_series_LUT, test_growth_series, by.x = "series_id", by.y = "series_id")
I recommend looking at the examples (and walking through them) by going to the help for merge (?merge
) or by calling example("merge", "base")
(less useful that actually walking through it yourself.
Two notes:
c()
to specify multiple column names explicitly. Or use the all
, all.x
, and all.y
parameters to specify what kind of join you want.If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With