Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to merge two data.frames together in R, referencing a lookup table

Tags:

merge

dataframe

r

I am trying to merge two data.frames together, based on a common column name in each of them called series_id. Here is my merge statement:

merge(test_growth_series_LUT,  test_growth_series, by = intersect(series_id, series_id))

The error I'm getting is

Error in as.vector(y) : object 'series_id' not found

The help gives this description, but I can't see why it can't find the series_id. Example data is below.

### S3 method for class 'data.frame':
   #merge(x, y, by = intersect(names(x), names(y)),
   #      by.x = by, by.y = by, all = FALSE, all.x = all, all.y = all,
   #      sort = TRUE, suffixes = c(".x",".y"), ...)



# Create a long data.frame to store data...
test_growth_series = data.frame ("read_day" = c(0, 3, 9, 0, 3, 9, 0, 2, 8), 
"series_id" = c("p1s1", "p1s1", "p1s1", "p1s2", "p1s2", "p1s2", "p3s4", "p3s4", "p3s4"),
"mean_od" = c(0.6, 0.9, 1.3, 0.3, 0.6, 1.0, 0.2, 0.5, 1.2),
"sd_od" = c(0.1, 0.2, 0.2, 0.1, 0.1, 0.3, 0.04, 0.1, 0.3),
"n_in_stat" = c(8, 8, 8, 8, 7, 5, 8, 7, 2)
)

# Create a name LUT
test_growth_series_LUT = data.frame ("series_id" = c("p1s1", "p1s2", "p3s4", "p4s2", "p5s2", "p6s2", "p7s4", "p8s4", "p9s4"),"description" = c("blah1", "blah2", "blah3", "blah4", "blah5", "blah6", "blah7", "blah8", "blah9")
)

> test_growth_series
  read_day series_id mean_od sd_od n_in_stat
1        0      p1s1     0.6  0.10         8
2        3      p1s1     0.9  0.20         8
3        9      p1s1     1.3  0.20         8
4        0      p1s2     0.3  0.10         8
5        3      p1s2     0.6  0.10         7
6        9      p1s2     1.0  0.30         5
7        0      p3s4     0.2  0.04         8
8        2      p3s4     0.5  0.10         7
9        8      p3s4     1.2  0.30         2
> test_growth_series_LUT
  series_id description
1      p1s1       blah1
2      p1s2       blah2
3      p3s4       blah3
4      p4s2       blah4
5      p5s2       blah5
6      p6s2       blah6
7      p7s4       blah7
8      p8s4       blah8
9      p9s4       blah9
> 



this is what I'm trying to achieve:  
> new_test_growth_series
  read_day series_id mean_od sd_od n_in_stat        description
1        0      p1s1     0.6  0.10         8        blah1
2        3      p1s1     0.9  0.20         8        blah1
3        9      p1s1     1.3  0.20         8        blah1
4        0      p1s2     0.3  0.10         8        blah2
5        3      p1s2     0.6  0.10         7        blah2
6        9      p1s2     1.0  0.30         5        blah2
7        0      p3s4     0.2  0.04         8        blah3
8        2      p3s4     0.5  0.10         7        blah3
9        8      p3s4     1.2  0.30         2        blah3
like image 506
John Avatar asked Feb 28 '10 21:02

John


People also ask

How do I merge two data frames in R?

In R we use merge() function to merge two dataframes in R. This function is present inside join() function of dplyr package. The most important condition for joining two dataframes is that the column type should be the same on which the merging happens.

How do I combine two data frames in R based on column?

The merge() function in base R can be used to merge input dataframes by common columns or row names. The merge() function retains all the row names of the dataframes, behaving similarly to the inner join. The dataframes are combined in order of the appearance in the input function call.

How do I combine data values in R?

The simplest way to group together values is with the function c() . Feel free to refer to this function however you like, but the words concatenate, combine, and collect are all good options.


1 Answers

You can just do this:

merge(test_growth_series_LUT, test_growth_series)

It will automatically match the names. If you need to specify the column, you do it like this:

merge(test_growth_series_LUT, test_growth_series, by = "series_id")

Or this way if you need to specify on both sides (only needed if they have different names that you want to match on):

merge(test_growth_series_LUT, test_growth_series, by.x = "series_id", by.y = "series_id")

I recommend looking at the examples (and walking through them) by going to the help for merge (?merge) or by calling example("merge", "base") (less useful that actually walking through it yourself.

Two notes:

  1. You would never need to use the intersect function here. Use c() to specify multiple column names explicitly. Or use the all, all.x, and all.y parameters to specify what kind of join you want.
  2. You would use quotes to specify a column name in most cases unless you have attached the data. Otherwise it will complain about not being able to locate the name. In particular, the name needs to be in the search path when you aren't using quotes.
like image 90
Shane Avatar answered Sep 20 '22 04:09

Shane