How to merge two data.frames together in R, referencing a lookup table

Tags: merge, dataframe, r

I am trying to merge two data.frames together, based on a common column name in each of them called series_id. Here is my merge statement:

merge(test_growth_series_LUT,  test_growth_series, by = intersect(series_id, series_id))

The error I'm getting is

Error in as.vector(y) : object 'series_id' not found

The help page gives this usage, but I can't see why it can't find series_id. Example data is below.

### S3 method for class 'data.frame':
   #merge(x, y, by = intersect(names(x), names(y)),
   #      by.x = by, by.y = by, all = FALSE, all.x = all, all.y = all,
   #      sort = TRUE, suffixes = c(".x",".y"), ...)



# Create a long data.frame to store data...
test_growth_series = data.frame ("read_day" = c(0, 3, 9, 0, 3, 9, 0, 2, 8), 
"series_id" = c("p1s1", "p1s1", "p1s1", "p1s2", "p1s2", "p1s2", "p3s4", "p3s4", "p3s4"),
"mean_od" = c(0.6, 0.9, 1.3, 0.3, 0.6, 1.0, 0.2, 0.5, 1.2),
"sd_od" = c(0.1, 0.2, 0.2, 0.1, 0.1, 0.3, 0.04, 0.1, 0.3),
"n_in_stat" = c(8, 8, 8, 8, 7, 5, 8, 7, 2)
)

# Create a name LUT
test_growth_series_LUT = data.frame ("series_id" = c("p1s1", "p1s2", "p3s4", "p4s2", "p5s2", "p6s2", "p7s4", "p8s4", "p9s4"),
"description" = c("blah1", "blah2", "blah3", "blah4", "blah5", "blah6", "blah7", "blah8", "blah9")
)

> test_growth_series
  read_day series_id mean_od sd_od n_in_stat
1        0      p1s1     0.6  0.10         8
2        3      p1s1     0.9  0.20         8
3        9      p1s1     1.3  0.20         8
4        0      p1s2     0.3  0.10         8
5        3      p1s2     0.6  0.10         7
6        9      p1s2     1.0  0.30         5
7        0      p3s4     0.2  0.04         8
8        2      p3s4     0.5  0.10         7
9        8      p3s4     1.2  0.30         2
> test_growth_series_LUT
  series_id description
1      p1s1       blah1
2      p1s2       blah2
3      p3s4       blah3
4      p4s2       blah4
5      p5s2       blah5
6      p6s2       blah6
7      p7s4       blah7
8      p8s4       blah8
9      p9s4       blah9
> 



This is what I'm trying to achieve:
> new_test_growth_series
  read_day series_id mean_od sd_od n_in_stat description
1        0      p1s1     0.6  0.10         8       blah1
2        3      p1s1     0.9  0.20         8       blah1
3        9      p1s1     1.3  0.20         8       blah1
4        0      p1s2     0.3  0.10         8       blah2
5        3      p1s2     0.6  0.10         7       blah2
6        9      p1s2     1.0  0.30         5       blah2
7        0      p3s4     0.2  0.04         8       blah3
8        2      p3s4     0.5  0.10         7       blah3
9        8      p3s4     1.2  0.30         2       blah3


asked Feb 28 '10 21:02

John

1 Answer

You can just do this:

merge(test_growth_series_LUT, test_growth_series)
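A self-contained sketch of this call, recreating the question's data (the column values are copied from the question; `paste0()` is just a shorthand for the blah1 to blah9 labels). It also prints the value merge() uses for `by` when that argument is omitted:

```r
# Recreate the example data from the question
test_growth_series <- data.frame(
  read_day  = c(0, 3, 9, 0, 3, 9, 0, 2, 8),
  series_id = c("p1s1", "p1s1", "p1s1", "p1s2", "p1s2", "p1s2", "p3s4", "p3s4", "p3s4"),
  mean_od   = c(0.6, 0.9, 1.3, 0.3, 0.6, 1.0, 0.2, 0.5, 1.2),
  sd_od     = c(0.1, 0.2, 0.2, 0.1, 0.1, 0.3, 0.04, 0.1, 0.3),
  n_in_stat = c(8, 8, 8, 8, 7, 5, 8, 7, 2)
)
test_growth_series_LUT <- data.frame(
  series_id   = c("p1s1", "p1s2", "p3s4", "p4s2", "p5s2", "p6s2", "p7s4", "p8s4", "p9s4"),
  description = paste0("blah", 1:9)  # "blah1", "blah2", ..., "blah9"
)

# With `by` omitted, merge() uses intersect(names(x), names(y))
shared <- intersect(names(test_growth_series_LUT), names(test_growth_series))
print(shared)
#> [1] "series_id"

# Inner join: LUT rows with no measurements (p4s2 to p9s4) are dropped
merged <- merge(test_growth_series_LUT, test_growth_series)
nrow(merged)
#> [1] 9
```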

By default, merge() joins on every column name the two data frames share, which here is just series_id. If you need to specify the column explicitly, do it like this:

merge(test_growth_series_LUT, test_growth_series, by = "series_id")
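merge() returns the key column first, followed by the remaining columns of x and then those of y. To get exactly the column layout shown in the question (the original series columns, then description), one option is to reorder the columns after merging. This reordering step is an addition for illustration, not part of the original answer; the data is recreated from the question:

```r
test_growth_series <- data.frame(
  read_day  = c(0, 3, 9, 0, 3, 9, 0, 2, 8),
  series_id = c("p1s1", "p1s1", "p1s1", "p1s2", "p1s2", "p1s2", "p3s4", "p3s4", "p3s4"),
  mean_od   = c(0.6, 0.9, 1.3, 0.3, 0.6, 1.0, 0.2, 0.5, 1.2),
  sd_od     = c(0.1, 0.2, 0.2, 0.1, 0.1, 0.3, 0.04, 0.1, 0.3),
  n_in_stat = c(8, 8, 8, 8, 7, 5, 8, 7, 2)
)
test_growth_series_LUT <- data.frame(
  series_id   = c("p1s1", "p1s2", "p3s4", "p4s2", "p5s2", "p6s2", "p7s4", "p8s4", "p9s4"),
  description = paste0("blah", 1:9)
)

# Join on the quoted column name
result <- merge(test_growth_series_LUT, test_growth_series, by = "series_id")

# Put the original series columns first and the looked-up description last
result <- result[, c(names(test_growth_series), "description")]
print(result)
```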

Or specify each side separately (this is only needed when the columns you want to match on have different names):

merge(test_growth_series_LUT, test_growth_series, by.x = "series_id", by.y = "series_id")
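by.x and by.y only matter when the key columns are named differently. As a hypothetical illustration, suppose the lookup table called its key column `id` instead of `series_id` (`id`, `lut` and `series` are names invented for this sketch, using a few rows of the question's data):

```r
lut <- data.frame(
  id          = c("p1s1", "p1s2", "p3s4"),  # hypothetical key column name
  description = c("blah1", "blah2", "blah3")
)
series <- data.frame(
  read_day  = c(0, 3, 0, 2),
  series_id = c("p1s1", "p1s1", "p3s4", "p3s4")
)

# Match lut$id against series$series_id
res <- merge(lut, series, by.x = "id", by.y = "series_id")

# The key column in the result keeps the name from x (by.x)
names(res)
```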

I recommend looking at the examples in the help for merge (?merge) and walking through them, or running example("merge", "base") (less useful than walking through them yourself).

Two notes:

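You never need the intersect function here: the default for by is already intersect(names(x), names(y)). To join on several columns explicitly, pass their names as a character vector built with c(). Use the all, all.x, and all.y parameters to choose the kind of join (inner, left, right, or full outer).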
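Give column names in quotes. An unquoted series_id is evaluated as an ordinary R variable, and because series_id only exists as a column inside the two data frames, R cannot find it; that is where "object 'series_id' not found" comes from. Attaching the data would not really help: the bare name would then evaluate to the column's values rather than its name, and by expects names.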
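A sketch of both notes, using the question's data. It compares row counts for the different join types, passes two key columns with c() (the `a`/`b` frames with `plate` and `well` columns are hypothetical), and reproduces the original error by leaving series_id unquoted:

```r
test_growth_series <- data.frame(
  read_day  = c(0, 3, 9, 0, 3, 9, 0, 2, 8),
  series_id = c("p1s1", "p1s1", "p1s1", "p1s2", "p1s2", "p1s2", "p3s4", "p3s4", "p3s4"),
  mean_od   = c(0.6, 0.9, 1.3, 0.3, 0.6, 1.0, 0.2, 0.5, 1.2),
  sd_od     = c(0.1, 0.2, 0.2, 0.1, 0.1, 0.3, 0.04, 0.1, 0.3),
  n_in_stat = c(8, 8, 8, 8, 7, 5, 8, 7, 2)
)
test_growth_series_LUT <- data.frame(
  series_id   = c("p1s1", "p1s2", "p3s4", "p4s2", "p5s2", "p6s2", "p7s4", "p8s4", "p9s4"),
  description = paste0("blah", 1:9)
)

# Join types (x = lookup table, y = measurements)
inner_j <- merge(test_growth_series_LUT, test_growth_series)                # matched rows only
left_j  <- merge(test_growth_series_LUT, test_growth_series, all.x = TRUE)  # also unmatched LUT rows, NA-filled
full_j  <- merge(test_growth_series_LUT, test_growth_series, all = TRUE)    # unmatched rows from both sides
c(inner = nrow(inner_j), left = nrow(left_j), full = nrow(full_j))

# Several key columns: pass a character vector built with c()
a <- data.frame(plate = c(1, 1, 2), well = c("A", "B", "A"), od = c(0.6, 0.9, 0.3))
b <- data.frame(plate = c(1, 2), well = c("A", "A"), note = c("x", "y"))
both_keys <- merge(a, b, by = c("plate", "well"))  # rows where plate AND well match

# Unquoted: R evaluates series_id as a variable, which does not exist
msg <- tryCatch(
  merge(test_growth_series_LUT, test_growth_series,
        by = intersect(series_id, series_id)),
  error = conditionMessage
)
print(msg)

# Quoted: the column name is passed as a string, as merge() expects
fixed <- merge(test_growth_series_LUT, test_growth_series, by = "series_id")
```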

answered Sep 20 '22 04:09

Shane
