the rules of subsetting

Tags:

dataframe

r

subset

Having df1 and df2 as follows:

df1 <- read.table(text =" x y z
                          1 1 1
                          1 2 1
                          1 1 2
                          2 1 1
                          2 2 2",header=TRUE)

df2 <- read.table(text =" a b c
                          1 1 1
                          1 2 8
                          1 1 2
                          2 6 2",header=TRUE)

I can ask of the data a bunch of things like:

 df2[ df2$b == 6 | df2$c == 8 ,] # rows where b == 6 or c == 8 in df2
 # and both conditions at once
 df2[ df2$b == 6 & df2$c == 8 ,] # zero rows

Between data frames:

 df1[ df1$z %in% df2$c ,] # rows in df1 where the value of z appears in c (all rows)

This gives me all rows:

 df1[ (df1$x %in%  df2$a) &
      (df1$y %in%  df2$b) &
      (df1$z %in%  df2$c) ,]

But shouldn't this give me all rows of df1 too?

 df1[ df1$z %in% df2$c | df1$b == 9,]

What I am really hoping to do is to subset df1 on df2 using three column conditions, so that I only get the rows of df1 where x, y, z all equal a, b, c at the same time within a single row of df2. In real data I will have more than 3 columns, but I will still want to subset on 3 additive column conditions.

So, subsetting my example data df1 on df2, my result would be:

df1
   1 1 1
   1 1 2

Playing with the syntax has confused me more, and the SO posts I found are all variations on what I want that actually led to more confusion for me.

I figured out I can do this:

 merge(df1,df2, by.x=c("x","y","z"),by.y=c("a","b","c"))

which gives me what I want, but I would like to understand why I am wrong in my [ attempts.


asked Jan 30 '13 11:01

user1320502

1 Answer

In addition to your nice solution using merge (thanks for that, I always forget merge), this can be achieved in base using ?interaction as follows. There may be other variations of this, but this is the one I am familiar with:

> df1[interaction(df1) %in% interaction(df2), ]
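To see why this works where the three separate %in% tests do not, here is a minimal sketch using the example data from the question. The variable names colwise, key1, key2 and res are only for illustration. Each %in% test checks one column against one column, independently of the others, so a row of df1 passes as long as each of its values appears somewhere in the corresponding column of df2. interaction() instead builds a single key per row, so the comparison is row against row.

```r
df1 <- read.table(text = "x y z
1 1 1
1 2 1
1 1 2
2 1 1
2 2 2", header = TRUE)

df2 <- read.table(text = "a b c
1 1 1
1 2 8
1 1 2
2 6 2", header = TRUE)

# Three independent column-wise tests: every value of x occurs in a,
# every value of y occurs in b, every value of z occurs in c,
# so every row of df1 passes.
colwise <- (df1$x %in% df2$a) & (df1$y %in% df2$b) & (df1$z %in% df2$c)
print(colwise)

# One key per row, e.g. "1.1.1", "1.2.1", ...
key1 <- as.character(interaction(df1))
key2 <- as.character(interaction(df2))
print(key1)
print(key2)

# Keep only the rows of df1 whose whole key occurs among the keys of df2.
res <- df1[key1 %in% key2, ]
print(res)
```

With this data only the first and third rows of df1 survive, which is the result asked for in the question.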

Now to answer your question: First, I think there's a typo (corrected) in:

df1[ df1$z %in% df2$c | df2$b == 9,] # second part should be df2$b == 9

You would get a warning (not an error), because the first part evaluates to

[1] TRUE TRUE TRUE TRUE TRUE

and the second evaluates to:

[1] FALSE FALSE FALSE FALSE

You are applying | to vectors of unequal length (5 and 4), so R recycles the shorter one and issues the warning:

longer object length is not a multiple of shorter object length
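The two variants of that line can be checked directly. The sketch below uses the question's data and illustrative names (lhs, rhs_null, rhs, msg, recycled). As far as I know, in current versions of R the length mismatch in | is reported as a warning and the shorter vector is recycled, while the line exactly as typed in the question (with df1$b, a column that does not exist) yields a zero-length condition and therefore zero rows.

```r
df1 <- read.table(text = "x y z
1 1 1
1 2 1
1 1 2
2 1 1
2 2 2", header = TRUE)

df2 <- read.table(text = "a b c
1 1 1
1 2 8
1 1 2
2 6 2", header = TRUE)

lhs <- df1$z %in% df2$c            # length 5, all TRUE

# As typed in the question: df1 has no column b, so df1$b is NULL
# and the comparison has length zero.
rhs_null <- df1$b == 9
print(length(rhs_null))
print(length(lhs | rhs_null))      # a zero-length operand gives a zero-length result
print(nrow(df1[lhs | rhs_null, ])) # so no rows are selected

# With the typo corrected to df2$b the operand has length 4.
rhs <- df2$b == 9

# Capture the condition message instead of letting it print.
msg <- tryCatch({
  lhs | rhs
  NA_character_
}, warning = function(w) conditionMessage(w))
print(msg)

# Ignoring the warning, the shorter vector is recycled to length 5.
recycled <- suppressWarnings(lhs | rhs)
print(recycled)
```

So the original line returns no rows because the whole condition collapses to length zero, not because the logic of | is wrong.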

Edit: If you have more columns, you can build the interaction from just the ones you care about. For example, if you want to get from df1 the rows where the first two columns match those of df2, then you could simply do:

> df1[interaction(df1[, 1:2]) %in% interaction(df2[, 1:2]), ]
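With more columns in real data it may be safer to pick the key columns by name rather than by position. The following is a sketch under that assumption; key_cols1, key_cols2, k1, k2, res and m are illustrative names, not part of the original answer. It also cross-checks the result against merge() restricted to the same columns.

```r
df1 <- read.table(text = "x y z
1 1 1
1 2 1
1 1 2
2 1 1
2 2 2", header = TRUE)

df2 <- read.table(text = "a b c
1 1 1
1 2 8
1 1 2
2 6 2", header = TRUE)

# Columns to match on, paired by position: x with a, y with b.
key_cols1 <- c("x", "y")
key_cols2 <- c("a", "b")

# One key per row, built from the selected columns only.
k1 <- interaction(df1[, key_cols1])
k2 <- interaction(df2[, key_cols2])

res <- df1[k1 %in% k2, ]
print(res)

# Cross-check with merge(); unique() avoids duplicated output rows
# when the same key occurs more than once in df2.
m <- merge(df1, unique(df2[, key_cols2]),
           by.x = key_cols1, by.y = key_cols2)
print(m)
```

Both approaches keep the same three rows here; merge() may return them in a different order, while the [ version keeps the original row order and row names of df1.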


answered Sep 22 '22 15:09

Arun
