Finding common rows in R

While trying to get my data fit for analysis, I can't seem to do this correctly. Suppose I have datasets in this form:

df1

V1  V2df1
a   H
b   Y
c   Y

df2

V1  V2df2
a   Y
j   H
b   Y

and three more (five datasets of different lengths altogether). What I am trying to do is the following. First, I must find all elements of the first column (V1) that are common to all datasets; in this case those are a and b. Then, based on those common elements, I want to build a joined dataset in which the values of V1 are common to all five datasets and the values from the other columns are appended in the same row. To illustrate with an example, my result should look something like:

V1  V2df1  V2df2
a   H      Y
b   Y      Y

I managed to get some code working, but apparently the results are not correct. What I did: read all the lines from all files into variables (for example, a<-df1[,1] and so on) and found the common rows like this:

red<-Reduce(intersect, list(a,b,c,d,e))

then I filtered specific datasets like:

df1 <-  unique(filter(df1, V1 %in% red))

I ordered every dataset by V1:

df1<-data.frame(df1[with(df1, order(V1)),])

and deleted duplicates (of elements in the first column):

df1<- df1[unique(df1$V1),]

I then created a new dataset with:

newdata<-data.frame(V1common=df1[,1], V2df1=df1[,2],V2df2=df2[,2]...)

The ... stands for the remaining datasets, five in total. I did get the same number of rows in each (a good sign, since that matches the number of rows in the intersection), and then appended the other sorted columns, but something doesn't add up. Thanks for any advice. (I omitted the library calls and such; the code is for illustrative purposes.)

asked May 18 '15 20:05 by sdgaw erzswer



1 Answer

You can use join_all from the plyr package:

library(plyr)
# Inner join on V1: keeps only the V1 values present in all five data frames
df <- join_all(list(df1, df2, df3, df4, df5), by = 'V1', type = 'inner')
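
If you would rather not depend on plyr, the same kind of inner join can be sketched in base R. This is only an illustration under stated assumptions, not the method from the question: df3 and its values are made up, only three data frames are used instead of five, and keeping the first row per V1 value is an assumed rule for handling duplicates. It also shows de-duplicating with !duplicated(), because indexing with df1[unique(df1$V1),] uses the V1 values themselves as row indices rather than selecting the first row for each value.

```r
# Toy data modelled on the question; df3 and its values are hypothetical.
df1 <- data.frame(V1 = c("a", "b", "c"), V2df1 = c("H", "Y", "Y"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(V1 = c("a", "j", "b"), V2df2 = c("Y", "H", "Y"),
                  stringsAsFactors = FALSE)
df3 <- data.frame(V1 = c("b", "a", "a"), V2df3 = c("H", "Y", "Y"),
                  stringsAsFactors = FALSE)

dfs <- list(df1, df2, df3)

# Keep the first row for each V1 value (assumed de-duplication rule).
# duplicated() flags repeats, so !duplicated() selects first occurrences.
dfs <- lapply(dfs, function(d) d[!duplicated(d$V1), , drop = FALSE])

# Inner-join the data frames pairwise on V1. merge() keeps only keys found
# in both inputs and matches rows by key, so no manual sorting or
# column-binding is needed.
joined <- Reduce(function(x, y) merge(x, y, by = "V1"), dfs)

print(joined)
```

With this toy data only a and b survive the join, and each row carries the V2 values that belong to that key. Because merge() matches on V1 rather than on row position, the intersect, filter, and order steps from the question are not needed.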
answered Sep 26 '22 01:09 by BICube