Finding common rows in R

Q: How do you find common rows in a data frame?

In R, merge() performs an inner join by default: with all = FALSE (the default), only rows whose key appears in both data frames are kept, which is what an SQL inner join does. (The how = "inner" parameter belongs to pandas' DataFrame.merge(), not to R.)
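A minimal sketch using the two example data frames from the question, joined on the shared key column V1:

```r
# The two example data frames from the question (V1 is the shared key column)
df1 <- data.frame(V1 = c("a", "b", "c"), V2df1 = c("H", "Y", "Y"))
df2 <- data.frame(V1 = c("a", "j", "b"), V2df2 = c("Y", "H", "Y"))

# all = FALSE is merge()'s default: keep only V1 values present in both (inner join)
common <- merge(df1, df2, by = "V1", all = FALSE)
print(common)
```

The result has one row each for a and b; merge() also sorts the output on the key column by default (sort = TRUE).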

Q: How do I get specific rows in R?

To get a specific row of a matrix, specify the row number followed by a comma, in square brackets, after the matrix variable name. This expression returns the required row as a vector.
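For instance, with a small made-up matrix (the same bracket syntax also works on data frames, where it returns a one-row data frame instead of a vector):

```r
# A made-up 3 x 3 matrix, filled column by column: columns are 1:3, 4:6, 7:9
m <- matrix(1:9, nrow = 3)

# Row 2: the row index, then a comma, with the column index left empty
row2 <- m[2, ]
print(row2)

# On a data frame the same syntax returns a one-row data frame
df <- data.frame(V1 = c("a", "b", "c"), V2 = c("H", "Y", "Y"))
print(df[2, ])
```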

Q: How do I find uncommon rows between two data frames in R?

Sometimes we need the rows that two data frames do not share instead of the common ones. In R this can be done with subset() and the negation operator !, applied to a %in% test on the key column.
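A minimal sketch with the question's example data, keyed on V1:

```r
df1 <- data.frame(V1 = c("a", "b", "c"), V2df1 = c("H", "Y", "Y"))
df2 <- data.frame(V1 = c("a", "j", "b"), V2df2 = c("Y", "H", "Y"))

# Rows of df1 whose key does not occur in df2: negate the %in% test with !
only_in_df1 <- subset(df1, !(V1 %in% df2$V1))

# And the reverse direction: rows of df2 whose key does not occur in df1
only_in_df2 <- subset(df2, !(V1 %in% df1$V1))

print(only_in_df1)
print(only_in_df2)
```

Here only_in_df1 holds the row for c and only_in_df2 the row for j.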

Tags:

dataframe

r

unique

While trying to get my data ready for analysis, I can't seem to do this correctly. Suppose I have datasets in this form:

df1

V1  V2df1
a   H
b   Y
c   Y

df2

V1  V2df2
a   Y
j   H
b   Y

and three more (5 datasets of different lengths altogether). What I am trying to do is the following. First I must find all common elements of the first column (V1); in this case those are a and b. Then, using those common elements, I'm trying to build a joined dataset where the values of V1 are common to all five datasets and the values from the other columns are appended in the same row. For example, my result should look something like:

V1  V2df1  V2df2
a   H      Y
b   Y      Y

I managed to get some code working, but apparently the results are not correct. What I did: I stored the first column of each dataset in a variable (for example, a<-df1[,1] and so on) and found the common values like this:

red<-Reduce(intersect, list(a,b,c,d,e))
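To illustrate what that Reduce() call computes: it folds intersect() pairwise over the list, so the result is the set of keys present in every vector. The five key vectors below are made-up placeholders for df1$V1 through df5$V1:

```r
# Made-up key columns standing in for df1$V1 ... df5$V1
keys <- list(
  c("a", "b", "c"),
  c("a", "j", "b"),
  c("b", "a", "x"),
  c("a", "b", "b"),
  c("z", "b", "a")
)

# Equivalent to intersect(intersect(intersect(intersect(k1, k2), k3), k4), k5);
# intersect() also drops duplicates, keeping the order of the first vector
red <- Reduce(intersect, keys)
print(red)
```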

Then I filtered each dataset like this:

df1 <-  unique(filter(df1, V1 %in% red))
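One point worth checking at this step: unique() on a data frame removes only rows that are identical in every column, so a key that appears twice with different values in the other column survives it. A base R sketch of the same filtering step, on hypothetical data where b appears twice:

```r
# Hypothetical data: "a" appears twice with identical rows, "b" twice with different values
df1 <- data.frame(V1 = c("a", "b", "b", "c", "a"),
                  V2df1 = c("H", "Y", "N", "Y", "H"))
red <- c("a", "b")

# Base R equivalent of filter(df1, V1 %in% red)
kept <- df1[df1$V1 %in% red, ]

# unique() drops the second (a, H) row, but both "b" rows remain
kept <- unique(kept)
print(kept)
```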

I sorted every dataset by V1:

df1<-data.frame(df1[with(df1, order(V1)),])

and deleted duplicates (of elements in the first column):

df1<- df1[unique(df1$V1),]
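A likely culprit is this line: unique(df1$V1) returns key values, not row positions. When V1 is character, those values are matched against the row names, so no rows match and NA rows come back. When V1 is a factor (the default for strings before R 4.0.0), its integer level codes are used as row positions, which selects arbitrary rows. A sketch of per-key deduplication with duplicated() instead, on hypothetical data:

```r
# Hypothetical data with the key "b" appearing twice
df1 <- data.frame(V1 = c("a", "b", "b"), V2df1 = c("H", "Y", "N"),
                  stringsAsFactors = FALSE)

# unique(df1$V1) is c("a", "b"); as a row index these strings are matched
# against the row names ("1", "2", "3"), so nothing matches and NA rows result
wrong <- df1[unique(df1$V1), ]

# duplicated() is TRUE for every repeat of a key after its first occurrence,
# so negating it keeps exactly one row per V1 value
right <- df1[!duplicated(df1$V1), ]
print(right)
```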

I then created a new dataset with:

newdata<-data.frame(V1common=df1[,1], V2df1=df1[,2],V2df2=df2[,2]...)
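Building the result by position assumes every column lists the same keys in the same order. A sketch that aligns each column by V1 with match() instead, shown for the two example data frames (the remaining three would follow the same pattern):

```r
df1 <- data.frame(V1 = c("a", "b", "c"), V2df1 = c("H", "Y", "Y"))
df2 <- data.frame(V1 = c("a", "j", "b"), V2df2 = c("Y", "H", "Y"))
red <- Reduce(intersect, list(df1$V1, df2$V1))

# match() gives, for each common key, the row of its first occurrence in each
# data frame, so every column is aligned by V1 rather than by row position
newdata <- data.frame(
  V1common = red,
  V2df1 = df1$V2df1[match(red, df1$V1)],
  V2df2 = df2$V2df2[match(red, df2$V1)]
)
print(newdata)
```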

The ... stands for the same thing repeated for all five datasets. I got the expected number of rows (a good sign, since it matches the number of elements in the intersection), and then appended the other sorted columns, but something doesn't add up. Thanks for any advice. (I omitted loading libraries and such; the code is for illustrative purposes.)

617

asked May 18 '15 20:05

sdgaw erzswer

1 Answer

You can use join_all from plyr package

library(plyr)
# Inner join on V1: keep only the keys present in all five data frames
df <- join_all(list(df1, df2, df3, df4, df5), by = 'V1', type = 'inner')
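Without plyr, a base R sketch of the same idea folds merge() over the list with Reduce(). It is shown with three data frames; df3 is made up for illustration, and df4 and df5 would simply be added to the list:

```r
df1 <- data.frame(V1 = c("a", "b", "c"), V2df1 = c("H", "Y", "Y"))
df2 <- data.frame(V1 = c("a", "j", "b"), V2df2 = c("Y", "H", "Y"))
df3 <- data.frame(V1 = c("b", "a", "x"), V2df3 = c("H", "H", "Y"))  # hypothetical

# merge(merge(df1, df2), df3), each step an inner join on V1 (all = FALSE by default)
joined <- Reduce(function(x, y) merge(x, y, by = "V1"), list(df1, df2, df3))
print(joined)
```

Note that if a key occurs more than once in one data frame, merge() returns one row per matching pair, so duplicated keys are best removed before joining.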

183

answered Sep 26 '22 01:09

BICube

Related questions
                            
                                loading ggplot2 (colorspace, actually) opens up x11
                            
                                Visualizing graph/network with 3 layeres (tripartite) in R/igraph
                            
                                Correlation between numeric and logical variable gives (intended) error?
                            
                                R, using Knitr to view a table in HTML
                            
                                Reorganize list into dataframe using dplyr
                            
                                In R print decimal comma instead of decimal point
                            
                                "could not find function" only when in the R debugger
                            
                                R: workaround for variable-width lookbehind
                            
                                line by line debugging in R studio
                            
                                Programmatic subsetting of a data.table in R
                            
                                chordDiagram function, R package circlize
                            
                                R fill in NA with previous row value with condition
                            
                                how to kill parallel program of R in Linux
                            
                                R equivalent of the Matlab spy function
                            
                                Dplyr summarise_each to aggregate results
                            
                                Extracting RColorBrewer palette for other use
                            
                                how do you convert output from readLines to data frame in R
                            
                                R - Compare two data frames of different length for same values in two columns
                            
                                Multi-character plot shapes in ggplot
                            
                                convert list of sparse matrix indices to matrix in R
