While preparing my data for analysis, I can't seem to get this step right. Suppose I have datasets of this form:
df1
V1 V2df1
a H
b Y
c Y
df2
V1 V2df2
a Y
j H
b Y
and three more (five datasets of different lengths altogether). What I am trying to do is the following. First, I need to find all elements of the first column (V1) that are common to every dataset; in this case those are a and b. Then, based on those common elements, I want to build a joined dataset in which the V1 values are the ones common to all five datasets and the values from the other columns are appended in the same row. To explain with an example, my result should look something like:
V1 V2df1 V2df2
a H Y
b Y Y
I managed to get some code working, but apparently the results are not correct. What I did:
I read the first column of every dataset into a variable (for example a<-df1[,1] and so on) and found the common values like:
red<-Reduce(intersect, list(a,b,c,d,e))
then I filtered specific datasets like:
df1 <- unique(filter(df1, V1 %in% red))
I ordered every dataset according to row:
df1<-data.frame(df1[with(df1, order(V1)),])
and deleted duplicates (of elements in the first column):
df1<- df1[unique(df1$V1),]
I then created a new dataset with:
newdata<-data.frame(V1common=df1[,1], V2df1=df1[,2],V2df2=df2[,2]...)
The ... stands for the remaining datasets, five in total. I actually got the same number of rows (a good sign, since that matches the size of the intersection) and then appended the other sorted columns, but something doesn't add up. Thanks for any advice. (I omitted the library calls and such; the code is for illustrative purposes.)
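One step above that may explain the mismatch is df1[unique(df1$V1),]. That expression indexes rows by the values of V1 (matched against row names for character data, or used as integer codes if V1 is a factor), not by the position of the first occurrence of each key. Below is a minimal sketch of de-duplicating on V1 with base R's duplicated(). The data are made up for illustration and are not from the original post.

```r
# Hypothetical data with a duplicated key in V1 (illustration only)
df1 <- data.frame(V1 = c("b", "a", "b", "c"),
                  V2df1 = c("Y", "H", "H", "Y"),
                  stringsAsFactors = FALSE)

# Character indexing is matched against row names ("1", "2", ...),
# so nothing matches and every returned row is NA
wrong <- df1[unique(df1$V1), ]

# duplicated() flags repeated keys; negating it keeps the first row per key
dedup <- df1[!duplicated(df1$V1), ]

# Sort by the key afterwards if a fixed row order is needed
dedup <- dedup[order(dedup$V1), ]
```

Which row to keep for a duplicated key (first, last, or some aggregate) depends on the data; keeping the first is only an assumption here.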
To find the common rows between two DataFrames with merge(), use the parameter “how” as “inner” since it works like SQL Inner Join and this is what we want to achieve.
To get a specific row of a matrix, specify the row number followed by a comma, in square brackets, after the matrix variable name. This expression returns the required row as a vector.
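As a small illustration of the row-indexing rule just described, here is a sketch in base R. The matrix m and its values are invented for the example.

```r
# Hypothetical 2 x 3 matrix; matrix() fills column by column
m <- matrix(1:6, nrow = 2)

# Row number, then a comma, inside square brackets: returns a plain vector
second_row <- m[2, ]

# drop = FALSE keeps the result as a 1 x 3 matrix instead of a vector
second_row_matrix <- m[2, , drop = FALSE]
```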
Instead of finding the common rows, sometimes we need the rows that are not shared between two data frames. This is mostly useful when we expect many rows to differ rather than just a few. We can do this with the negation operator, written as an exclamation mark (!), inside the subset function.
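A minimal sketch of that idea, assuming "uncommon" means rows whose V1 key does not appear in the other data frame. The two data frames reuse the small example from the question.

```r
# Example data from the question
df1 <- data.frame(V1 = c("a", "b", "c"), V2df1 = c("H", "Y", "Y"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(V1 = c("a", "j", "b"), V2df2 = c("Y", "H", "Y"),
                  stringsAsFactors = FALSE)

# Negate the %in% test to keep rows whose key is missing from the other frame
only_in_df1 <- subset(df1, !(V1 %in% df2$V1))
only_in_df2 <- subset(df2, !(V1 %in% df1$V1))
```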
You can use join_all from the plyr package:
library(plyr)
# An inner join on V1 keeps only the keys present in all five data frames
df <- join_all(list(df1, df2, df3, df4, df5), by = 'V1', type = 'inner')
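If plyr is not available, roughly the same result can be sketched in base R by folding merge() over the list with Reduce(). This is an alternative to the join_all call, not the same function, and it assumes V1 is unique within each data frame (duplicated keys would multiply rows). Only df1 and df2 from the question are used here; the other three datasets were not shown.

```r
# Example data from the question
df1 <- data.frame(V1 = c("a", "b", "c"), V2df1 = c("H", "Y", "Y"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(V1 = c("a", "j", "b"), V2df2 = c("Y", "H", "Y"),
                  stringsAsFactors = FALSE)

# merge() performs an inner join by default; Reduce() applies it pairwise
# across the list, so only keys present in every data frame survive
dfs <- list(df1, df2)
merged <- Reduce(function(x, y) merge(x, y, by = "V1"), dfs)

print(merged)
```

Because merge() matches rows by key, there is no need to sort each data frame first and bind the columns by position, which is where the original approach can silently misalign values.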