Comparing two lists [R]

Tags: r, match

I have two rather long lists (232000 rows each). When I try to run analyses that use both, R gives me an error saying that some elements in one list are not in the other (for this particular code to run, both lists need to contain exactly the same elements). I have tried the following to work out what differs:

#In Both
both <- varss %in% varsg
length(both)

#What is in Both
int <- intersect(varss,varsg)
length(int)

#What is different in varss
difs <- setdiff(varss,varsg)
length(difs)

#What is different in varsg
difg <- setdiff(varsg,varss)
length(difg)

I think I have the code right, but the results are not what I need. For instance, both <- varss %in% varsg returns only a single FALSE. Do both lists need to be of a specific class for this to work? I've tried data.frame, list and character. I'm not sure whether something more, such as a function, needs to be applied.

Just to give a little more information about my lists: both contain SNP names (genetic data).

Edit:

I loaded these two files with readRDS() and am not sure whether this might be causing problems. When I run varss[1:10,] I get the following:

 [1] rs41531144 rs41323649 exm2263307 rs41528348 exm2216184 rs3901846 
 [7] exm2216185 exm2216186 exm2216191 exm2216198
232334 Levels: exm1000006 exm1000025 exm1000032 exm1000038 ... rs9990343

I have little experience with RData files, so I'm not sure whether this is a problem or not...

The same happens with varsg[1:10,]:

 [1] exm2268640 exm41      exm1916089 exm44      exm46      exm47     
 [7] exm51      exm53      exm55      exm56     
232334 Levels: exm1000006 exm1000025 exm1000032 exm1000038 ... rs999943 
user2726449 asked Apr 28 '14 23:04

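1 Answer

None of the functions you have shown play well with lists or data.frames: they treat each list element (or data frame column) as a single item instead of comparing the values inside it. For example: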

varss <- list(a = 1:8)
varsg <- list(a = 2:9)

both <- varss %in% varsg
both
# [1] FALSE

#What is in Both
int <- intersect(varss,varsg)
int
# list()

#What is different in varss
difs <- setdiff(varss,varsg)
difs
# [[1]]
# [1] 1 2 3 4 5 6 7 8

#What is different in varsg
difg <- setdiff(varsg,varss)
difg
# [[1]]
# [1] 2 3 4 5 6 7 8 9
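
One way to see why only a single FALSE comes back is to check what R counts as an element. The sketch below assumes the objects are one-column data frames of factors, which the `Levels:` line printed in the question suggests but does not confirm. The column name `snp` and the values are made up for illustration.

```r
# Hypothetical stand-ins for the objects loaded with readRDS()
varss <- data.frame(snp = c("rs1", "rs2", "exm3"), stringsAsFactors = TRUE)
varsg <- data.frame(snp = c("exm3", "rs1", "rs9"), stringsAsFactors = TRUE)

class(varss)    # "data.frame"
length(varss)   # 1, because a data frame is a list of columns
nrow(varss)     # 3, the number of SNP names

# %in% returns one value per element of its left-hand side,
# so a one-column data frame yields a single logical value
res <- varss %in% varsg
length(res)     # 1
```

Running `class()` and `str()` on the real objects should show whether they have this shape.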

I suggest you switch to vectors by doing:

varss <- unlist(varss)
varsg <- unlist(varsg)
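
If the columns are factors, as the output in the question suggests, `unlist()` returns a factor, and wrapping it in `as.character()` keeps the comparison on the SNP names themselves. This is a minimal sketch on made-up data, with a hypothetical column name `snp`, not a definitive recipe for the real files.

```r
# Hypothetical stand-ins for the objects loaded with readRDS()
varss_df <- data.frame(snp = c("rs1", "rs2", "exm3"), stringsAsFactors = TRUE)
varsg_df <- data.frame(snp = c("exm3", "rs1", "rs9"), stringsAsFactors = TRUE)

# Flatten to plain character vectors
varss <- as.character(unlist(varss_df, use.names = FALSE))
varsg <- as.character(unlist(varsg_df, use.names = FALSE))

# One logical per name in varss
both <- varss %in% varsg
both                      # TRUE FALSE TRUE
sum(both)                 # 2 names of varss also occur in varsg

intersect(varss, varsg)   # "rs1" "exm3"
setdiff(varss, varsg)     # "rs2"
setdiff(varsg, varss)     # "rs9"

# TRUE only if both vectors contain exactly the same set of names
setequal(varss, varsg)    # FALSE
```

Note that `length(both)` always equals `length(varss)`. To count the matches, use `sum(both)`.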
flodel answered Sep 22 '22 00:09