 

R: How many elements of list 1 are in list 2 / number of occurrences

Tags: list, r, compare

I would like to compare two lists (two rows of a data frame) and count how many differences I have between the two lists.

For instance:

list1=a,b,c,a
list2=a,a,d,d

Two elements of list1 are in list2.

I can do this with a loop and sum, but it is very inefficient. Is there a function that does this in R?

I have checked setdiff and the compare package but did not find anything that helps.

Thanks for your ideas,

Vincent

My function looks like:

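        NRebalancing = function(NamePresent)
        {
          # One count per row; the first row has no predecessor, so it stays 0
          Nbexchange = numeric(nrow(NamePresent))

          for (i in seq_len(nrow(NamePresent))[-1])
          {
            print(i)
            compteur = 0
            # Distinct, non-missing values of row i (unlist turns a data-frame row into a vector)
            NameNeeded = unlist(NamePresent[i, ])
            NameNeeded = unique(NameNeeded)
            NameNeeded = na.omit(NameNeeded)
            # j = 1 corresponds to a date, so start at the second value
            for (j in seq_along(NameNeeded)[-1])
            {
              # Absolute change in how often this name occurs between row i-1 and row i
              compteur = compteur + abs(sum(NamePresent[i, ] == as.character(NameNeeded[j]), na.rm = TRUE) -
                                        sum(NamePresent[i - 1, ] == as.character(NameNeeded[j]), na.rm = TRUE))
            }
            Nbexchange[i] = compteur
          }

          return(Nbexchange)
        }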
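For comparison, here is a minimal sketch of the same row-by-row count with the inner loop over names replaced by table(). It assumes NamePresent can be turned into a character matrix whose first column holds dates and whose other columns hold names; the function name NRebalancingVec and the toy data frame NP are hypothetical. Like the loop above, it only counts names that appear in row i.

```r
# Hypothetical helper; assumes column 1 holds dates and the other columns hold names
NRebalancingVec <- function(NamePresent) {
  m <- as.matrix(NamePresent)      # character matrix (factors become character)
  out <- numeric(nrow(m))          # row 1 has no predecessor, so it stays 0
  for (i in seq_len(nrow(m))[-1]) {
    cur  <- m[i, -1]               # names in row i, date column dropped
    prev <- m[i - 1, -1]           # names in row i - 1
    keep <- unique(cur[!is.na(cur)])
    # Fixed levels: values not in `keep` (and NAs) are left out of both counts
    n_cur  <- table(factor(cur,  levels = keep))
    n_prev <- table(factor(prev, levels = keep))
    out[i] <- sum(abs(n_cur - n_prev))
  }
  out
}

# Hypothetical toy data
NP <- data.frame(date = c("d1", "d2", "d3"),
                 n1   = c("a", "a", "b"),
                 n2   = c("b", "c", "c"))
NRebalancingVec(NP)  # 0 1 1
```

Fixing the factor levels to the names of row i is what makes the two tables the same length, so they can be subtracted element by element.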
asked Dec 26 '22 21:12 by VincentH



1 Answer

One main point: your "list" isn't an R list (that is something a bit special). You are using vectors:

    R> is.vector(l1)
    [1] TRUE
    R> is.list(l1)
    [1] FALSE

Don't call variables list1 if they are vectors; the name suggests an R list.
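A minimal sketch of the difference, using the question's example values (the names v and l are only for illustration): c() builds an atomic character vector, while list() builds a real R list.

```r
# c() builds an atomic vector: all elements share one type
v <- c("a", "b", "c", "a")
# list() builds a generic vector (an R list): each element is stored separately
l <- list("a", "b", "c", "a")

is.vector(v)  # TRUE
is.list(v)    # FALSE
is.list(l)    # TRUE
is.vector(l)  # also TRUE: a list is a generic vector, so is.list() is the test that tells them apart

# Mixing types shows the difference: c() coerces, list() keeps each type
c("a", 1)           # "a" "1"
str(list("a", 1))   # one chr element and one num element
```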


Since you have vectors, there are lots of possibilities.

  1. The %in% operator

    R> l1 = c("a", "b", "c", "d")
    R> l2 = c("a", "a", "d", "d")
    R> l1[l1 %in% l2]
     [1] "a" "d"
    
  2. Or use is.element

    R> l1[is.element(l1, l2)]
     [1] "a" "d"
    
  3. There is also unique:

    R> unique(l2)
     [1] "a" "d"
    

    Following your comment to @mrdwab, you can count the number of occurrences using a combination of sapply and unique:

    sapply(unique(l1), function(i) sum(i==l2))
    

    i == l2 compares i with every element of l2 and returns a logical vector, sum counts the TRUE values, and sapply is essentially a for loop over unique(l1).

    R> sapply(unique(l1), function(i) sum(i==l2))
    a b c d 
    2 0 0 2
    
  4. A very nice suggestion from @mrdwab is to use table and colSums:

    R> table(l1, l2)
       l2
    l1  a d
      a 1 0
      b 1 0
      c 0 1
      d 0 1
    R> colSums(table(l1, l2))
     a d 
     2 2 
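For the single count asked for in the question, a minimal sketch is to sum the logical vector returned by %in%. It uses the question's own data (list1 = a,b,c,a and list2 = a,a,d,d), written here as the character vectors v1 and v2, which are illustrative names. The per-value counts can also be taken from table() with the factor levels fixed to unique(v1), a vectorized alternative to the sapply call.

```r
v1 <- c("a", "b", "c", "a")
v2 <- c("a", "a", "d", "d")

# TRUE for each element of v1 that appears anywhere in v2; sum() counts the TRUEs
n_in <- sum(v1 %in% v2)
n_in  # 2

# How often each distinct value of v1 occurs in v2.
# Fixing the levels keeps values of v1 that never occur in v2 (count 0)
# and turns values of v2 that are not in v1 into NA, which table() drops.
occ <- table(factor(v2, levels = unique(v1)))
occ   # a = 2, b = 0, c = 0
```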
    
answered Jan 14 '23 04:01 by csgillespie
