Convert List of Vectors into Data Frame of Counts [duplicate]

I have character vectors stored in a list like this:

basket1 <- c("Apple", "Orange", "Banana", "Apple", "Apple", "Grape")
basket2 <- c("Grape", "Grape", "Grape", "Grape")
basket3 <- c("Kiwi", "Apple", "Cantaloupe", "Banana")
basket4 <- c("Strawberry")
basket5 <- c("Grape", "Grape", "Grape")
FruitBasketList <- list(basket1, basket2, basket3, basket4, basket5)

I would like to turn FruitBasketList into a data frame with one row per basket, where each row holds the count of each fruit in that basket. The main problem is that there could be thousands of different "fruits" in each vector, and many of them appear more than once.

This is the data frame I would like as a result:

Basket  Apple   Orange  Banana  Grape   Kiwi    Cantaloupe  Strawberry
basket1 3       1       1       1       0       0           0
basket2 0       0       0       4       0       0           0
basket3 1       0       1       0       1       1           0
basket4 0       0       0       0       0       0           1
basket5 0       0       0       3       0       0           0

Obviously, this isn't my real data; I simplified it so that anyone could understand what the data looks like. No, this isn't homework. In the real data, a basket can contain a thousand different fruits, the fruit vectors have different lengths, and there can be tens of thousands of baskets (vectors). Some fruits can also be repeated many times in the same vector (basket).

I have been working on this, but I'm sure my solution is terribly over-complicated and very inefficient. So far it combines all the vectors and then identifies all the unique fruit names that are possible. That part works fine. The part I'm struggling with is creating an empty data frame with one column per unique fruit name, then, for each vector, counting each unique fruit and placing that count in the correct column of a new row, with zeros for fruits that don't occur in that particular basket.
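The plan described above (collect every unique fruit name, then count each basket against that full set so absent fruits become 0) can be sketched in base R roughly as follows. This is a minimal sketch that assumes the five example baskets; the Basket labels are generated from list position because the original list is unnamed, and the variable names are illustrative.

```r
# The example baskets from the question
FruitBasketList <- list(
  c("Apple", "Orange", "Banana", "Apple", "Apple", "Grape"),
  c("Grape", "Grape", "Grape", "Grape"),
  c("Kiwi", "Apple", "Cantaloupe", "Banana"),
  c("Strawberry"),
  c("Grape", "Grape", "Grape")
)

# Step 1: every fruit name that occurs in any basket
all_fruits <- sort(unique(unlist(FruitBasketList)))

# Step 2: tabulate each basket against the full set of levels, so fruits
# missing from a basket are counted as 0 instead of being dropped
counts <- t(vapply(
  FruitBasketList,
  function(basket) as.vector(table(factor(basket, levels = all_fruits))),
  integer(length(all_fruits))
))
colnames(counts) <- all_fruits

# Step 3: one row per basket, with a leading Basket column
result <- data.frame(
  Basket = paste0("basket", seq_along(FruitBasketList)),
  counts,
  check.names = FALSE
)
print(result)
```

Because every basket is tabulated against the same `levels`, no empty data frame has to be pre-built and filled in; each row already has a slot (possibly 0) for every fruit.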

The code I'm using to tally up individual vectors looks like this:

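GetUniqueItemCount <- function(runs, value)
{
  # `runs` is the result of rle(). Summing the matching run lengths gives
  # the total count even if `value` appears in several separate runs
  # (unsorted input), and sum(integer(0)) returns 0 when it is absent.
  sum(runs$lengths[runs$values == value])
}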

And the code to call it looks like this:

runs <- rle(sort(basket1))  # sorting puts identical fruits into a single run
Apple <- GetUniqueItemCount(runs, "Apple")

As you can see, my current code requires knowing all the possible fruits beforehand, hard-coding the count of each fruit, and then assigning that count to a specific column known in advance in the data frame. I realize I am going down the wrong path here, so I would appreciate any advice on getting back on track toward the data frame shown above. Please feel free to suggest a completely different approach instead of trying to make mine work, if that would be the best way to solve the problem.
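One different approach, sketched here in base R: stack() turns a named list into a long two-column data frame (`values` and `ind`), and table() then cross-tabulates basket against fruit, filling absent combinations with 0, so no fruit has to be known in advance. This assumes the list is given names; the labels basket1 to basket5 below are hypothetical, since the original list is unnamed.

```r
# Example baskets, named so stack() can record where each fruit came from
FruitBasketList <- list(
  basket1 = c("Apple", "Orange", "Banana", "Apple", "Apple", "Grape"),
  basket2 = c("Grape", "Grape", "Grape", "Grape"),
  basket3 = c("Kiwi", "Apple", "Cantaloupe", "Banana"),
  basket4 = c("Strawberry"),
  basket5 = c("Grape", "Grape", "Grape")
)

# Long format: one row per fruit occurrence, with the basket name in `ind`
long <- stack(FruitBasketList)

# Cross-tabulate basket (rows) by fruit (columns); missing pairs count as 0
tab <- table(long$ind, long$values)

# Convert the two-way table into a wide data frame with basket row names
result <- as.data.frame.matrix(tab)
print(result)
```

The basket names end up as row names; `cbind(Basket = rownames(result), result)` would turn them into a column like the one in the desired output.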

Beaker asked Feb 02 '15 at 06:02




2 Answers

I would suggest mtabulate from the "qdapTools" package.

library(qdapTools)
mtabulate(FruitBasketList)
#   Apple Banana Cantaloupe Grape Kiwi Orange Strawberry
# 1     3      1          0     1    0      1          0
# 2     0      0          0     4    0      0          0
# 3     1      1          1     0    1      0          0
# 4     0      0          0     0    0      0          1
# 5     0      0          0     3    0      0          0

The package's author even shares your avatar. Nifty.

A5C1D2H2I1M1N2O1R2T1 answered Jan 05 '23 at 00:01


Using dplyr, I might do something like:

library(dplyr)
# bind_rows() fills columns that a basket lacks with NA;
# it replaces rbind_all(), which dplyr has deprecated
m <- FruitBasketList %>% lapply(table) %>% lapply(as.list) %>%
    lapply(data.frame) %>% bind_rows()
m

#   Apple Banana Grape Orange Cantaloupe Kiwi Strawberry
# 1     3      1     1      1         NA   NA         NA
# 2    NA     NA     4     NA         NA   NA         NA
# 3     1      1    NA     NA          1    1         NA
# 4    NA     NA    NA     NA         NA   NA          1
# 5    NA     NA     3     NA         NA   NA         NA

This leaves missing values as NA. If you want to set them to 0, you can do:

m[is.na(m)] <- 0
m

#   Apple Banana Grape Orange Cantaloupe Kiwi Strawberry
# 1     3      1     1      1          0    0          0
# 2     0      0     4      0          0    0          0
# 3     1      1     0      0          1    1          0
# 4     0      0     0      0          0    0          1
# 5     0      0     3      0          0    0          0
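With tens of thousands of baskets and thousands of fruits, as described in the question, most cells of the count table will be 0. A sparse matrix from the Matrix package (a recommended package that ships with R) may be a more memory-friendly alternative to a dense data frame. This is a minimal sketch assuming the example baskets, with the row labels basket1 to basket5 generated from list position:

```r
library(Matrix)

# The example baskets from the question
FruitBasketList <- list(
  c("Apple", "Orange", "Banana", "Apple", "Apple", "Grape"),
  c("Grape", "Grape", "Grape", "Grape"),
  c("Kiwi", "Apple", "Cantaloupe", "Banana"),
  c("Strawberry"),
  c("Grape", "Grape", "Grape")
)

# Dictionary of every fruit that occurs anywhere
all_fruits <- sort(unique(unlist(FruitBasketList)))

# One (row, column) pair per fruit occurrence:
# row = basket position, column = fruit's position in all_fruits
rows <- rep(seq_along(FruitBasketList), lengths(FruitBasketList))
cols <- match(unlist(FruitBasketList), all_fruits)

# sparseMatrix() adds up x for repeated (row, column) pairs,
# so each stored entry is the count of that fruit in that basket
counts <- sparseMatrix(
  i = rows, j = cols, x = 1,
  dims = c(length(FruitBasketList), length(all_fruits)),
  dimnames = list(paste0("basket", seq_along(FruitBasketList)), all_fruits)
)
counts
```

Only the nonzero counts are stored. If the result is small enough, `as.data.frame(as.matrix(counts))` converts it to an ordinary data frame like the ones above.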

MrFlick answered Jan 04 '23 at 23:01