R: Count objects in column-list

Tags: list, dataframe, r

Let me define a data frame with one column, id, formed by a vector of integers:

df <- data.frame(id = c(1,2,2,3,3))

and a column, objects, which is instead a list of character vectors. Let's create the column with the following function:

randomObjects <- function(argument) {
  numberObjects <- sample(c(1,2,3,4), 1)
  vector <- character()
  for (i in 1:numberObjects) {
    vector <- c(vector, sample(c("apple","pear","banana"), 1))
  }
  return(vector)
}

which is then called with lapply:

set.seed(28100)
df$objects <- lapply(df$id, randomObjects)

The resulting data frame is:

df
#   id                 objects
# 1  1            apple, apple
# 2  2     apple, banana, pear
# 3  2                  banana
# 4  3    banana, pear, banana
# 5  3 pear, pear, apple, pear

Now I want to count how many objects of each kind correspond to each id, producing a data frame like this:

summary <- data.frame(id = c(1, 2, 3),
                      apples = c(2, 1, 1), 
                      bananas = c(0, 2, 2),
                      pears = c(0, 1, 4))

summary
#   id apples bananas pears
# 1  1      2       0     0
# 2  2      1       2     1
# 3  3      1       2     4

How can I collapse the information in df into a more compact data frame such as summary without using a for loop?

asked Apr 17 '15 14:04

CptNemo

1 Answer

Here is a "data.table" approach:

library(data.table)
dcast.data.table(as.data.table(df)[
  , unlist(objects), by = id][
    , .N, by = .(id, V1)], 
  id ~ V1, value.var = "N", fill = 0L)
#    id apple banana pear
# 1:  1     2      0    0
# 2:  2     1      2    1
# 3:  3     1      2    4

This unlists the values by id, counts them using .N, and reshapes the result to wide format with dcast.data.table.
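The chained call can be easier to follow when each step is stored separately. The following is only a sketch: it builds the list column by hand instead of using randomObjects, so it does not depend on the random number generator. The intermediate names (dt, long, counts, wide) and the column name object are illustrative and not part of the original answer. It also calls dcast rather than dcast.data.table, on the assumption that a reasonably recent data.table is installed, where dcast dispatches to the data.table method.

```r
library(data.table)

# Rebuild the example data by hand (same contents as the df shown in the question)
df <- data.frame(id = c(1, 2, 2, 3, 3))
df$objects <- list(
  c("apple", "apple"),
  c("apple", "banana", "pear"),
  "banana",
  c("banana", "pear", "banana"),
  c("pear", "pear", "apple", "pear")
)

dt <- as.data.table(df)

# Step 1: flatten the list column so there is one row per (id, object) occurrence
long <- dt[, .(object = unlist(objects)), by = id]

# Step 2: count the occurrences of each object within each id
counts <- long[, .N, by = .(id, object)]

# Step 3: reshape to wide format, filling missing combinations with 0
wide <- dcast(counts, id ~ object, value.var = "N", fill = 0L)
wide
```

Naming the unlisted column object in step 1 avoids relying on the automatic V1 name used in the one-liner.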

Initially, I had thought of mtabulate from "qdapTools", but that doesn't do the aggregation step. Still, you can try something like:

library(data.table)
library(qdapTools)
data.table(cbind(df[1], mtabulate(df[[-1]])))[, lapply(.SD, sum), by = id]
#    id apple banana pear
# 1:  1     2      0    0
# 2:  2     1      2    1
# 3:  3     1      2    4
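If neither package is available, the same counting can be sketched in base R. This is an alternative and not part of the original answer: it repeats each id once per object in its row and cross-tabulates the ids against the flattened objects. The names ids, counts and summary_df are illustrative, and lengths() is assumed to be available (R 3.2.0 or later).

```r
# Rebuild the example data by hand (same contents as the df shown in the question)
df <- data.frame(id = c(1, 2, 2, 3, 3))
df$objects <- list(
  c("apple", "apple"),
  c("apple", "banana", "pear"),
  "banana",
  c("banana", "pear", "banana"),
  c("pear", "pear", "apple", "pear")
)

# Repeat each id once for every object stored in its row
ids <- rep(df$id, lengths(df$objects))

# Cross-tabulate ids against the flattened objects
counts <- table(id = ids, object = unlist(df$objects))

# Convert the contingency table to a data frame with an explicit id column
summary_df <- as.data.frame.matrix(counts)
summary_df <- cbind(id = as.numeric(rownames(summary_df)), summary_df)
rownames(summary_df) <- NULL
summary_df
```

Because table() builds the whole id-by-object grid, combinations that never occur come out as 0 without a separate fill step.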

answered Sep 22 '22 11:09

A5C1D2H2I1M1N2O1R2T1
