Let me define a data frame with one column, id, formed by a vector of integer values:
df <- data.frame(id = c(1,2,2,3,3))
and a column, objects, which instead is a list of character vectors. Let's create that column with the following function:
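randomObjects <- function(argument) {
  # 'argument' is ignored; it only lets lapply() pass in each id
  numberObjects <- sample(c(1,2,3,4), 1)   # how many objects this row gets
  vector <- character()
  for (i in 1:numberObjects) {
    vector <- c(vector, sample(c("apple","pear","banana"), 1))  # append one random fruit
  }
  return(vector)
}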
which is then applied to each id with lapply:
set.seed(28100)
df$objects <- lapply(df$id, randomObjects)
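Since the question is about avoiding loops, note that the generator itself can also drop its for loop. The sketch below uses a hypothetical randomObjects2() that draws the number of objects and then samples all fruits in a single sample(..., replace = TRUE) call. It is an illustration, not part of the original question, and it is not guaranteed to reproduce the exact draws shown below for set.seed(28100), because the random number stream may be consumed differently.

```r
# Hypothetical loop-free variant of randomObjects(): draw how many objects,
# then draw that many fruits at once, with replacement.
# As in the original, 'argument' is ignored.
randomObjects2 <- function(argument) {
  numberObjects <- sample(1:4, 1)
  sample(c("apple", "pear", "banana"), numberObjects, replace = TRUE)
}

df <- data.frame(id = c(1, 2, 2, 3, 3))
set.seed(28100)
df$objects <- lapply(df$id, randomObjects2)  # list column, one vector per row
str(df$objects)
```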
The resulting data frame is
df
# id objects
# 1 1 apple, apple
# 2 2 apple, banana, pear
# 3 2 banana
# 4 3 banana, pear, banana
# 5 3 pear, pear, apple, pear
Now I want to count how many of each kind of object corresponds to each id, in a data frame like this:
summary <- data.frame(id = c(1, 2, 3),
apples = c(2, 1, 1),
bananas = c(0, 2, 2),
pears = c(0, 1, 4))
summary
# id apples bananas pears
# 1 1 2 0 0
# 2 2 1 2 1
# 3 3 1 2 4
How can I collapse the information in df into a more compact data frame such as summary without using a for loop?
Here is a "data.table" approach:
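library(data.table)
dcast.data.table(as.data.table(df)[
  , unlist(objects), by = id][      # one row per object; objects land in column V1
  , .N, by = .(id, V1)],             # count each (id, object) pair
  id ~ V1, value.var = "N", fill = 0L)   # spread objects into columns, 0 where absent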
# id apple banana pear
# 1: 1 2 0 0
# 2: 2 1 2 1
# 3: 3 1 2 4
unlist the values by id, count them using .N, and reshape wide with dcast.data.table.
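The same three steps (unlist by id, count, reshape wide) can be mirrored in base R, which makes each intermediate result easy to inspect. This is a sketch that swaps in base functions (rep(), lengths(), table()) for the data.table calls; it rebuilds the example data by hand from the printed df so it does not depend on the random seed.

```r
# Example data typed in from the printed df above (no RNG involved)
df <- data.frame(id = c(1, 2, 2, 3, 3))
df$objects <- list(c("apple", "apple"),
                   c("apple", "banana", "pear"),
                   "banana",
                   c("banana", "pear", "banana"),
                   c("pear", "pear", "apple", "pear"))

# Step 1, "unlist by id": one row per object, repeating each id
# as many times as its row has objects
long <- data.frame(id     = rep(df$id, lengths(df$objects)),
                   object = unlist(df$objects))

# Steps 2 and 3, count and reshape wide: table() counts every
# (id, object) pair and fills absent combinations with 0
counts <- table(long$id, long$object)
wide <- data.frame(id = as.numeric(rownames(counts)),
                   as.data.frame.matrix(counts),
                   row.names = NULL)
wide
```

Here table() does the counting and the widening in one call, which plays the role of .N plus dcast.data.table(..., fill = 0L) in the data.table version.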
Initially, I had thought of mtabulate from "qdapTools", but that doesn't do the aggregation step. Still, you can try something like:
library(data.table)
library(qdapTools)
# mtabulate() returns one row of counts per row of df; summing by id aggregates them
data.table(cbind(df[1], mtabulate(df$objects)))[, lapply(.SD, sum), by = id]
# id apple banana pear
# 1: 1 2 0 0
# 2: 2 1 2 1
# 3: 3 1 2 4
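To see why a separate aggregation step is needed, here is a base R sketch of the same "tabulate each row, then sum by id" idea without the qdapTools dependency. The vapply() call stands in for mtabulate() (it is a substitute, not that package's implementation) and produces one row of counts per row of df; rowsum() then adds those rows within each id. The names levs, per_row and totals are illustrative only.

```r
# Example data typed in from the printed df above (no RNG involved)
df <- data.frame(id = c(1, 2, 2, 3, 3))
df$objects <- list(c("apple", "apple"),
                   c("apple", "banana", "pear"),
                   "banana",
                   c("banana", "pear", "banana"),
                   c("pear", "pear", "apple", "pear"))

# Fix the set of object names so every row is counted against the same columns
levs <- sort(unique(unlist(df$objects)))

# Per-row counts: one row per row of df (what the mtabulate() step yields)
per_row <- t(vapply(df$objects,
                    function(x) as.vector(table(factor(x, levels = levs))),
                    integer(length(levs))))
colnames(per_row) <- levs

# Aggregation step: sum the per-row counts within each id
totals <- rowsum(per_row, group = df$id)
result <- data.frame(id = as.numeric(rownames(totals)), totals, row.names = NULL)
result
```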