I have a dataframe df
:
userID Score Task_Alpha Task_Beta Task_Charlie Task_Delta
3108 -8.00 Easy Easy Easy Easy
3207 3.00 Hard Easy Match Match
3350 5.78 Hard Easy Hard Hard
3961 10.00 Easy NA Hard Hard
4021 10.00 Easy Easy NA Hard
1. userID is factor variable
2. Score is numeric
3. All the 'Task_' features are factor variables with possible values 'Hard', 'Easy', 'Match' or NA
I want to create new columns per userID
that contain the counts of occurrence for each possible state of the Task_
feature. For the above toy example, the required output would be three new columns to be appended at the end of the df
as below:
userID Hard Match Easy
3108 0 0 4
3207 1 2 1
3350 3 0 1
3961 2 0 1
4021 1 0 2
Update: This question is not a duplicate, an associated part of the original question has been moved to: R How to counting the factors in ordered sequence
How do I add a column to a DataFrame in R? To add a new column to a dataframe in R you can use the $-operator. For example, to add the column “NewColumn”, you can do like this: dataf$NewColumn <- Values . Now, this will effectively add your new variable to your dataset.
How to find the sum of a column values up to a value in another column in R? To find the sum of a column values up to a particular value in another column, we can use cumsum function with sum function.
Create New Variables in R with mutate() and case_when() Often you may want to create a new variable in a data frame in R based on some condition. Fortunately this is easy to do using the mutate() and case_when() functions from the dplyr package.
You can reference a column of an R data frame via the column name. If the data was loaded from a CSV file, the column name is the name given to that column in the first line (the header line) of the CSV file.
You can compare the dataframe df
to each value in a map*
or *apply
function, compute the row-wise sums of the resulting boolean matrix, then combine the output with the original dataframe:
library(dplyr)
library(purrr)
facs <- c("Easy", "Match", "Hard")
bind_cols(df, set_names(map_dfc(facs, ~ rowSums(df == ., na.rm = T)), facs))
#### OUTPUT ####
userID Score Task_Alpha Task_Beta Task_Charlie Task_Delta Easy Match Hard
1 3108 -8.00 Easy Easy Easy Easy 4 0 0
2 3207 3.00 Hard Easy Match Match 1 2 1
3 3350 5.78 Hard Easy Hard Hard 1 0 3
4 3961 10.00 Easy <NA> Hard Hard 1 0 2
5 4021 10.00 Easy Easy <NA> Hard 2 0 1
library(data.table)
DT <- fread("userID Score Task_Alpha Task_Beta Task_Charlie Task_Delta
3108 -8.00 Easy Easy Easy Easy
3207 3.00 Hard Easy Match Match
3350 5.78 Hard Easy Hard Hard
3961 10.00 Easy NA Hard Hard
4021 10.00 Easy Easy NA Hard
")
DT.melt <- melt( DT, id.vars = "userID", measure.vars = patterns( task = "^Task_") )
dcast( DT.melt, userID ~ value, fun.aggregate = length )
# userID NA Easy Hard Match
# 1: 3108 0 4 0 0
# 2: 3207 0 1 1 2
# 3: 3350 0 1 3 0
# 4: 3961 1 1 2 0
# 5: 4021 1 2 1 0
Answer to the first part can be obtained by using apply
row-wise and count the occurrence of factor level in each row using table
cbind(df[1], t(apply(df[-c(1, 2)], 1, function(x)
table(factor(x, levels = c("Easy", "Hard", "Match"))))))
# userID Easy Hard Match
#1 3108 4 0 0
#2 3207 1 1 2
#3 3350 1 3 0
#4 3961 1 2 0
#5 4021 2 1 0
In tidyverse
, we can convert the data to long format, drop NA
values, count
occurrence of userID
and value
and get the data back to wide format.
library(dplyr)
library(tidyr)
df %>%
pivot_longer(cols = starts_with("Task"), values_drop_na = TRUE) %>%
count(userID, value) %>%
pivot_wider(names_from = value, values_from = n, values_fill = list(n = 0))
data
df <- structure(list(userID = c(3108L, 3207L, 3350L, 3961L, 4021L),
Score = c(-8, 3, 5.78, 10, 10), Task_Alpha = structure(c(1L,
2L, 2L, 1L, 1L), .Label = c("Easy", "Hard"), class = "factor"),
Task_Beta = structure(c(1L, 1L, 1L, NA, 1L), .Label = "Easy", class = "factor"),
Task_Charlie = structure(c(1L, 3L, 2L, 2L, NA), .Label = c("Easy",
"Hard", "Match"), class = "factor"), Task_Delta = structure(c(1L,
3L, 2L, 2L, 2L), .Label = c("Easy", "Hard", "Match"), class = "factor")),
class = "data.frame", row.names = c(NA, -5L))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With