I'm new to R and am struggling to figure this out. I have a data frame with a column of character vectors that contain semicolon-separated lists of things. I want to keep that column but add a column for each item with a value of 0 (not in the list) or 1 (in the list).
Here's what I'm trying:
library("tidyverse")
colors <- c("red;blue", "red;green")
df <- data.frame(colors, stringsAsFactors = FALSE)
df %>%
  mutate(green = case_when("green" %in% strsplit(colors, ";")[[1]] ~ 1,
                           TRUE ~ 0))
The result I get is:
     colors green
1  red;blue     0
2 red;green     0
I expected the value for "green" in the second row to be 1.
To try to debug this I tried this:
> strsplit("red;green", ";")
[[1]]
[1] "red" "green"
> "green" %in% strsplit("red;green",";")[[1]]
[1] TRUE
# and the negative case
> "green" %in% strsplit("red;blue",";")[[1]]
[1] FALSE
What am I missing?
With a data.table solution, you can use tstrsplit:
library(data.table)
df <- data.table::data.table(
  color = c("red;blue", "red;green")
)
df[, c("col1","col2") := tstrsplit(color, ";", fixed = TRUE)]
df[, "green" := (col2 == "green")]
df
#        color col1  col2 green
# 1:  red;blue  red  blue FALSE
# 2: red;green  red green  TRUE
If you are not familiar with data.table's update-by-reference operator :=, the data.table vignettes are a good place to start. The option fixed = TRUE in tstrsplit makes the separator a literal string rather than a regular expression. Note that this approach assumes you always have the same number of elements in your semicolon-separated list.
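To see that assumption, here is a minimal sketch with a hypothetical third row containing a single color: tstrsplit pads the shorter split with NA, but if any row produced more pieces than the two column names supplied to :=, the assignment would fail.
dt <- data.table::data.table(color = c("red;blue", "red;green", "red"))
dt[, c("col1", "col2") := tstrsplit(color, ";", fixed = TRUE)]
dt
#        color col1  col2
# 1:  red;blue  red  blue
# 2: red;green  red green
# 3:       red  red  <NA>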
There is a solution that, I think, is better adapted to a situation where you have more than a few values. By calling lapply repeatedly, you can add a series of columns to your data.table. Starting back with df:
df <- data.table::data.table(
  color = c("red;blue", "red;green")
)
Calling lapply with grepl to scan for the relevant color, we update our object by reference (note that you could use more than three colors):
lapply(c("red","green","blue"), function(x){
df[grepl(x, color), c(as.character(x)) := TRUE]
})
#[[1]]
#
#[[2]]
#       color  red green  blue
#1:  red;blue TRUE    NA  TRUE
#2: red;green TRUE  TRUE    NA
#
#[[3]]
#       color  red green  blue
#1:  red;blue TRUE    NA  TRUE
#2: red;green TRUE  TRUE    NA
There is no need to re-assign the data.table inside lapply: it has been updated by reference, so every slot of the returned list is the same, fully updated object. Only the last slot of df interests us. Finally, we select it and set the remaining NAs to FALSE:
df <- df[[length(df)]]
df[is.na(df)] <- FALSE
df
#        color  red green  blue
# 1:  red;blue TRUE FALSE  TRUE
# 2: red;green TRUE  TRUE FALSE
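Since the OP asked for 0/1 rather than TRUE/FALSE, here is a small follow-up sketch (assuming the color columns created above) that converts the logical columns to integers, still by reference:
cols <- c("red", "green", "blue")
df[, (cols) := lapply(.SD, as.integer), .SDcols = cols]
df
#        color red green blue
# 1:  red;blue   1     0    1
# 2: red;green   1     1    0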
Hope it helps
We can use str_detect:
library(dplyr)
library(stringr)
df %>%
  mutate(green = +(str_detect(colors, 'green')))
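Note that str_detect treats 'green' as a regular expression, so it would also match a value like 'greenish'. If that were a concern, a word-boundary pattern keeps the match exact (a sketch, not needed for the OP's data):
df %>%
  mutate(green = +(str_detect(colors, "\\bgreen\\b")))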
If we want a new column for each color:
library(qdapTools)
cbind(df, mtabulate(strsplit(df$colors, ";")))
#     colors blue green red
#1  red;blue    1     0   1
#2 red;green    0     1   1
Or using base R
cbind(df, as.data.frame.matrix(table(stack(setNames(strsplit(df$colors, ";"),
                                                    seq_along(df$colors)))[2:1])))
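Or, since the question already loads the tidyverse, a tidyr-only sketch of the same reshaping (assuming the df from the question):
library(tidyr)
df %>%
  mutate(item = strsplit(colors, ";")) %>%   # list-column of split values
  unnest(item) %>%                           # one row per colors/item pair
  mutate(value = 1L) %>%
  pivot_wider(names_from = item, values_from = value, values_fill = 0L)
# # A tibble: 2 x 4
#   colors      red  blue green
#   <chr>     <int> <int> <int>
# 1 red;blue      1     1     0
# 2 red;green     1     0     1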
In the OP's code, the first element ([[1]]) of the strsplit list is selected instead of looping over the list, so that single element is recycled and returns FALSE for every row, as there is no 'green' in the first list element.
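You can see this at the console (assuming the df from the question):
strsplit(df$colors, ";")[[1]]
# [1] "red"  "blue"
"green" %in% strsplit(df$colors, ";")[[1]]
# [1] FALSE   (a single FALSE, recycled across both rows inside mutate())
Looping over each list element instead gives the expected result: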
library(purrr)
df %>%
  mutate(green = map_int(strsplit(colors, ";"),
                         ~ case_when('green' %in% .x ~ 1L, TRUE ~ 0L)))
#     colors green
#1  red;blue     0
#2 red;green     1
Data
colors <- c("red;blue", "red;green")
df <- data.frame(colors, stringsAsFactors = FALSE)
Code
cbind.data.frame(colors,
                 sapply(unique(unlist(strsplit(unlist(df), ";", fixed = TRUE))),
                        function(x) as.integer(grepl(x, colors))))
Output
#      colors red blue green
# 1  red;blue   1    1     0
# 2 red;green   1    0     1
Using %in% and no regular expressions on a different dataset with similar items, green and greenish:
colors <- c("red;blue;greenish", "red;green")
df <- data.frame(colors, stringsAsFactors = FALSE)
myfun <- function(x) { unique(unlist(strsplit(unlist(x), ";", fixed = TRUE))) }
df2 <- t(sapply(df$colors, function(x) { as.integer(myfun(df) %in% myfun(x)) }))
colnames(df2) <- myfun(df)
df2
#                   red blue greenish green
# red;blue;greenish   1    1        1     0
# red;green           1    0        0     1
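Since the OP wants to keep the original column, the indicator matrix can be bound back onto the data frame, e.g.:
# Attach the 0/1 columns next to the original colors column
cbind(df, df2)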
%in% does not work that way. Try grepl:
df %>% mutate(green = case_when(grepl("green", colors) ~ 1, TRUE ~ 0))
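For illustration, %in% tests whole-string equality against the column, whereas grepl does substring matching (a quick check on the colors vector from the question):
"green" %in% colors
# [1] FALSE
grepl("green", colors)
# [1] FALSE  TRUE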