The Titanic dataset can be downloaded from Kaggle: kaggle.com/c/titanic/data. Please use train.csv, or install the 'titanic' package and use the dataset titanic_train.
This works:
library(dplyr)
library(stringr)
titanic <- titanic %>%
  mutate(Cabin_Letter = ifelse(!is.na(Cabin), str_extract(Cabin, "[A-Z]+"), 'Unknown'))
This does not entirely work:
titanic <- titanic %>%
  mutate(Cabin_Letter = factor(ifelse(!is.na(Cabin), str_extract(Cabin, "[A-Z]+"), 'Unknown')))
Warning:
Warning messages:
1: In mutate_impl(.data, dots) :
  Unequal factor levels: coercing to character
2: In mutate_impl(.data, dots) :
  binding character and factor vector, coercing into character vector
3: In mutate_impl(.data, dots) :
  binding character and factor vector, coercing into character vector
4: In mutate_impl(.data, dots) :
  binding character and factor vector, coercing into character vector
5: In mutate_impl(.data, dots) :
  binding character and factor vector, coercing into character vector
6: In mutate_impl(.data, dots) :
  binding character and factor vector, coercing into character vector
7: In mutate_impl(.data, dots) :
  binding character and factor vector, coercing into character vector
How can I resolve this issue? I would rather not add an extra line such as:
titanic$Cabin_Letter <- factor(titanic$Cabin_Letter)
This issue can happen if the data is grouped (a grouped_df) using the group_by() function. I just ran into it. My solution was to ungroup() the data frame and then convert to factor using as.factor().
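A minimal sketch of that fix, assuming the data frame was left grouped by an earlier step. The small data frame and the grouping on Pclass are made up for illustration and are not taken from the question. Calling ungroup() first means factor() sees the whole column at once, so a single set of levels is built instead of one set per group:

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in for the Titanic data: only the columns needed here.
titanic <- data.frame(
  Pclass = c(1, 1, 3, 3),
  Cabin  = c("C85", "E46", NA, "G6"),
  stringsAsFactors = FALSE
)

# Assumption: an earlier step left the data grouped.
titanic <- titanic %>% group_by(Pclass)

titanic <- titanic %>%
  ungroup() %>%  # drop the grouping before building the factor
  mutate(Cabin_Letter = factor(
    ifelse(!is.na(Cabin), str_extract(Cabin, "[A-Z]+"), "Unknown")
  ))

# Check the result: the column should be a factor with one shared set of levels.
is.factor(titanic$Cabin_Letter)
levels(titanic$Cabin_Letter)
```

This keeps everything in one pipeline, so the separate factor() assignment after the mutate() is not needed.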