I have a dataframe like this one:
Name Characteristic_1 Characteristic_2
Apple Yellow Italian
Pear British Yellow
Strawberries French Red
Blackberry Blue Austrian
As you can see the Characteristic can be in different Columns depending in the row. I would like to obtain a dataframe where each column contains only the values of a specific Characteristic.
Name Characteristic_1 Characteristic_2
Apple Yellow Italian
Pear Yellow British
Strawberries Red French
Blackberry Blue Austrian
My idea is to use the case_when function but I would like to know if there are Faster ways to achieve the same result.
Example data:
df <- structure(list(Name = c("Apple", "Pear", "Strawberries", "Blackberry"
), Characteristic_1 = c("Yellow", "British", "French", "Blue"
), Characteristic_2 = c("Italian", "Yellow", "Red", "Austrian"
)), row.names = c(NA, -4L), class = c("tbl_df", "tbl", "data.frame"
))
I suspect there is an easier way of solving the issue, but here is one potential solution:
# Load the libraries
library(tidyverse)
# Load the data
df <- structure(list(Name = c("Apple", "Pear", "Strawberries", "Blackberry"
), Characteristic_1 = c("Yellow", "British", "French", "Blue"
), Characteristic_2 = c("Italian", "Yellow", "Red", "Austrian"
)), row.names = c(NA, -4L), class = c("tbl_df", "tbl", "data.frame"
))
# R has 657 built in colour names. You can see them using the `colours()` function.
# Chances are your colours are contained in this list.
# The `str_to_title()` function capitalizes every colour in the list
list_of_colours <- str_to_title(colours())
# If your colours are not contained in the list, add them using e.g.
# `list_of_colours <- c(list_of_colours, "Octarine")`
# Create a new dataframe ("df2") by taking the original dataframe ("df")
df2 <- df %>%
# Create two new columns called "Colour" and "Origin" using `mutate()` with
# `ifelse` used to identify whether each word is in the list of colours.
# If the word is in the list of colours, add it to the "Colours" column, if
# it isn't, add it to the "Origin" column.
mutate(Colour = ifelse(!is.na(str_extract(Characteristic_1, paste(list_of_colours, collapse = "|"))),
Characteristic_1, Characteristic_2),
Origin = ifelse(is.na(str_extract(Characteristic_1, paste(list_of_colours, collapse = "|"))),
Characteristic_1, Characteristic_2)) %>%
# Then select the columns you want
select(Name, Colour, Origin)
df2
# A tibble: 4 x 3
# Name Colour Origin
# <chr> <chr> <chr>
#1 Apple Yellow Italian
#2 Pear Yellow British
#3 Strawberries Red French
#4 Blackberry Blue Austrian
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With