I have a column in a data frame (second_column) where each cell contains several values separated by ";".
my_dataframe <- data.frame( first_column = c("x", "y", "x", "x", "y"),
second_column = c("important; very important; not important",
"not important; important; very important",
"very important; important",
"important; not important",
"not important"))
> my_dataframe
first_column second_column
1 x important; very important; not important
2 y not important; important; very important
3 x very important; important
4 x important; not important
5 y not important
I want to keep only one value per cell: the most important one.
So I made a list of the values in order of priority:
reference_importance <- list("very important", "important", "not important")
What I would like to get as the second column:
second_column
1 very important
2 very important
3 very important
4 important
5 not important
I tried:
for (i in 1:dim(my_dataframe)[1]) {
for (j in 1:length(reference_importance)) {
if (j %in% my_dataframe$second_column){
my_dataframe$second_column[i] <- paste(j)
break}
}
}
Then I thought the problem was that it didn't take into account the separate values separated by ";", so I tried this:
for (i in 1:dim(my_dataframe)[1]) {
value_as_list <- strsplit(my_dataframe$second_column[i], ";")
print(value_as_list)
for (j in reference_importance) {
if (j %in% value_as_list){
my_dataframe$second_column[i] == j
break}
}
}
But neither of these changes anything in my column.
(I made this example to keep things simple, but in reality I have a huge table with many more values and combinations. That's why I'm trying to do it with a loop rather than assigning the possible answers manually.)
Using strsplit and match, basically:
my_dataframe <- transform(my_dataframe, z=strsplit(second_column, '; ') |>
lapply(match, reference_importance) |>
sapply(min) |>
{\(x) unlist(reference_importance)[x]}())
my_dataframe
# first_column second_column z
# 1 x important; very important; not important very important
# 2 y not important; important; very important very important
# 3 x very important; important very important
# 4 x important; not important important
# 5 y not important not important
Note: this requires R >= 4.1, which introduced the native pipe |> and the \(x) lambda shorthand.
If you need a loop, you could do:
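For R versions older than 4.1, the same strsplit/match/min idea can be written without the native pipe, using sapply() and an ordinary anonymous function. This is a sketch, not a drop-in replacement for every setup: it sets stringsAsFactors = FALSE explicitly because before R 4.0 data.frame() turned strings into factors by default, and strsplit() refuses factor input. The intermediate names parts and best_rank are just illustrative.

```r
my_dataframe <- data.frame(
  first_column  = c("x", "y", "x", "x", "y"),
  second_column = c("important; very important; not important",
                    "not important; important; very important",
                    "very important; important",
                    "important; not important",
                    "not important"),
  stringsAsFactors = FALSE)  # needed on R < 4.0, harmless on newer versions
reference_importance <- list("very important", "important", "not important")

# Split each cell into its values (the separator is "; ", space included)
parts <- strsplit(my_dataframe$second_column, "; ")

# For each cell, look up every value's position in the priority list and
# keep the smallest position, i.e. the highest priority
best_rank <- sapply(parts, function(p) min(match(p, reference_importance)))

# Turn the winning positions back into the values themselves
my_dataframe$z <- unlist(reference_importance)[best_rank]
my_dataframe$z
```

Any value that does not appear in reference_importance gets NA from match(), which makes min() return NA for that cell, so it is worth checking for NA in z on real data.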
spl <- strsplit(my_dataframe$second_column, '; ')
my_dataframe$z <- NA_character_
for (i in seq_along(spl)) {
my_dataframe$z[i] <- reference_importance[[min(match(spl[[i]], reference_importance))]]
}
my_dataframe
# first_column second_column z
# 1 x important; very important; not important very important
# 2 y not important; important; very important very important
# 3 x very important; important very important
# 4 x important; not important important
# 5 y not important not important
Of course, I used z for demonstration purposes; in practice you would use second_column instead of z.
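As for why the attempts in the question leave the column unchanged, there are a few separate issues, which a small standalone check makes visible: strsplit() returns a list rather than a character vector, splitting on ";" alone leaves a leading space on every value after the first, and == only compares, it never assigns. (The first attempt also tests the loop index j, a number, instead of the value reference_importance[[j]].) The single cell value cell and the toy vector x below are hypothetical, chosen only to illustrate these points.

```r
cell <- "very important; important"

# strsplit() returns a list with one element per input string,
# so the values have to be pulled out with [[1]]
value_as_list <- strsplit(cell, ";")
is.list(value_as_list)                 # TRUE

# Splitting on ";" alone keeps the space that follows each separator
value_as_list[[1]]                     # "very important" " important"
"important" %in% value_as_list[[1]]    # FALSE, because of the leading space

# Splitting on "; " and extracting the vector fixes both problems
phrases <- strsplit(cell, "; ")[[1]]
"important" %in% phrases               # TRUE

# Finally, == only compares; it never changes the data.
# Assignment needs <-
x <- c("a", "b")
x[1] == "z"                            # FALSE, x is unchanged
x[1] <- "z"                            # x is now c("z", "b")
```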
If you want to use a loop, the following worked for me:
my_dataframe <- data.frame( first_column = c("x", "y", "x", "x", "y"),
second_column = c("important; very important; not important",
"not important; important; very important",
"very important; important",
"important; not important",
"not important"))
reference_importance <- list("very important", "important", "not important")
library(dplyr)  # provides %>% and mutate()
# add new column for priority word
my_dataframe <- my_dataframe %>%
  mutate(Priority_importance = NA_character_)
# use a loop to identify highest priority substring
for (i in 1:nrow(my_dataframe)) {
  # split on "; " and take the character vector out of the list strsplit() returns
  value_as_vector <- strsplit(my_dataframe$second_column[i], "; ")[[1]]
  for (j in seq_along(reference_importance)) {
    if (reference_importance[[j]] %in% value_as_vector) {
      my_dataframe$Priority_importance[i] <- reference_importance[[j]] # store importance level
      break # highest priority found; move to next row
    }
  }
}
my_dataframe
first_column second_column Priority_importance
1 x important; very important; not important very important
2 y not important; important; very important very important
3 x very important; important very important
4 x important; not important important
5 y not important not important