How to find and replace values in a df according to a list of priority words (with for loop and condition)?

I have a data frame whose second column holds several words per cell, separated by ";".

my_dataframe <- data.frame( first_column = c("x", "y", "x", "x", "y"),
                            second_column = c("important; very important; not important",
                                              "not important; important; very important",
                                              "very important; important",
                                              "important; not important",
                                              "not important"))
> my_dataframe
  first_column                            second_column
1            x important; very important; not important
2            y not important; important; very important
3            x                very important; important
4            x                 important; not important
5            y                            not important

I want to keep one word per cell: the most important one.

So I made a list of the words in order of priority:

reference_importance <- list("very important", "important", "not important")

What I would like to get as a second column:

 second_column
1 very important
2 very important
3 very important
4 important
5 not important

I tried:

for (i in 1:dim(my_dataframe)[1]) {
  for (j in 1:length(reference_importance)) {
    if (j %in% my_dataframe$second_column){
      my_dataframe$second_column[i] <- paste(j)
      break}
  }
}

Then I thought the problem was that it didn't consider the different words separated by ";" so I tried this:

for (i in 1:dim(my_dataframe)[1]) {
  value_as_list <- strsplit(my_dataframe$second_column[i], ";")
  print(value_as_list)
  for (j in reference_importance) {
    if (j %in% value_as_list){
      my_dataframe$second_column[i] == j
      break}
  }
} 

But these don't change anything in my column...

(I made this example to simplify, but in reality I have a huge table with many more words and possibilities. That's why I am trying to do it with a loop rather than assigning the possible answers manually.)

Inès Moreno asked Sep 12 '25 08:09


2 Answers

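Split each cell with strsplit, look up the position of each piece in the priority list with match, and keep the piece with the smallest index (the highest priority).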

my_dataframe <- transform(my_dataframe, z=strsplit(second_column, '; ') |>
                            lapply(match, reference_importance) |>
                            sapply(min) |>
                            {\(x) unlist(reference_importance)[x]}())
my_dataframe
#   first_column                            second_column              z
# 1            x important; very important; not important very important
# 2            y not important; important; very important very important
# 3            x                very important; important very important
# 4            x                 important; not important      important
# 5            y                            not important  not important

Note: this requires R >= 4.1, because it uses the native pipe |> and the \(x) lambda shorthand.
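On R versions older than 4.1, where |> and \(x) are unavailable, the same strsplit/match idea can be written with ordinary function calls. The following is only a minimal sketch, not part of the original answer: the helper name pick_priority is hypothetical, and it assumes every cell contains at least one word from the reference list.

```r
# Priority order, highest first (same values as in the question)
reference_importance <- list("very important", "important", "not important")

# Hypothetical helper: return the highest-priority word found in each cell.
# Assumes pieces are separated by exactly "; " and all appear in `reference`.
pick_priority <- function(cells, reference) {
  ref <- unlist(reference)                        # plain character vector
  pieces <- strsplit(cells, "; ", fixed = TRUE)   # one character vector per cell
  # The smallest index in `ref` corresponds to the highest priority
  idx <- vapply(pieces, function(p) min(match(p, ref)), integer(1))
  ref[idx]
}

res <- pick_priority(
  c("important; very important; not important",
    "important; not important",
    "not important"),
  reference_importance
)
res
```

With the data from the question this could be applied as my_dataframe$second_column <- pick_priority(my_dataframe$second_column, reference_importance).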

If you need a loop, you can do:

spl <- strsplit(my_dataframe$second_column, '; ')
my_dataframe$z <- NA_character_

for (i in seq_along(spl)) {
  my_dataframe$z[i] <- reference_importance[[min(match(spl[[i]], reference_importance))]]
}
my_dataframe
#   first_column                            second_column              z
# 1            x important; very important; not important very important
# 2            y not important; important; very important very important
# 3            x                very important; important very important
# 4            x                 important; not important      important
# 5            y                            not important  not important

I used z only for demonstration purposes; in practice you would assign to second_column instead of z.
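Since the question mentions a much larger table with more words, cells may contain irregular spacing or words that are missing from the priority list, in which case min(match(...)) returns NA. A hedged sketch of a more tolerant variant follows; the helper name pick_priority_safe and the decision to return NA for cells with no known word are assumptions, not part of the approach above.

```r
# Priority order, highest first (same values as in the question)
reference_importance <- list("very important", "important", "not important")

# Hypothetical helper: tolerate irregular spacing and unknown words
pick_priority_safe <- function(cells, reference) {
  ref <- unlist(reference)
  pieces <- strsplit(cells, ";", fixed = TRUE)
  vapply(pieces, function(p) {
    idx <- match(trimws(p), ref)   # NA for words not in the reference
    idx <- idx[!is.na(idx)]        # drop unknown words
    # Assumption: a cell with no known word yields NA
    if (length(idx) == 0) NA_character_ else ref[min(idx)]
  }, character(1))
}

safe_res <- pick_priority_safe(
  c("important;very important", "  not important ; unknown", "unknown"),
  reference_importance
)
safe_res
```

Returning NA rather than stopping keeps the result the same length as the input, so problem rows can be inspected afterwards with is.na().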

jay.sf answered Sep 15 '25 01:09


If you want to use a loop, the following worked for me:

my_dataframe <- data.frame( first_column = c("x", "y", "x", "x", "y"),
                            second_column = c("important; very important; not important",
                                              "not important; important; very important",
                                              "very important; important",
                                              "important; not important",
                                              "not important"))

reference_importance <- list("very important", "important", "not important")


library(dplyr) # provides %>% and mutate()

# add a new character column for the priority word
my_dataframe <- my_dataframe %>%
  mutate(Priority_importance = NA_character_)

# use a loop to identify the highest-priority substring
for (i in seq_len(nrow(my_dataframe))) {
  # strsplit returns a list, so take its first element; trim the spaces left after ";"
  values <- trimws(strsplit(my_dataframe$second_column[i], ";")[[1]])
  
  for (j in seq_along(reference_importance)) {
    if (reference_importance[[j]] %in% values) { 
      my_dataframe$Priority_importance[i] <- reference_importance[[j]] # store importance level 
      break # stop at the first (highest-priority) match 
    }
  }
}

my_dataframe

  first_column                            second_column Priority_importance
1            x important; very important; not important      very important
2            y not important; important; very important      very important
3            x                very important; important      very important
4            x                 important; not important           important
5            y                            not important       not important
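One detail worth checking in any loop version is what strsplit actually returns: it gives a list (one element per input string), and splitting on ";" alone leaves a leading space on every piece after the first. The short check below is a sketch for illustration only; the variable names are made up for this example.

```r
cell <- "important; very important; not important"

split_raw  <- strsplit(cell, ";")   # a list of length 1, not a character vector
pieces_raw <- split_raw[[1]]        # pieces after the first keep a leading space
pieces     <- trimws(pieces_raw)    # surrounding whitespace removed

# Exact matching fails on the untrimmed pieces and succeeds on the trimmed ones
found_raw     <- "very important" %in% pieces_raw
found_trimmed <- "very important" %in% pieces

found_raw      # FALSE, because the piece is " very important"
found_trimmed  # TRUE
```

Splitting on "; " instead of ";" avoids the trimming step when the separator is always followed by exactly one space.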
AWestell answered Sep 15 '25 01:09