
Unlist column to create unique row in dataframe

Tags:

dataframe

r

I am faced with the following R transformation issue. I have this data frame:

test_df <- structure(
  list(
    word = c("list of XYZ schools", "list of basketball", "list of usa"),
    results = c("58", "151", "29"),
    key_list = structure(
      list(
        `coRq,coG,coQ,co7E,coV98` = c("coRq", "coG", "coQ", "co7E", "coV98"),
        `coV98,coUD,coHF,cobK,con7` = c("coV98", "coUD", "coHF", "cobK", "con7"),
        `coV98,coX7,couC,coD3,copW` = c("coV98", "coX7", "couC", "coD3", "copW")
      ),
      .Names = c("coRq,coG,coQ,co7E,coV98", "coV98,coUD,coHF,cobK,con7",
                 "coV98,coX7,couC,coD3,copW")
    )
  ),
  .Names = c("word", "results", "key_list"),
  row.names = c(116L, 150L, 277L),
  class = "data.frame"
)

In short, there are three columns: "word", which is unique per row, "results", and "key_list", a list column in which each element is a character vector of keys (each element's name is the same keys joined by commas). I would like to create a new data frame with one row per key, in which the word and results values are repeated for every key in that row's list. The result should look like this:

key     word                   results
coV98   "list of XYZ schools"  58
coRq    "list of XYZ schools"  58
coV98   "list of basketball"   151
coV98   "list of usa"          29

And so on for all the keys. In other words, I would like to expand the keys, unlist them, and then reshape the result into a data frame in which the word and the other columns are repeated for each key.

Here is one of the things I tried: I created a list of unique keys, used grep to find the rows containing each key, looped over the keys to build a smaller data frame for each one, and then rbind-ed those together. The resulting data frame, however, does not contain the key column:

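# Unique keys across all rows (iterating over a table() data frame
# would loop over its columns, not over the keys)
keys <- unique(unlist(test_df$key_list, use.names = FALSE))
ttt <- lapply(keys, function(xx) {
  # grep coerces each list element to a string such as 'c("coRq", "coG", ...)'
  # and matches the key as a fixed substring
  idx <- grep(xx, test_df$key_list, fixed = TRUE)
  test_df[idx, ]
})
# The key itself is never stored, so final_df has no key column
final_df <- do.call(rbind, ttt)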
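A minimal sketch of one way to keep the key in this loop approach, assuming the three-row test_df from the question (rebuilt here with data.frame() so the snippet runs on its own). It swaps grep for an exact %in% test on each list element, on the assumption that keys should match whole rather than as substrings; the names pieces and has_key are illustrative only:

```r
# Rebuild the example data: key_list is a list column of character vectors
test_df <- data.frame(
  word = c("list of XYZ schools", "list of basketball", "list of usa"),
  results = c("58", "151", "29"),
  stringsAsFactors = FALSE
)
test_df$key_list <- list(
  c("coRq", "coG", "coQ", "co7E", "coV98"),
  c("coV98", "coUD", "coHF", "cobK", "con7"),
  c("coV98", "coX7", "couC", "coD3", "copW")
)

keys <- unique(unlist(test_df$key_list, use.names = FALSE))
pieces <- lapply(keys, function(k) {
  # TRUE for each row whose key vector contains k exactly
  has_key <- vapply(test_df$key_list, function(v) k %in% v, logical(1))
  # Prepend the key (recycled to the subset's row count) so it survives rbind
  data.frame(key = k, test_df[has_key, c("word", "results")],
             stringsAsFactors = FALSE)
})
final_df <- do.call(rbind, pieces)
print(final_df)
```

Because each piece is built per key, rows come out grouped by key rather than by the original row order.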

I have also played around with unlisting and reshaping, but I have not found the right combination. Any advice would be great! Thanks

RCN asked Mar 14 '23 13:03


2 Answers

Maybe we can use listCol_l from the splitstackshape package:

library(splitstackshape)
listCol_l(test_df, 'key_list')[]
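For comparison, the same long shape can be produced in base R without extra packages. This is a minimal sketch, not how listCol_l is implemented; it assumes each key_list element is a plain character vector, as in the question's data, and the names n_keys and long_df are illustrative:

```r
# Rebuild the example data: key_list is a list column of character vectors
test_df <- data.frame(
  word = c("list of XYZ schools", "list of basketball", "list of usa"),
  results = c("58", "151", "29"),
  stringsAsFactors = FALSE
)
test_df$key_list <- list(
  c("coRq", "coG", "coQ", "co7E", "coV98"),
  c("coV98", "coUD", "coHF", "cobK", "con7"),
  c("coV98", "coX7", "couC", "coD3", "copW")
)

n_keys <- lengths(test_df$key_list)  # number of keys in each original row
long_df <- data.frame(
  key     = unlist(test_df$key_list, use.names = FALSE),
  # Repeat each row's word/results once per key so lengths line up with key
  word    = rep(test_df$word, times = n_keys),
  results = rep(test_df$results, times = n_keys),
  stringsAsFactors = FALSE
)
print(long_df)
```

The key order follows the original rows, so the first five rows all belong to "list of XYZ schools".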
akrun answered Mar 24 '23 00:03


In case a base R solution is helpful for someone:

do.call(rbind, lapply(seq_along(test_df$key_list), function(i) {
    merge(test_df$key_list[[i]], test_df[i,-3], by=NULL)
  }))
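The key step here is merge(..., by = NULL), which base R documents as returning the Cartesian product of its two inputs. Applied to one row's keys and that row's other columns, it pairs every key with the same word and results values. A minimal sketch of that step, assuming the question's three-row data; wrapping the keys in a one-column data frame (a tweak not in the answer above) gives the result an explicit key column name, and expand_row is a hypothetical helper:

```r
# Rebuild the example data: key_list is a list column of character vectors
test_df <- data.frame(
  word = c("list of XYZ schools", "list of basketball", "list of usa"),
  results = c("58", "151", "29"),
  stringsAsFactors = FALSE
)
test_df$key_list <- list(
  c("coRq", "coG", "coQ", "co7E", "coV98"),
  c("coV98", "coUD", "coHF", "cobK", "con7"),
  c("coV98", "coX7", "couC", "coD3", "copW")
)

# Cross join one row's keys with that row's word and results (by = NULL)
expand_row <- function(i) {
  keys <- data.frame(key = test_df$key_list[[i]], stringsAsFactors = FALSE)
  merge(keys, test_df[i, c("word", "results")], by = NULL)
}

one_row <- expand_row(1)  # 5 keys x 1 row = 5 rows
all_rows <- do.call(rbind, lapply(seq_len(nrow(test_df)), expand_row))
print(all_rows)
```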
Zelazny7 answered Mar 23 '23 23:03