
Split strings and add them as new rows

Tags: r

I have the following dataset:

df <- data.frame(fact = c("a,b,c,d", "f,g,h,v"), value = c("0,1,0,1", "0,0,1,0"))

This is the data:

     fact   value
1 a,b,c,d 0,1,0,1
2 f,g,h,v 0,0,1,0

I want to split each row after every 1 in value, cutting fact at the same positions. So, my ideal output is:

    fact value
1:   a,b   0,1
2:   c,d   0,1
3: f,g,h 0,0,1
4:     v     0

Firstly, I thought I might find a way by using cut like:

cut(as.numeric(strsplit(as.character(df$value), split = ",")), breaks =1)

But none of my attempts got close.

MFR asked Dec 06 '16 23:12


2 Answers

First we split the strings in fact and value into separate values and stack them so that each becomes a column of values in a data frame. Now, using value, we want each run of zeroes followed by a 1 to become a group. These are the groups of values that we want to paste together at the end. We'll use dplyr to operate separately on each group to return the final data frame.

library(dplyr)
library(tidyr)  # For separate_rows function

df %>%
  # One row per comma-separated element of fact and value
  separate_rows(fact, value, sep = ",") %>%
  # Start a new group right after each 1
  mutate(group = lag(cumsum(value == "1"), default = 0)) %>%
  group_by(group) %>%
  # Paste each group's elements back together
  summarise(fact = paste(fact, collapse = ","),
            value = paste(value, collapse = ",")) %>%
  select(-group)

   fact value 
1   a,b   0,1
2   c,d   0,1
3 f,g,h 0,0,1
4     v     0
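
To see how the grouping column is built, here is a minimal base R sketch of that one step, assuming the values have already been stacked into a single character vector. The names ones_so_far, group and pieces are made up for illustration and are not part of the answer's pipeline.

```r
# Stacked values from the example data, one element per position
value <- c("0", "1", "0", "1", "0", "0", "1", "0")

# Running count of 1s seen so far, including the current element
ones_so_far <- cumsum(value == "1")

# Shift by one position (the role of lag(..., default = 0)) so that each 1
# stays in the group it closes instead of opening the next one
group <- c(0, head(ones_so_far, -1))

# Collapse each group back into a comma-separated string
pieces <- as.vector(tapply(value, group, paste, collapse = ","))

print(group)
print(pieces)
```

Note that the group ids are computed over the whole stacked column. If an original row did not end in 1, its trailing elements would share a group with the start of the next row; carrying a row identifier through separate_rows and grouping by it as well would be one way to avoid that.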

eipi10 answered Sep 28 '22 20:09


One way is to split the character vectors for fact and value in the original data frame by "," using strsplit and then determine the position of the first "1" in the split values. Then use this position to determine the split for both fact and value:

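# as.character() guards against factor columns (the data.frame default before R 4.0)
sv <- strsplit(as.character(df$value), ",")
sf <- strsplit(as.character(df$fact), ",")
# Position of the first "1" in each row's values, or NA if there is none
pos <- sapply(sv, function(v) {j <- which(v == "1"); if (length(j) == 0) NA else j[1]})
out <- do.call(rbind, lapply(seq_along(pos), function(i, sv, sf, pos) {
  if (is.na(pos[i]) || pos[i] == length(sf[[i]]))
    # Nothing to split: keep the row whole
    data.frame(fact = toString(sf[[i]]), value = toString(sv[[i]]))
  else
    # Split into the part up to the first "1" and the remainder
    data.frame(fact = c(toString(sf[[i]][1:pos[i]]),
                        toString(sf[[i]][(pos[i] + 1):length(sf[[i]])])),
               value = c(toString(sv[[i]][1:pos[i]]),
                         toString(sv[[i]][(pos[i] + 1):length(sv[[i]])])))
  }, sv, sf, pos))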
##     fact   value
##1    a, b    0, 1
##2    c, d    0, 1
##3 f, g, h 0, 0, 1
##4       v       0

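This answer splits each row only once, after the first "1" in value. If there is no "1", or if the first "1" is the last element of value, then that row in df is not split in the output. Note also that toString joins elements with ", " (comma plus space), which is why the output differs slightly from the format in the question.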
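
If a row can contain several 1s that should each end a piece, the same idea can be extended. The following is a rough base R sketch under that assumption, not the method used in the answer above; split_after_ones is a hypothetical helper name, and it joins with "," rather than toString's ", " to match the format in the question.

```r
# Hypothetical helper: split one row's fact/value strings after every "1"
split_after_ones <- function(fact, value) {
  f <- strsplit(as.character(fact), ",")[[1]]
  v <- strsplit(as.character(value), ",")[[1]]
  # Group id increases right after each "1", so a 1 closes its own group
  g <- cumsum(c(0, head(v == "1", -1)))
  data.frame(fact  = as.vector(tapply(f, g, paste, collapse = ",")),
             value = as.vector(tapply(v, g, paste, collapse = ",")),
             stringsAsFactors = FALSE)
}

df <- data.frame(fact = c("a,b,c,d", "f,g,h,v"),
                 value = c("0,1,0,1", "0,0,1,0"),
                 stringsAsFactors = FALSE)

# Apply the helper row by row and stack the pieces
out <- do.call(rbind, unname(Map(split_after_ones, df$fact, df$value)))
rownames(out) <- NULL
print(out)
```

Because each row is handled on its own, pieces never span two original rows, and a row without any "1" comes back unchanged.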

aichao answered Sep 28 '22 18:09