split strings and add them as new row

Question

I have the following dataset:

df <- data.frame(fact = c("a,b,c,d", "f,g,h,v"), value = c("0,1,0,1", "0,0,1,0"))

This is the data:

   fact   value
1 a,b,c,d 0,1,0,1
2 f,g,h,v 0,0,1,0

I want to split each row after every 1 in value, and split fact at the same positions. My ideal output is:

    fact value
1:   a,b   0,1
2:   c,d   0,1
3: f,g,h 0,0,1
4:     v     0

My first thought was to use cut, like this:

cut(as.numeric(strsplit(as.character(df$value), split = ",")), breaks =1)

But none of my attempts came close.

eipi10 · Accepted Answer

First we split the strings in fact and value at the commas and stack the pieces, so that each element becomes its own row in a data frame. Then, using value, we want each run of zeroes followed by a 1 to become a group (a trailing run of zeroes with no 1 after it forms a final group of its own). These are the groups of values that we paste back together at the end. We'll use dplyr to operate on each group separately and return the final data frame.
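
The grouping rule in the paragraph above can be checked on its own before running the full pipeline. This is a minimal base R sketch; `group_ids()` is a hypothetical helper name, and `c(0, head(x, -1))` is used here as a stand-in for `dplyr::lag(x, default = 0)`.

```r
# Hypothetical helper: group index for one row's split values
group_ids <- function(value) {
  ones <- cumsum(value == "1")  # number of 1s seen so far, including the current element
  c(0, head(ones, -1))          # shift right by one, like dplyr::lag(default = 0),
                                # so each 1 closes its group instead of opening the next
}

row1 <- c("0", "1", "0", "1")
row2 <- c("0", "0", "1", "0")

group_ids(row1)  # 0 0 1 1 -> pieces "0,1" and "0,1"
group_ids(row2)  # 0 0 0 1 -> pieces "0,0,1" and "0"
split(row1, group_ids(row1))
```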

library(dplyr)
library(tidyr)  # For separate_rows()

df %>% 
  separate_rows(fact, value, sep = ",") %>%                 # one row per element
  mutate(group = lag(cumsum(value == 1), default = 0)) %>%  # a new group starts after each 1
  group_by(group) %>%
  summarise(fact  = paste(fact, collapse = ","),
            value = paste(value, collapse = ",")) %>%
  select(-group)

   fact value 
1   a,b   0,1
2   c,d   0,1
3 f,g,h 0,0,1
4     v     0
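
The pipeline above numbers groups over the whole stacked column, so if a row's value does not end in "1", its trailing zeroes are merged into the next row's first group (the example data is unaffected because row 1 ends in "1"). A minimal sketch that restarts the grouping inside each original row, assuming dplyr >= 1.0 (for `.groups`) and tidyr; the `row` column and the `df2` test data are illustrative additions, not part of the question:

```r
library(dplyr)
library(tidyr)  # for separate_rows()

# Illustrative data where the first row does NOT end in "1"
df2 <- data.frame(fact  = c("a,b,c", "f,g"),
                  value = c("0,1,0", "1,0"),
                  stringsAsFactors = FALSE)

out2 <- df2 %>%
  mutate(row = row_number()) %>%               # remember the source row
  separate_rows(fact, value, sep = ",") %>%    # one line per element
  group_by(row) %>%                            # compute groups within each source row
  mutate(group = lag(cumsum(value == "1"), default = 0)) %>%
  group_by(row, group) %>%
  summarise(fact  = paste(fact, collapse = ","),
            value = paste(value, collapse = ","),
            .groups = "drop") %>%
  select(-row, -group)

out2
```

Here "c" (value "0") stays in its own row instead of being joined to "f" from the second row.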

aichao · Answer

One way is to split the character vectors fact and value on "," with strsplit, find the position of the first "1" in each split value, and then use that position to split both fact and value:

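# Split each comma-separated string into a character vector
# (as.character() guards against factor columns, the default before R 4.0)
sv <- strsplit(as.character(df$value), ",")
sf <- strsplit(as.character(df$fact), ",")

# Position of the first "1" in each split value (NA if there is none)
pos <- sapply(sv, function(v) { j <- which(v == "1"); if (length(j) == 0) NA else j[1] })

out <- do.call(rbind, lapply(seq_along(pos), function(i) {
  if (is.na(pos[i]) || pos[i] == length(sf[[i]])) {
    # No "1", or the "1" is the last element: keep the row as it is
    data.frame(fact = toString(sf[[i]]), value = toString(sv[[i]]))
  } else {
    # Split into the part up to and including the first "1", and the rest
    first <- 1:pos[i]
    rest  <- (pos[i] + 1):length(sf[[i]])
    data.frame(fact  = c(toString(sf[[i]][first]), toString(sf[[i]][rest])),
               value = c(toString(sv[[i]][first]), toString(sv[[i]][rest])))
  }
}))
out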
##     fact   value
##1    a, b    0, 1
##2    c, d    0, 1
##3 f, g, h 0, 0, 1
##4       v       0

This answer assumes that value contains a "1" to split at. If it does not, or if the "1" is the last element, that row of df is left unsplit in the output. It also splits only at the first "1", so a value such as "0,1,0,1,0" produces two rows rather than three.
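
The strsplit approach above splits only at the first "1". If a value can contain several 1s, the same idea extends to splitting after every "1" by computing a group index per row and passing it to split(). This is a minimal base R sketch; `split_row()` is a hypothetical helper name, and it joins pieces with "," rather than toString()'s ", " to match the output the question asks for.

```r
df <- data.frame(fact  = c("a,b,c,d", "f,g,h,v"),
                 value = c("0,1,0,1", "0,0,1,0"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: split one row into pieces that each end at a "1"
split_row <- function(fact, value) {
  f <- strsplit(fact, ",")[[1]]
  v <- strsplit(value, ",")[[1]]
  # Group index: a new piece starts right after every "1"
  g <- c(0, head(cumsum(v == "1"), -1))
  data.frame(fact  = unname(vapply(split(f, g), paste, character(1), collapse = ",")),
             value = unname(vapply(split(v, g), paste, character(1), collapse = ",")),
             stringsAsFactors = FALSE)
}

out <- do.call(rbind, Map(split_row, df$fact, df$value))
rownames(out) <- NULL  # Map() names its result, which would otherwise leak into row names
out
```

Unlike the first-"1" version, a row with no "1" simply comes back as a single piece, so no special case is needed.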

Tags:

r

MFR
