I have the following dataset:
df <- data.frame(fact = c("a,b,c,d", "f,g,h,v"), value = c("0,1,0,1", "0,0,1,0"))
This is the data:
fact value
1 a,b,c,d 0,1,0,1
2 f,g,h,v 0,0,1,0
I want to split each row after every position where the value is 1. So, my ideal output is:
fact value
1: a,b 0,1
2: c,d 0,1
3: f,g,h 0,0,1
4: v 0
At first I thought I might find a way by using cut, like:
cut(as.numeric(strsplit(as.character(df$value), split = ",")), breaks =1)
But none of my attempts came close.
First we split the strings in fact and value into separate values and stack them, so that each becomes a column of values in a data frame. Then, using value, we want each run of zeroes followed by a 1 to become a group. These are the groups of values that we paste back together at the end. We use dplyr to operate separately on each group and return the final data frame.
library(dplyr)
library(tidyr) # For separate_rows function

df %>%
  # One row per element of fact/value
  separate_rows(fact, value, sep = ",") %>%
  # Group index = number of 1s seen before the current element
  mutate(group = lag(cumsum(value == 1), default = 0)) %>%
  group_by(group) %>%
  # Paste each group back into a comma-separated string
  summarise(fact = paste(fact, collapse = ","),
            value = paste(value, collapse = ",")) %>%
  select(-group)
fact value
1 a,b 0,1
2 c,d 0,1
3 f,g,h 0,0,1
4 v 0
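To see why the lagged cumulative sum produces these groups, here is a minimal base R sketch on the stacked value column alone. The vector is just the example data flattened in order, and the shift is done by hand as a stand-in for dplyr's lag with default = 0. One caveat: the counter runs over the whole stacked column, so a row that does not end in 1 would share a group with the start of the next row. The example data avoids this because the first row ends in 1.

```r
# Example values flattened in order, as they appear after stacking
value <- c("0", "1", "0", "1", "0", "0", "1", "0")

# Running count of 1s seen so far, including the current element
ones_so_far <- cumsum(value == "1")    # 0 1 1 2 2 2 3 3

# Shift right by one so that each 1 stays in the group it closes
group <- c(0, head(ones_so_far, -1))   # 0 0 1 1 2 2 2 3

# Each list element holds one run of zeroes ending in a 1 (or a trailing remainder)
pieces <- split(value, group)
print(pieces)
```

Without the shift, each 1 would start a new group instead of ending the current one.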
One way is to split the character vectors for fact and value in the original data frame by "," using strsplit, and then determine the position of the first "1" in each split value. This position then determines where to split both fact and value:
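# as.character() keeps this working if the columns are factors
sv <- strsplit(as.character(df$value), ",")
sf <- strsplit(as.character(df$fact), ",")
# Position of the first "1" in each row, or NA if there is none
pos <- sapply(sv, function(v) {j <- which(v == "1"); if (length(j) == 0) NA else j[1]})
# Keep a row whole if it has no "1" or the "1" is its last element;
# otherwise split it into two rows after the first "1"
out <- do.call(rbind, lapply(seq_along(pos), function(i, sv, sf, pos) {
  if (is.na(pos[i]) || pos[i] == length(sf[[i]]))
    data.frame(fact = toString(sf[[i]]), value = toString(sv[[i]]))
  else
    data.frame(fact = c(toString(sf[[i]][1:pos[i]]),
                        toString(sf[[i]][(pos[i] + 1):length(sf[[i]])])),
               value = c(toString(sv[[i]][1:pos[i]]),
                         toString(sv[[i]][(pos[i] + 1):length(sv[[i]])])))
}, sv, sf, pos))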
## fact value
##1 a, b 0, 1
##2 c, d 0, 1
##3 f, g, h 0, 0, 1
##4 v 0
This answer assumes that there is a "1" in value at which to split. If there is not, or if the "1" is at the end of value, then that row in df is not split in the output. Only the first "1" in each row is used, so a row is split into at most two pieces.
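If a row can contain more than one "1" before its last element, the same idea can be extended to split after every "1" within each row. The following is a base R sketch of one way to do that, not part of the answer above. The helper name split_after_ones is hypothetical, and it assumes every row has the same number of non-empty elements in fact and value.

```r
# Hypothetical helper: split each row after every "1" in value
split_after_ones <- function(fact, value, sep = ",") {
  sf <- strsplit(as.character(fact), sep, fixed = TRUE)
  sv <- strsplit(as.character(value), sep, fixed = TRUE)
  pieces <- Map(function(f, v) {
    # Group index = number of 1s seen before the current element,
    # computed per row so groups never cross row boundaries
    g <- c(0, head(cumsum(v == "1"), -1))
    data.frame(
      fact  = unname(vapply(split(f, g), paste, character(1), collapse = sep)),
      value = unname(vapply(split(v, g), paste, character(1), collapse = sep)),
      stringsAsFactors = FALSE
    )
  }, sf, sv)
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

df <- data.frame(fact = c("a,b,c,d", "f,g,h,v"),
                 value = c("0,1,0,1", "0,0,1,0"),
                 stringsAsFactors = FALSE)
res <- split_after_ones(df$fact, df$value)
print(res)
```

On the example data this gives the four rows from the question's ideal output, with "," rather than ", " as the separator.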