I have a vector
vec <- c("ab", "#4", "gw", "#29", "mp", "jq", "#35", "ez")
which generally follows the pattern of alternating between two different sequences of strings (the first sequence being all alphabetical, the second being numerical with the symbol #).
However, there are cases where no # term appears: in the example above, between mp and jq, and again after ez. I would like to define a function that "fills the gaps" with the character string #, so that I get the output:
 [1] "ab" "#4" "gw" "#29" "mp" "#" "jq" "#35" "ez" "#"
which I would then convert to a data frame
   V1  V2
1  ab  #4
2  gw  #29
3  mp  #
4  jq  #35
5  ez  #
My attempt so far is rather clunky and relies on looping through the vector and filling the gaps. I'd be interested to see more elegant solutions.
My Solution
greplSpace <- function(pattern, replacement, x){
  j <- 1
  while( j < length(x) ){
    if(grepl(pattern, x[j+1]) ){
      j <- j+2 
    } else {
      x <- c( x[1:j], replacement, x[(j+1):length(x)] )
      j <- j+2
    }
  }
  if( ! grepl(pattern, tail(x,1) ) ){ x <- c(x, replacement) }
  return(x)
}
library(magrittr)
vec <- c("ab", "#4", "gw", "#29", "mp", "jq", "#35", "ez")
vec %>% greplSpace("#", "#", . ) %>% 
        matrix(ncol = 2, byrow = TRUE) %>%
        as.data.frame
Starting with your vec, we can create the expected data frame directly with functions from dplyr, tidyr, and stringr.
library(dplyr)
library(tidyr)
library(stringr)
vec <- c("ab", "#4", "gw", "#29", "mp", "jq", "#35", "ez")
dat <- tibble(Value = vec)  # data_frame() is deprecated; tibble() is re-exported by dplyr
dat2 <- dat %>%
  mutate(String = !str_detect(Value, "#"),
         Key = ifelse(String, "V1", "V2"),
         Row = cumsum(String)) %>%
  select(-String) %>%
  spread(Key, Value, fill = "#") %>%
  select(-Row)
dat2
# # A tibble: 5 x 2
#   V1    V2   
#   <chr> <chr>
# 1 ab    #4   
# 2 gw    #29  
# 3 mp    #    
# 4 jq    #35  
# 5 ez    #   
Here is a base R option with split. Create a logical index by checking for "#" in each string, take the cumulative sum of its negation, and split the original vector by this grouping variable into a list ('lst'). List elements that have fewer than two elements (the maximum length) are padded with NA at the end by assignment with length<-. Then rbind the list elements into a two-column matrix. If needed, convert those NA values to "#".
lst <-  split(vec, cumsum(!grepl("#", vec)))
out <- do.call(rbind, lapply(lst, `length<-`, max(lengths(lst))))
out[,2][is.na(out[,2])] <- "#" #not recommended though 
out
#  [,1] [,2] 
#1 "ab" "#4" 
#2 "gw" "#29"
#3 "mp" "#"  
#4 "jq" "#35"
#5 "ez" "#"  
Wrap it with as.data.frame if a data.frame output is needed.
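The split-and-pad steps above can be collected into one reusable function. This is only a minimal sketch: the name fill_pairs and its marker argument are hypothetical, and it assumes the vector starts with a label and that each label is followed by at most one marker term.

```r
vec <- c("ab", "#4", "gw", "#29", "mp", "jq", "#35", "ez")

# Hypothetical helper: pair each label with the marker term that follows it.
fill_pairs <- function(x, marker = "#") {
  # A new group starts at every element that does not begin with the marker
  grp <- cumsum(!startsWith(x, marker))
  lst <- split(x, grp)
  # Pad every group to length 2; a missing second element becomes NA
  m <- do.call(rbind, lapply(lst, `length<-`, 2))
  # Replace the padding NA values with the bare marker
  m[is.na(m[, 2]), 2] <- marker
  as.data.frame(m, stringsAsFactors = FALSE)
}

res <- fill_pairs(vec)
res
```

Calling as.data.frame on an unnamed two-column matrix gives the default column names V1 and V2, which matches the layout asked for in the question.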
You can use base R:
First collapse the vector into a string, inserting # where needed.
Then just read it in using read.csv.
vec1=gsub("([a-z]),\\s*([a-z])|$","\\1,#,\\2",toString(vec))
read.csv(text=gsub("(#.*?),","\\1\n",vec1),h=F)
  V1  V2
1 ab  #4
2 gw #29
3 mp   #
4 jq #35
5 ez   #
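Explanation:
First collapse the vector into a single string with toString. Wherever a letter is followed, after a comma, by another letter, i.e. [a-z],\s*[a-z], or at the end of the string, i.e. |$, you insert a #. Then replace the comma after each # term with a newline and read in the data as a table.
You can also do: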
a=read.csv(h=F,text=toString(sub("([a-z]+)","\n\\1",vec)),na=c(" ",""))[1:2]
a
  V1   V2
1 ab   #4
2 gw  #29
3 mp <NA>
4 jq  #35
5 ez <NA>
 data.frame(replace(as.matrix(a),is.na(a),"#"))
  V1   V2
1 ab   #4
2 gw  #29
3 mp    #
4 jq  #35
5 ez    #
Another base possibility:
do.call(rbind, tapply(vec, cumsum(!grepl("^#", vec)), FUN = function(x){
  if(length(x) == 1) c(x, "#") else x}))
#   [,1] [,2] 
# 1 "ab" "#4" 
# 2 "gw" "#29"
# 3 "mp" "#"  
# 4 "jq" "#35"
# 5 "ez" "#"
Explanation:
1. Check whether each element of vec starts with #, and negate the result: !grepl("^#", vec). This creates a logical vector.
2. Create a grouping variable by applying cumsum to the logical vector (note: 1 & 2 similar to @akrun).
3. Use tapply to apply a function to each subset of vec defined by the grouping variable. Check whether the length is 1. If so, pad with a trailing #, else just return the subset: if(length(x) == 1) c(x, "#") else x
4. Bind the resulting list together by row: do.call(rbind, ...)
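To make the steps in this explanation easier to follow, the sketch below stores each intermediate result under its own name. The names is_label, grp, groups and out are introduced here purely for illustration and are not part of the answer's code.

```r
vec <- c("ab", "#4", "gw", "#29", "mp", "jq", "#35", "ez")

# TRUE for elements that do not start with "#"
is_label <- !grepl("^#", vec)

# Running count of labels, so each label shares a group with the "#" term after it
grp <- cumsum(is_label)

# One entry per group; a group of length 1 gets a trailing "#"
groups <- tapply(vec, grp, function(x) if (length(x) == 1) c(x, "#") else x)

# Stack the groups row by row into a two-column character matrix
out <- do.call(rbind, groups)
out
```

Printing grp shows which elements end up in the same row, which is often the quickest way to check the grouping on a new input.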
Another one:
# create a row index 
ri <- cumsum(!grepl("^#", vec))
# create a column index
ci <- ave(ri, ri, FUN  = seq_along)
# create an empty matrix of desired dimensions
m <- matrix(nrow = max(ri), ncol = 2)
# assign 'vec' to matrix at relevant indices
m[cbind(ri, ci)] <- vec
# replace NA with '#'
m[is.na(m)] <- "#"
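The matrix-indexing snippet stops at the filled matrix. As a minimal, self-contained sketch, the same steps are repeated below and followed by a conversion to a data frame; the name df is an assumption added for illustration.

```r
vec <- c("ab", "#4", "gw", "#29", "mp", "jq", "#35", "ez")

ri <- cumsum(!grepl("^#", vec))        # row index: one row per label
ci <- ave(ri, ri, FUN = seq_along)     # column index: position within each row
m <- matrix(nrow = max(ri), ncol = 2)  # starts as an all-NA matrix
m[cbind(ri, ci)] <- vec                # place each element at its (row, column) cell
m[is.na(m)] <- "#"                     # cells never assigned are the gaps

# Convert to a data frame; columns get the default names V1 and V2
df <- as.data.frame(m, stringsAsFactors = FALSE)
df
```

Indexing a matrix with a two-column matrix of (row, column) pairs assigns all elements in one step, so no loop over the vector is needed.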
Using data.table. Create a grouping variable as above, and reshape from long to wide.
library(data.table)
d <- data.table(vec) 
d[ , g := cumsum(!grepl("^#", vec))]
dcast(d, g ~ rowid(g), value.var = "vec", fill = "#")
#    g  1   2
# 1: 1 ab  #4
# 2: 2 gw #29
# 3: 3 mp   #
# 4: 4 jq #35
# 5: 5 ez   # 