I hope I can explain this clearly. In my data, missing information in a string is marked by three spaces and, surprisingly, is not followed by a \n
before the next piece of information.
Imagine I have a string like:
string <- "abc
   def
ghi   jkl"
I want the output of a regular expression (perhaps with strsplit()
or a more advanced function) to be:
[[1]]
[1] "abc" "" "def" "ghi" "" "jkl"
That is, it should split at each \n
and it should also split and insert an empty string wherever it finds three spaces. I need to mark that missing information as a separate value. Otherwise my script breaks, because it treats the next field as, for example, three spaces concatenated with the def
string.
Thank you
Here are two solutions which both use strsplit
but differ in how they split:
1) Split on newline. Remove all newlines, giving s1,
and then add a newline after every third character, giving s2.
Split s2 on newlines and replace each field consisting of three consecutive spaces with the empty string.
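Split <- function(string) {
  s1 <- gsub("\n", "", string)          # drop the existing newlines
  s2 <- gsub("(.{3})", "\\1\n", s1)     # add a newline after every 3 characters
  spl <- strsplit(s2, "\n")
  # a field of exactly three spaces marks missing information
  lapply(spl, function(s) replace(s, s == "   ", ""))
}
# test
string <- "abc\n   def\nghi   jkl"
Split(string)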
## [[1]]
## [1] "abc" "" "def" "ghi" "" "jkl"
2) Split on a zero-width regexp. Remove the newlines and split using a zero-width lookbehind, (?<=...), which matches after every three characters. Finally, replace each field of three consecutive spaces with the empty string.
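Split2 <- function(string) {
  s1 <- gsub("\n", "", string)                    # drop the existing newlines
  spl <- strsplit(s1, "(?<=...)", perl = TRUE)    # split after every 3 characters
  # a field of exactly three spaces marks missing information
  lapply(spl, function(s) replace(s, s == "   ", ""))
}
# test
string <- "abc\n   def\nghi   jkl"
Split2(string)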
## [[1]]
## [1] "abc" "" "def" "ghi" "" "jkl"
Note 1: The other answers provided to this question do not work for the following input string, which has two empty fields in succession, but the solutions here correctly recognize the two consecutive empty 3-character fields after the abc
field:
string2 <- "abc\n      def\nghi   jkl" # 6 spaces before d, 3 spaces before j
Split(string2)
## [[1]]
## [1] "abc" "" "" "def" "ghi" "" "jkl"
Split2(string2)
## [[1]]
## [1] "abc" "" "" "def" "ghi" "" "jkl"
Note 2: The two solutions above can also be nicely expressed using a magrittr pipeline:
library(magrittr)
string %>%
  gsub(pattern = "\n", replacement = "") %>%
  gsub(pattern = "(.{3})", replacement = "\\1\n") %>%
  strsplit("\n") %>%
  lapply(function(s) replace(s, s == "   ", ""))
## [[1]]
## [1] "abc" "" "def" "ghi" "" "jkl"
library(magrittr)
string %>%
  gsub(pattern = "\n", replacement = "") %>%
  strsplit("(?<=...)", perl = TRUE) %>%
  lapply(function(s) replace(s, s == "   ", ""))
## [[1]]
## [1] "abc" "" "def" "ghi" "" "jkl"
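Both solutions treat the string, once its newlines are removed, as a run of fixed-width 3-character fields. As a minimal sketch of that same idea without a regular expression for the split, the fields can also be cut by position with base R's substring(). This is a different technique from the two above, not part of them; the function name split_fixed and its width argument are hypothetical names introduced only for this illustration, and strrep() assumes R 3.3.0 or later.

```r
# Hypothetical helper: cut a string into fixed-width fields by position.
split_fixed <- function(string, width = 3) {
  # drop the newlines so only the fixed-width fields remain
  s1 <- gsub("\n", "", string, fixed = TRUE)
  lapply(s1, function(s) {
    if (nchar(s) == 0) return(character(0))   # nothing to cut
    starts <- seq(1, nchar(s), by = width)    # first character of each field
    fields <- substring(s, starts, starts + width - 1)
    # a field made only of spaces marks missing information
    fields[fields == strrep(" ", width)] <- ""
    fields
  })
}

string2 <- "abc\n      def\nghi   jkl" # 6 spaces before d, 3 spaces before j
split_fixed(string2)
## [[1]]
## [1] "abc" ""    ""    "def" "ghi" ""    "jkl"
```

Cutting by position makes the field width an explicit parameter, which may be convenient if the width ever differs from 3. Like the solutions above, it assumes every field, including a missing one, is exactly that wide.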