I'm fairly new to the R language. So I have this vector containing the following:
> head(sampleVector)
[1] "| txt01 | 100 | 200 | 123.456 | 0.12345 |"
[2] "| txt02 | 300 | 400 | 789.012 | 0.06789 |"
I want to take each line and break it into separate pieces, one data value per piece. The result should be a list, resultList, that prints out the following:
> head(resultList)
[[1]]
[1] "" "txt01" "100" "200" "123.456" "0.12345"
[[2]]
[1] "" "txt02" "300" "400" "789.012" "0.06789"
I am struggling with the strsplit() notation. This is the code I have so far:
resultList <- strsplit(sampleVector,"\\s+[|] | [|]\\s+ | [\\s+]")
# gives the following output:
# [[1]]
# [1] "| txt01" "100" "200" "123.456" "0.12345 |"
Is there any way I can get the desired output with a single strsplit call? I am guessing my notation for the delimiter plus its surrounding whitespace is wrong. Any help would be appreciated.
Another strsplit option, which I nearly missed (here test holds the same two lines as the question's sampleVector):
strsplit(test,"[| ]+")
#[[1]]
#[1] "" "txt01" "100" "200" "123.456" "0.12345"
#
#[[2]]
#[1] "" "txt02" "300" "400" "789.012" "0.06789"
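The leading "" in this output comes from strsplit itself: when a string starts with a match, the piece to the left of that match is the empty string, while a match at the very end of the string adds nothing. The "delimiter plus whitespace" idea from the question can also be written directly as optional whitespace, a literal pipe, then optional whitespace. This is a minimal sketch assuming sampleVector holds just the two lines shown; the names byPipe, byClass and noBlank are illustrative only, and dropping the empty string is optional:

```r
# The two sample lines from the question (assumed to be the whole vector)
sampleVector <- c("| txt01 | 100 | 200 | 123.456 | 0.12345 |",
                  "| txt02 | 300 | 400 | 789.012 | 0.06789 |")

# Delimiter = optional whitespace, a literal pipe, optional whitespace
byPipe <- strsplit(sampleVector, "\\s*[|]\\s*")

# Gives the same pieces as "[| ]+" here, because no field contains a space
byClass <- strsplit(sampleVector, "[| ]+")

# Optionally drop the empty string produced by the leading "| "
noBlank <- lapply(byPipe, function(x) x[x != ""])
print(noBlank)
```

On this data the two patterns agree; they would differ only if a field itself contained spaces, in which case "\\s*[|]\\s*" keeps the field intact.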
...and my original answer, because regmatches is my favourite function of late:
regmatches(test,gregexpr("[^| ]+",test))
#[[1]]
#[1] "txt01" "100" "200" "123.456" "0.12345"
#
#[[2]]
#[1] "txt02" "300" "400" "789.012" "0.06789"
To break it down as requested:
[| ]+ is a regex matching one or more (+) consecutive characters, each of which is a space or a pipe (|).
[^| ]+ is a regex matching one or more (+) consecutive characters, each of which is anything except (^) a space or a pipe (|).
gregexpr finds every instance of this pattern and returns the start positions and lengths of the matches. regmatches then extracts from test the substrings that gregexpr matched.
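To see what gregexpr hands to regmatches, here is a small sketch on the first sample line: the match object is an integer vector of start positions carrying a "match.length" attribute, and cutting those spans out with substring gives the same pieces as regmatches. Using substring this way is only an illustration of the idea, not a claim about how regmatches is implemented; the variable names are hypothetical.

```r
x <- "| txt01 | 100 | 200 | 123.456 | 0.12345 |"

m <- gregexpr("[^| ]+", x)

# Start position of every run of characters that are neither "|" nor " "
starts <- as.integer(m[[1]])
# Length of each of those runs
lens <- as.integer(attr(m[[1]], "match.length"))

# Cut the same spans out by hand ...
manual <- substring(x, starts, starts + lens - 1)
# ... and compare with what regmatches returns
viaRegmatches <- regmatches(x, m)[[1]]

print(data.frame(start = starts, length = lens, piece = manual))
```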
Here's one way. It first removes the | characters from the vector with gsub, then uses strsplit to split on runs of one or more spaces. Probably a bit easier that way.
strsplit(gsub("|", "", sampleVector, fixed=TRUE), "\\s+")
# [[1]]
# [1] "" "txt01" "100" "200" "123.456" "0.12345"
#
# [[2]]
# [1] "" "txt02" "300" "400" "789.012" "0.06789"
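fixed = TRUE matters in the gsub call because | is the alternation operator in a regular expression; with fixed = TRUE the pipe is matched as a literal character, which here has the same effect as escaping it inside a character class, "[|]". If the leading "" is unwanted, trimming the surrounding whitespace with trimws before splitting avoids it. A sketch, again assuming sampleVector holds only the two sample lines (noPipes, escaped and pieces are illustrative names):

```r
sampleVector <- c("| txt01 | 100 | 200 | 123.456 | 0.12345 |",
                  "| txt02 | 300 | 400 | 789.012 | 0.06789 |")

# Literal pipe via fixed = TRUE ...
noPipes <- gsub("|", "", sampleVector, fixed = TRUE)
# ... or via a character class, which removes the same characters
escaped <- gsub("[|]", "", sampleVector)

# trimws() drops the leading/trailing spaces, so no "" piece appears
pieces <- strsplit(trimws(noPipes), "\\s+")
print(pieces)
```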
Here's an interesting alternative using scan that might be useful and will probably be quite fast.
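lapply(sampleVector, function(y) {
  # Split the line on "|"; the leading and trailing pipes yield empty first/last fields
  s <- scan(text = y, what = character(), sep = "|", quiet = TRUE)
  # Remove all whitespace from each field, then drop the trailing empty field
  (g <- gsub("\\s+", "", s))[-length(g)]
})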
# [[1]]
# [1] "" "txt01" "100" "200" "123.456" "0.12345"
#
# [[2]]
# [1] "" "txt02" "300" "400" "789.012" "0.06789"
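A possible variant of the scan approach, offered as a sketch: scan's own strip.white = TRUE argument trims the spaces around each unquoted character field, so the gsub step can be skipped. The leading and trailing pipes still produce empty first and last fields, and this version drops only the last one so the result matches the output above. It assumes sampleVector holds only the two sample lines; viaScan is an illustrative name.

```r
sampleVector <- c("| txt01 | 100 | 200 | 123.456 | 0.12345 |",
                  "| txt02 | 300 | 400 | 789.012 | 0.06789 |")

viaScan <- lapply(sampleVector, function(y) {
  # strip.white = TRUE trims the spaces around each "|"-separated field
  s <- scan(text = y, what = character(), sep = "|",
            strip.white = TRUE, quiet = TRUE)
  # Drop the empty field that follows the trailing "|"
  s[-length(s)]
})
print(viaScan)
```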