I have an interesting (only for me, perhaps, :)) question. I have text like:
"abbba"
The question is to find all possible substrings of length n in this string. For example, if n = 2, the substrings are
'ab','bb','ba'
and if n = 3, the substrings are
'abb','bbb','bba'
I thought to use something like this:
x <- 'abbba'
m <- matrix(strsplit(x, '')[[1]], nrow=2)
apply(m, 2, paste, collapse='')
But I got a warning, and it doesn't work for n = 3.
Approach: The number of substrings of length n, counting repeats, is always len - n + 1, where len is the length of the given string.
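As a quick check of that count in R, here is a minimal sketch. The helper name count_windows is made up for illustration and is not part of base R; it only assumes a single string as input.

```r
# Hypothetical helper: number of length-n windows in a single string x.
# Returns 0 when n is longer than the string.
count_windows <- function(x, n) max(nchar(x) - n + 1, 0)

x <- "abbba"
count_windows(x, 2)
# [1] 4

# One start position per window; "bb" shows up twice before de-duplication.
starts <- seq_len(count_windows(x, 2))
substring(x, starts, starts + 2 - 1)
# [1] "ab" "bb" "bb" "ba"
```

Note that the count includes repeated substrings, which is why the question's expected output for n = 2 has only three entries.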
The total number of substrings of all lengths in a string of length N, again counting repeats, is (N*(N+1))/2.
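The (N*(N+1))/2 total can be checked by enumerating the windows for every length from 1 to N. This is only a sketch: all_windows is a hypothetical helper name, and it assumes a single non-empty string.

```r
# Hypothetical helper: every substring of x, one entry per (start, length) pair.
# Assumes x is a single non-empty string.
all_windows <- function(x) {
  len <- nchar(x)
  unlist(lapply(seq_len(len), function(n) {
    starts <- seq_len(len - n + 1)          # len - n + 1 windows of length n
    substring(x, starts, starts + n - 1)
  }))
}

subs <- all_windows("abbba")
length(subs)
# [1] 15
5 * (5 + 1) / 2
# [1] 15
```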
We may use
x <- "abbba"
allsubstr <- function(x, n) unique(substring(x, 1:(nchar(x) - n + 1), n:nchar(x)))
allsubstr(x, 2)
# [1] "ab" "bb" "ba"
allsubstr(x, 3)
# [1] "abb" "bbb" "bba"
where substring extracts a substring from x starting and ending at specified positions. We exploit the fact that substring is vectorized and pass 1:(nchar(x) - n + 1) as starting positions and n:nchar(x) as ending positions.
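One edge case worth guarding: when n is larger than nchar(x), the sequence 1:(nchar(x) - n + 1) counts downward (for example, 1:0 is c(1, 0)), so the result is no longer a list of length-n substrings. A possible guarded variant is sketched below; allsubstr_safe is a hypothetical name, and it assumes x is a single string and n is a positive whole number.

```r
# Sketch of a guarded variant of allsubstr (hypothetical name).
allsubstr_safe <- function(x, n) {
  stopifnot(length(x) == 1, n >= 1)
  n_windows <- nchar(x) - n + 1
  # No window fits when n > nchar(x); return an empty character vector.
  if (n_windows < 1) return(character(0))
  starts <- seq_len(n_windows)
  unique(substring(x, starts, starts + n - 1))
}

allsubstr_safe("abbba", 2)
# [1] "ab" "bb" "ba"
allsubstr_safe("abbba", 3)
# [1] "abb" "bbb" "bba"
allsubstr_safe("ab", 3)
# character(0)
```

The early return is there because substring is not called with empty position vectors in this sketch; the rest is the same vectorized idea as the one-liner.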