Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find all possible substrings of length n

Tags:

r

I have an interesting (only for me, perhaps, :)) question. I have text like:

"abbba"

The question is to find all possible substrings of length n in this string. For example, if n = 2, the substrings are

'ab','bb','ba'

and if n = 3, the substrings are

'abb','bbb','bba'

I thought to use something like this:

x <- 'abbba'
m <- matrix(strsplit(x, '')[[1]], nrow=2)
apply(m, 2, paste, collapse='')

But I got a warning and it doesn't work for len = 3.

like image 836
Lionir Avatar asked Feb 22 '16 18:02

Lionir


People also ask

How many substrings does length N have?

Approach: The count of sub-strings of length n will always be len – n + 1 where len is the length of the given string.

How do you calculate substrings?

Python String count() The count() method searches the substring in the given string and returns how many times the substring is present in it. It also takes optional parameters to start and end to specify the starting and ending positions in the string respectively.

How many substrings can be formed?

The total number of substrings formed by string of length N is (N*(N+1))/2, initialise count as (N*(N+1))/2.


1 Answers

We may use

x <- "abbba"
allsubstr <- function(x, n) unique(substring(x, 1:(nchar(x) - n + 1), n:nchar(x)))
allsubstr(x, 2)
# [1] "ab" "bb" "ba"
allsubstr(x, 3)
# [1] "abb" "bbb" "bba"

where substring extracts a substring from x starting and ending at specified positions. We exploit the fact that substring is vectorized and pass 1:(nchar(x) - n + 1) as starting positions and n:nchar(x) as ending positions.

like image 138
Julius Vainora Avatar answered Nov 15 '22 05:11

Julius Vainora