I've recently stumbled upon coding task, and I've struggled to get it right. It goes like this:
Given a non-empty string s
and a list word_list
containing a list of non-empty words, determine if s
can be segmented into a space-separated sequence of one or more dictionary words. You may assume the word_list
does not contain duplicates, but each word can be used more than once.
For example, given:
s = 'whataniceday'
word_list = ['a', 'what', 'an', 'nice', 'day']
Return True
, because 'whataniceday'
can be segmented as 'what a nice day'
.
I came up with a pretty naive solution, that works for this particular example, but it is not hard to make it fail, for example by adding a word to word_list
that other word in the list starts with (i.e. ['a', 'wha', 'what', 'an', 'nice', 'day']
). There are plenty of other things that can mess up my solution, but anyway here goes:
s = "whataniceday"
word_list = ["h", "a", "what", "an", "nice", "day"]
def can_be_segmented(s, word_list):
tested_str = s
buildup_str = ''
for letter in tested_str:
buildup_str += letter
if buildup_str not in word_list:
continue
tested_str = tested_str[len(buildup_str):]
buildup_str = ''
return bool(tested_str == '' and buildup_str == '')
print(can_be_segmented(s, word_list))
Do you guys have an idea of how to fix it? Or maybe there is a better approach to this problem?
To convert a string in a list of words, you just need to split it on whitespace. You can use split() from the string class. The default delimiter for this method is whitespace, i.e., when called on a string, it'll split that string at whitespace characters.
Using any() to check if string contains element from list. Using any function is the most classical way in which you can perform this task and also efficiently. This function checks for match in string with match of each element of list.
Python String split() MethodThe split() method splits a string into a list. You can specify the separator, default separator is any whitespace.
Method#1: Using split() method The split method is used to split the strings and store them in the list. The built-in method returns a list of the words in the string, using the “delimiter” as the delimiter string.
>>> import re
>>> s = 'whataniceday'
>>> word_list = ['a', 'what', 'an', 'nice', 'day']
>>> re.match('^(' + '|'.join(f'({s})' for s in word_list) + ')*$', s)
<_sre.SRE_Match object; span=(0, 12), match='whataniceday'>
As a function:
import re
def can_be_segmented(s, word_list):
pattern = re.compile('^(' + '|'.join(f'({s})' for s in word_list) + ')*$')
return pattern.match(s) is not None
It may be an optimization to make the groups non-capturing ((?:word)
rather than (word)
so that re.match
doesn't have to keep track of matched words, but I'm not going to time it.
If your words aren't all just letters you may want to pass them through re.escape()
(as in f'({re.escape(s)})'
instead of f'({s})'
).
If you are going to have mixed-case and you want those to match pass the re.IGNORECASE
or re.I
flag (as in pattern.match(s, re.I)
instead of pattern.match(s)
).
See the re
documentation for more.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With