Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Check if string can be splitted into sentence using words in provided list

I've recently stumbled upon coding task, and I've struggled to get it right. It goes like this:


Given a non-empty string s and a list word_list containing a list of non-empty words, determine if s can be segmented into a space-separated sequence of one or more dictionary words. You may assume the word_list does not contain duplicates, but each word can be used more than once.

For example, given:

s = 'whataniceday'
word_list = ['a', 'what', 'an', 'nice', 'day']

Return True, because 'whataniceday' can be segmented as 'what a nice day'.


I came up with a pretty naive solution, that works for this particular example, but it is not hard to make it fail, for example by adding a word to word_list that other word in the list starts with (i.e. ['a', 'wha', 'what', 'an', 'nice', 'day']). There are plenty of other things that can mess up my solution, but anyway here goes:

s = "whataniceday"
word_list = ["h", "a", "what", "an", "nice", "day"]

def can_be_segmented(s, word_list):
    tested_str = s
    buildup_str = ''

    for letter in tested_str:        
        buildup_str += letter

        if buildup_str not in word_list:
            continue

        tested_str = tested_str[len(buildup_str):]
        buildup_str = ''

    return bool(tested_str == '' and buildup_str == '')

print(can_be_segmented(s, word_list))

Do you guys have an idea of how to fix it? Or maybe there is a better approach to this problem?

like image 815
Devligue Avatar asked Jul 03 '18 22:07

Devligue


People also ask

How do I split a string into a list of words?

To convert a string in a list of words, you just need to split it on whitespace. You can use split() from the string class. The default delimiter for this method is whitespace, i.e., when called on a string, it'll split that string at whitespace characters.

How do you check if a word in a list is in a string python?

Using any() to check if string contains element from list. Using any function is the most classical way in which you can perform this task and also efficiently. This function checks for match in string with match of each element of list.

How do you split a sentence into a letter in Python?

Python String split() MethodThe split() method splits a string into a list. You can specify the separator, default separator is any whitespace.

How do you store words in a list in Python?

Method#1: Using split() method The split method is used to split the strings and store them in the list. The built-in method returns a list of the words in the string, using the “delimiter” as the delimiter string.


1 Answers

>>> import re
>>> s = 'whataniceday'
>>> word_list = ['a', 'what', 'an', 'nice', 'day']
>>> re.match('^(' + '|'.join(f'({s})' for s in word_list) + ')*$', s)
<_sre.SRE_Match object; span=(0, 12), match='whataniceday'>

As a function:

import re
def can_be_segmented(s, word_list):
    pattern = re.compile('^(' + '|'.join(f'({s})' for s in word_list) + ')*$')
    return pattern.match(s) is not None

It may be an optimization to make the groups non-capturing ((?:word) rather than (word) so that re.match doesn't have to keep track of matched words, but I'm not going to time it.

If your words aren't all just letters you may want to pass them through re.escape() (as in f'({re.escape(s)})' instead of f'({s})').

If you are going to have mixed-case and you want those to match pass the re.IGNORECASE or re.I flag (as in pattern.match(s, re.I) instead of pattern.match(s)).

See the re documentation for more.

like image 52
Steven Rumbalski Avatar answered Sep 27 '22 20:09

Steven Rumbalski