Check if a string can be split into a sentence using words in a provided list

Tags:

python-3.x

I recently stumbled upon a coding task and have struggled to get it right. It goes like this:

Given a non-empty string s and a list word_list containing a list of non-empty words, determine if s can be segmented into a space-separated sequence of one or more dictionary words. You may assume the word_list does not contain duplicates, but each word can be used more than once.

For example, given:

s = 'whataniceday'
word_list = ['a', 'what', 'an', 'nice', 'day']

Return True, because 'whataniceday' can be segmented as 'what a nice day'.

I came up with a pretty naive solution that works for this particular example, but it is not hard to make it fail, for example by adding a word to word_list that is a prefix of another word in the list (e.g. ['a', 'wha', 'what', 'an', 'nice', 'day']). There are plenty of other things that can break my solution, but anyway, here it is:

s = "whataniceday"
word_list = ["h", "a", "what", "an", "nice", "day"]

def can_be_segmented(s, word_list):
    tested_str = s
    buildup_str = ''

    for letter in tested_str:        
        buildup_str += letter

        if buildup_str not in word_list:
            continue

        tested_str = tested_str[len(buildup_str):]
        buildup_str = ''

    return bool(tested_str == '' and buildup_str == '')

print(can_be_segmented(s, word_list))

Do you guys have an idea of how to fix it? Or maybe there is a better approach to this problem?

asked Jul 03 '18 22:07

Devligue

1 Answer

>>> import re
>>> s = 'whataniceday'
>>> word_list = ['a', 'what', 'an', 'nice', 'day']
>>> re.match('^(' + '|'.join(f'({s})' for s in word_list) + ')*$', s)
<_sre.SRE_Match object; span=(0, 12), match='whataniceday'>

As a function:

import re
def can_be_segmented(s, word_list):
    pattern = re.compile('^(' + '|'.join(f'({s})' for s in word_list) + ')*$')
    return pattern.match(s) is not None

It may be an optimization to make the groups non-capturing ((?:word) rather than (word)) so that re.match doesn't have to keep track of matched words, but I'm not going to time it.

If your words aren't all just letters you may want to pass them through re.escape() (as in f'({re.escape(s)})' instead of f'({s})').
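As a rough sketch of the two suggestions above combined (non-capturing groups plus re.escape()), the function could look like this. The loop variable is renamed to word purely for readability; the behaviour is otherwise meant to mirror the function above.

```python
import re

def can_be_segmented(s, word_list):
    # Escape each word so characters such as '.' or '+' are matched literally,
    # and use non-capturing groups because the matched words are not needed.
    alternatives = '|'.join(f'(?:{re.escape(word)})' for word in word_list)
    pattern = re.compile(f'^(?:{alternatives})*$')
    return pattern.match(s) is not None

print(can_be_segmented('whataniceday', ['a', 'what', 'an', 'nice', 'day']))
print(can_be_segmented('axb', ['a.b']))
```

With escaping, the word 'a.b' only matches the literal text 'a.b', so the second call should report no match.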

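If you are going to have mixed-case input and you want it to match, pass the re.IGNORECASE (or re.I) flag when building the pattern, as the second argument to re.compile(), or as the third argument to the module-level re.match(). Don't pass it to pattern.match(): on a compiled pattern the second argument of match() is the start position, not the flags.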
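A minimal sketch of the case-insensitive variant, assuming the flag is supplied at compile time. The function name can_be_segmented_ignore_case is made up for this example.

```python
import re

def can_be_segmented_ignore_case(s, word_list):
    alternatives = '|'.join(f'(?:{re.escape(word)})' for word in word_list)
    # The flag goes to re.compile(). On a compiled pattern, the second
    # positional argument of match() is the start position, and re.I has
    # the integer value 2, so pattern.match(s, re.I) would start at index 2.
    pattern = re.compile(f'^(?:{alternatives})*$', re.IGNORECASE)
    return pattern.match(s) is not None

print(can_be_segmented_ignore_case('WhatANiceDay', ['a', 'what', 'an', 'nice', 'day']))
```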

See the re documentation for more.
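One caveat: a backtracking regex like this may take a very long time on inputs that almost match in many different ways (for example a long run of 'a' followed by 'b', with words 'a', 'aa' and 'aaa'). If that is a concern, a different technique, dynamic programming over prefixes of s, avoids the problem. The following is only a sketch of that alternative, not part of the regex approach above; the name can_be_segmented_dp is made up for this example.

```python
def can_be_segmented_dp(s, word_list):
    words = set(word_list)
    max_len = max(map(len, words), default=0)
    # reachable[i] is True when the prefix s[:i] can be split into words.
    reachable = [True] + [False] * len(s)
    for end in range(1, len(s) + 1):
        # Only look back as far as the longest word in the list.
        for start in range(max(0, end - max_len), end):
            if reachable[start] and s[start:end] in words:
                reachable[end] = True
                break
    return reachable[len(s)]

print(can_be_segmented_dp('whataniceday', ['a', 'wha', 'what', 'an', 'nice', 'day']))
```

Because every prefix is considered, a word that is a prefix of another word (such as 'wha' and 'what') does not cause a wrong answer, which was the failure mode of the greedy loop in the question.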

answered Sep 27 '22 20:09

Steven Rumbalski
