
Python splitting list to sublist at start and end keyword patterns

If I were to have a list, say:

lst = ['foo', 'bar', '!test', 'hello', 'world!', 'word']

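and a marker character !, how would I return a list in which the items from one starting with ! through one ending with ! are grouped into a sublist, like this: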

lst = ['foo', 'bar', ['test', 'hello', 'world'], 'word']

I'm having some difficulty finding a solution for this. Here's one approach I've tried:

def define(lst):
    for index, item in enumerate(lst):
        if item[0] == '!' and lst[index+2][-1] == '!':
            temp = lst[index:index+3]
            del lst[index+1:index+2]
            lst[index] = temp
    return lst

Any help would be greatly appreciated.

Leo Whitehead asked Apr 01 '18 09:04


2 Answers

Assume for now that no element both starts and ends with !, like '!foo!'.

First, we can write helper predicates:

def is_starting_element(element):
    return element.startswith('!')


def is_ending_element(element):
    return element.endswith('!')

Then we can write a generator function (because they are awesome):

def walk(elements):
    elements = iter(elements)  # making iterator from passed iterable
    for element in elements:  # nested calls consume the same iterator
        if is_starting_element(element):
            yield [element[1:], *walk(elements)]
        elif is_ending_element(element):
            yield element[:-1]
            return
        else:
            yield element
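
The recursion works because every nested call of walk pulls from the same iterator: calling iter() on something that is already an iterator returns that same object, so the inner call picks up where the outer one stopped, and the outer loop resumes after the inner one returns. A minimal sketch of that behavior on its own (take_two is a hypothetical helper used only for illustration):

```python
def take_two(iterator):
    # Stand-in for a nested call: consume exactly two items from the shared iterator.
    iterator = iter(iterator)  # no-op for an iterator: returns the same object
    return [next(iterator), next(iterator)]


items = iter(['a', 'b', 'c', 'd'])
same_object = iter(items) is items  # iterators return themselves from iter()
first = next(items)                 # outer consumer takes 'a'
inner = take_two(items)             # nested consumer takes 'b' and 'c'
rest = list(items)                  # outer consumer resumes at 'd'
print(same_object, first, inner, rest)
```

If a list were passed down instead of the iterator, each nested call would start again from the first element.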

Tests:

>>> lst = ['foo', 'bar', '!test', 'hello', 'world!', 'word']
>>> list(walk(lst))
['foo', 'bar', ['test', 'hello', 'world'], 'word']
>>> lst = ['foo', 'bar', '!test', '!hello', 'world!', 'word!']
>>> list(walk(lst))
['foo', 'bar', ['test', ['hello', 'world'], 'word']]
>>> lst = ['hello!', 'world!']
>>> list(walk(lst))
['hello']

As the last example shows, if there are more closing elements than opening ones, everything after the first unmatched closing element is ignored (because the generator returns at that point). So if lst has an invalid signature (the number of opening elements differs from the number of closing elements), the result can be surprising. As a way out, we can validate the data before processing and raise an error if it is invalid.

We can write a validator like this:

def validate_elements(elements):
    def get_sign(element):
        if is_starting_element(element):
            return 1
        elif is_ending_element(element):
            return -1
        else:
            return 0

    signature = sum(map(get_sign, elements))
    are_elements_valid = signature == 0
    if not are_elements_valid:
        error_message = 'Data is invalid: '
        if signature > 0:
            error_message += ('there are more opening elements '
                              'than closing ones.')
        else:
            error_message += ('there are more closing elements '
                              'than opening ones.')
        raise ValueError(error_message)

Tests

>>> lst = ['!hello', 'world!']
>>> validate_elements(lst)  # no exception raised, data is valid
>>> lst = ['!hello', '!world']
>>> validate_elements(lst)
...
ValueError: Data is invalid: there are more opening elements than closing ones.
>>> lst = ['hello!', 'world!']
>>> validate_elements(lst)
...
ValueError: Data is invalid: there are more closing elements than opening ones.
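
Note that comparing totals does not look at order: ['hello!', '!world'] has a signature of zero and passes validate_elements, yet walk stops at 'hello!'. A sketch of a stricter check, assuming the same ! markers and no elements like '!foo!' (validate_order is a hypothetical name, not part of the code above), tracks the running depth and rejects a closing element that has no matching opening one:

```python
def validate_order(elements, marker='!'):
    # Hypothetical order-aware variant: track how many sublists are open.
    depth = 0
    for position, element in enumerate(elements):
        if element.startswith(marker):
            depth += 1
        elif element.endswith(marker):
            depth -= 1
            if depth < 0:
                raise ValueError(
                    f'Data is invalid: closing element {element!r} '
                    f'at position {position} has no opening element.')
    if depth > 0:
        raise ValueError(
            f'Data is invalid: {depth} opening element(s) are never closed.')


validate_order(['!hello', 'world!'])  # balanced and well ordered: no exception
try:
    validate_order(['hello!', '!world'])  # totals match, but the order is wrong
except ValueError as error:
    print(error)
```

Whether the extra strictness is worth it depends on how much the input can be trusted.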

Finally, we can write a function with validation:

def to_sublists(elements):
    validate_elements(elements)
    return list(walk(elements))

Tests

>>> lst = ['foo', 'bar', '!test', 'hello', 'world!', 'word']
>>> to_sublists(lst)
['foo', 'bar', ['test', 'hello', 'world'], 'word']
>>> lst = ['foo', 'bar', '!test', '!hello', 'world!', 'word!']
>>> to_sublists(lst)
['foo', 'bar', ['test', ['hello', 'world'], 'word']]
>>> lst = ['hello!', 'world!']
>>> to_sublists(lst)
...
ValueError: Data is invalid: there are more closing elements than opening ones.
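
One caveat, as far as I can tell: validate_elements iterates over its argument once, so if to_sublists is handed a one-shot iterator (a generator, say), nothing is left for walk and the result is empty. A minimal sketch of a workaround that materializes the input first; to_sublists_safe and words are hypothetical names, and the helpers are compact restatements of the ones above so that the block runs on its own:

```python
def walk(elements):
    elements = iter(elements)
    for element in elements:
        if element.startswith('!'):
            yield [element[1:], *walk(elements)]
        elif element.endswith('!'):
            yield element[:-1]
            return
        else:
            yield element


def get_sign(element):
    if element.startswith('!'):
        return 1
    if element.endswith('!'):
        return -1
    return 0


def validate_elements(elements):
    # Compact restatement of the validator: the signs must sum to zero.
    if sum(map(get_sign, elements)) != 0:
        raise ValueError('Data is invalid: opening and closing elements '
                         'do not balance.')


def to_sublists_safe(elements):
    elements = list(elements)  # materialize so both passes see every element
    validate_elements(elements)
    return list(walk(elements))


def words():
    # A generator: it can be iterated over only once.
    yield from ['foo', '!test', 'hello', 'world!', 'word']


print(to_sublists_safe(words()))
```

For inputs that are already lists, the extra copy changes nothing except a little memory use.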

EDIT

If we want to handle elements that both start and end with !, like '!bar!', we can modify the walk function using itertools.chain:

from itertools import chain


def walk(elements):
    elements = iter(elements)
    for element in elements:  # nested calls consume the same iterator
        if is_starting_element(element):
            yield list(walk(chain([element[1:]], elements)))
        elif is_ending_element(element):
            element = element[:-1]
            yield element
            return
        else:
            yield element

We also need to complete the validation by modifying the get_sign function (nested inside validate_elements):

def get_sign(element):
    if is_starting_element(element):
        if is_ending_element(element):
            return 0
        return 1
    if is_ending_element(element):
        return -1
    return 0

Tests

>>> lst = ['foo', 'bar', '!test', '!baz!', 'hello', 'world!', 'word']
>>> to_sublists(lst)
['foo', 'bar', ['test', ['baz'], 'hello', 'world'], 'word']
Azat Ibrakov answered Oct 13 '22 06:10


Here's an iterative solution that can handle arbitrarily nested lists:

def nest(lst, sep):
    current_list = []
    nested_lists = [current_list]  # stack of nested lists
    for item in lst:
        if item.startswith(sep):
            if item.endswith(sep):
                item = item[len(sep):-len(sep)]  # strip both separators
                current_list.append([item])
            else:
                # start a new nested list and push it onto the stack
                new_list = []
                current_list.append(new_list)
                current_list = new_list
                nested_lists.append(current_list)
                current_list.append(item[len(sep):])  # strip the separator
        elif item.endswith(sep):
            # finalize the deepest list and go up by one level
            current_list.append(item[:-len(sep)])  # strip the separator
            nested_lists.pop()
            current_list = nested_lists[-1]
        else:
            current_list.append(item)

    return nested_lists[0]  # the root list, even if a sublist was left unclosed

Test run:

>>> nest(['foo', 'bar', '!test', '!baz!', 'hello', 'world!', 'word'], '!')
['foo', 'bar', ['test', ['baz'], 'hello', 'world'], 'word']

The way it works is to maintain a stack of nested lists. Every time a new nested list is created, it gets pushed onto the stack. Elements are always appended to the last list in the stack. When an element that ends with "!" is found, the topmost list is removed from the stack.
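
Unbalanced input is not handled explicitly: as written, a closing element with no matching opener ends in an IndexError once the stack is emptied, and an unclosed sublist goes unreported. A sketch of a variant that raises ValueError in both cases, following the same rules as nest (nest_checked and its error messages are hypothetical, not part of the answer above):

```python
def nest_checked(lst, sep):
    root = []
    stack = [root]  # stack[-1] is the list currently being filled
    for item in lst:
        if item.startswith(sep):
            if item.endswith(sep):
                # Opens and closes at once: a one-item sublist, stack untouched.
                stack[-1].append([item[len(sep):-len(sep)]])
            else:
                # Open a new sublist and make it the current one.
                new_list = [item[len(sep):]]
                stack[-1].append(new_list)
                stack.append(new_list)
        elif item.endswith(sep):
            if len(stack) == 1:
                raise ValueError(
                    f'{item!r} closes a sublist that was never opened')
            # Close the deepest sublist and go up by one level.
            stack.pop().append(item[:-len(sep)])
        else:
            stack[-1].append(item)
    if len(stack) > 1:
        raise ValueError(f'{len(stack) - 1} sublist(s) were never closed')
    return root


print(nest_checked(['foo', 'bar', '!test', '!baz!', 'hello', 'world!', 'word'], '!'))
```

Keeping a separate root reference means the return value does not depend on where the stack ends up.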

Aran-Fey answered Oct 13 '22 07:10