
Operate on a list in a pythonic way when output depends on other elements

Tags: python

I have a task requiring an operation on every element of a list, with the outcome of the operation depending on other elements in the list.

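For example, I might want to concatenate a list of strings so that each string starting with a particular character (here '*') begins a new item, and every string after it is appended to that item until the next marked string.

This code solves the problem: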

x = ['*a', 'b', 'c', '*d', 'e', '*f', '*g']
concat = []
for element in x:
    if element.startswith('*'):
        concat.append(element)
    else:
        concat[len(concat) - 1] += element

resulting in:

concat
Out[16]: ['*abc', '*de', '*f', '*g']

But this seems horribly un-Pythonic. How should one operate on the elements of a list when the outcome of the operation depends on previous outcomes?

LondonRob asked Apr 27 '15 18:04


2 Answers

A few relevant excerpts from import this (the arbiter of what is Pythonic):

  • Simple is better than complex.
  • Readability counts.
  • Explicit is better than implicit.

I would just use code like this, and not worry about replacing the for loop with something "flatter".

x = ['*a', 'b', 'c', '*d', 'e', '*f', '*g']
partials = []
for element in x:
    if element.startswith('*'):
        partials.append([])
    partials[-1].append(element)
concat = list(map("".join, partials))  # list() because map is lazy on Python 3
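
The same loop can be wrapped in a small function so it is easier to reuse and test. This is only a sketch, not chepner's original code: the name `group_concat` and the `marker` parameter are hypothetical, and raising `ValueError` when the list does not begin with a marked element is an assumption about the desired behavior (the plain loop would fail with `IndexError` on `partials[-1]` in that case).

```python
def group_concat(items, marker="*"):
    """Join each marker-prefixed string with the unmarked strings that follow it."""
    partials = []
    for element in items:
        if element.startswith(marker):
            # A marked element opens a new group.
            partials.append([])
        elif not partials:
            # Assumption: an unmarked element before any marker is an error.
            raise ValueError("first element must start with %r" % marker)
        partials[-1].append(element)
    # Build a real list so the result can be indexed and printed.
    return ["".join(group) for group in partials]


x = ['*a', 'b', 'c', '*d', 'e', '*f', '*g']
print(group_concat(x))  # ['*abc', '*de', '*f', '*g']
```

Keeping the explicit for loop stays in line with the "readability counts" point above; the function only adds a name and a guard around it.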
chepner answered Oct 14 '22 22:10


You could use a regex to accomplish this succinctly. This does, however, sidestep your question about how to operate on dependent list elements. Credit to mbomb007 for improving the handling of allowed characters.

import re
z = re.findall(r'\*[^*]+', "".join(x))  # raw string avoids the invalid "\*" escape warning

Outputs:

['*abc', '*de', '*f', '*g']
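
One caveat worth keeping in mind: joining first and matching on `*` afterwards is only equivalent to the loop in the question when `*` appears solely at the start of an element. A small comparison, using a made-up input that has a `*` in the middle of an element (the loop is the one from the question, written with `concat[-1]`):

```python
import re

x = ['*a', 'b*c', '*d']  # hypothetical input: '*' also appears mid-element

# Loop from the question: only a leading '*' starts a new group.
concat = []
for element in x:
    if element.startswith('*'):
        concat.append(element)
    else:
        concat[-1] += element

# Regex on the joined string: every '*' starts a new match.
z = re.findall(r'\*[^*]+', "".join(x))

print(concat)  # ['*ab*c', '*d']
print(z)       # ['*ab', '*c', '*d']
```

If the data can never contain the marker inside an element, the two approaches agree.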

Small benchmarking:

Donkey Kong's answer (the regex above):

import timeit
setup = '''
import re
x = ['*a', 'b', 'c', '*d', 'e', '*f', '*g']
y = ['*a', 'b', 'c', '*d', 'e', '*f', '*g'] * 100
'''
print (min(timeit.Timer(r're.findall(r"\*[^*]+", "".join(x))', setup=setup).repeat(7, 1000)))
print (min(timeit.Timer(r're.findall(r"\*[^*]+", "".join(y))', setup=setup).repeat(7, 1000)))

Returns 0.00226416693456 and 0.06827958075, respectively.

Chepner's answer:

setup = '''
x = ['*a', 'b', 'c', '*d', 'e', '*f', '*g']
y = ['*a', 'b', 'c', '*d', 'e', '*f', '*g'] * 100
def chepner(x):
    partials = []
    for element in x:
        if element.startswith('*'):
            partials.append([])
        partials[-1].append(element)
    # list() keeps the joins inside the timed call on Python 3, where map is lazy
    return list(map("".join, partials))
'''
print (min(timeit.Timer('chepner(x)', setup=setup).repeat(7, 1000)))
print (min(timeit.Timer('chepner(y)', setup=setup).repeat(7, 1000)))

Returns 0.00456210269896 and 0.364635824689, respectively.

Saksham's answer:

setup = '''
x = ['*a', 'b', 'c', '*d', 'e', '*f', '*g']
y = ['*a', 'b', 'c', '*d', 'e', '*f', '*g'] * 100
'''
print (min(timeit.Timer("['*'+item for item in ''.join(x).split('*') if item]", setup=setup).repeat(7, 1000)))
print (min(timeit.Timer("['*'+item for item in ''.join(y).split('*') if item]", setup=setup).repeat(7, 1000)))

Returns 0.00104848906006 and 0.0556093171512, respectively.

tl;dr: Saksham's is slightly faster than mine, and chepner's is slower than both.
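
For anyone rerunning this on Python 3, here is a sketch that times the three approaches in one script and first checks that they agree. The function names are made up for this example, and the split-based one-liner is copied from the timing statement for Saksham's answer; absolute numbers will vary with machine and Python version, so none are shown.

```python
import re
import timeit

x = ['*a', 'b', 'c', '*d', 'e', '*f', '*g']
y = x * 100


def regex_version(items):
    return re.findall(r'\*[^*]+', "".join(items))


def loop_version(items):
    partials = []
    for element in items:
        if element.startswith('*'):
            partials.append([])
        partials[-1].append(element)
    # list() forces the joins to happen inside the timed call on Python 3.
    return list(map("".join, partials))


def split_version(items):
    return ['*' + item for item in ''.join(items).split('*') if item]


candidates = [regex_version, loop_version, split_version]

# Sanity check: all three give the same answer on both inputs.
for data in (x, y):
    results = [func(data) for func in candidates]
    assert all(r == results[0] for r in results)

# Same settings as above: best of 7 repeats, 1000 calls each.
for func in candidates:
    for label, data in (('x', x), ('y', y)):
        best = min(timeit.repeat(lambda: func(data), repeat=7, number=1000))
        print(f"{func.__name__}({label}): {best:.6f} s per 1000 calls")
```

Passing callables to `timeit.repeat` adds a little call overhead to every candidate equally, which seems an acceptable trade for not having to maintain setup strings.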

miradulo answered Oct 14 '22 22:10