 

python split list every 3rd value into a nested format

Tags:

python

I have a list like so:

['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']

I want it to look like this:

[['a', 'b', 'c'],['d', 'e', 'f'],['g', 'h', 'i']]

What's the most efficient way to do this?

Edit: what about going the other way?

[['a', 'b', 'c'],['d', 'e', 'f'],['g', 'h', 'i']]

-->

['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']
asked Apr 08 '14 09:04 by jason


3 Answers

You can do what you want with a simple list comprehension.

>>> a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> [a[i:i+3] for i in range(0, len(a), 3)]
[[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]
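If you only need to iterate over the groups once, the same slicing can be written as a generator, so the groups are produced lazily instead of all at once. A minimal sketch (the name chunks is just illustrative):

```python
def chunks(seq, n):
    """Yield successive n-sized slices of seq; the last one may be shorter."""
    for i in range(0, len(seq), n):
        yield seq[i:i + n]

a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(list(chunks(a, 3)))  # [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]
```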

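If you want the last sub-list to be padded, you can do this before the list comprehension (-len(a) % 3 is the number of items missing from the last group, so nothing is added when the length is already a multiple of 3):

>>> padding = 0
>>> a += [padding] * (-len(a) % 3)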
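The amount of padding needed is the distance to the next multiple of n, which Python's % operator gives directly as -len(seq) % n, because % returns a non-negative result for a positive divisor (0 when the length already divides evenly). A small sketch that pads a copy instead of modifying the list in place; pad_to_multiple is a hypothetical helper name:

```python
def pad_to_multiple(seq, n, padding=0):
    """Return a copy of seq extended with padding until its length is a multiple of n."""
    # -len(seq) % n is the number of missing items: 0 if len(seq) is already a multiple of n.
    return seq + [padding] * (-len(seq) % n)

print(pad_to_multiple([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 3))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 0, 0]
print(pad_to_multiple([1, 2, 3, 4, 5, 6], 3))               # [1, 2, 3, 4, 5, 6] (no padding needed)
```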

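Combining these into a single function:

def group(sequence, group_length, padding=None):
    """Split sequence into lists of group_length items, optionally padding the last one."""
    if padding is not None:
        # Pad a copy so the caller's list is not modified; -len % n is the number of missing items.
        sequence = list(sequence) + [padding] * (-len(sequence) % group_length)
    return [sequence[i:i+group_length] for i in range(0, len(sequence), group_length)]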
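The slicing in group needs a sequence that supports len() and slicing. If the input might be a generator or another one-shot iterable, a sketch based on itertools.islice avoids both requirements (group_iter is a hypothetical name, and unlike group it does not pad):

```python
from itertools import islice

def group_iter(iterable, n):
    """Yield lists of up to n items from any iterable; the last list may be shorter."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, n))  # take the next n items (fewer at the end)
        if not chunk:  # iterator exhausted
            return
        yield chunk

print(list(group_iter(range(10), 3)))  # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
print(list(group_iter("abcdefg", 3)))  # [['a', 'b', 'c'], ['d', 'e', 'f'], ['g']]
```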

Going the other way:

def flatten(sequence):
    return [item for sublist in sequence for item in sublist]

>>> a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
>>> flatten(a)
[1, 2, 3, 4, 5, 6, 7, 8, 9]
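The standard library can also do this one-level flattening: itertools.chain.from_iterable walks through each sublist in turn, which gives the same result as flatten above without writing the nested comprehension yourself.

```python
from itertools import chain

nested = [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']]
flat = list(chain.from_iterable(nested))  # iterate over each sublist, then over its items
print(flat)  # ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']
```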
answered Oct 21 '22 21:10 by Scorpion_God


If you can use NumPy, convert the list to an array x and call x.reshape(-1, 3); the -1 tells NumPy to infer the number of rows:

In [1]: import numpy as np
In [2]: x = ['a','b','c','d','e','f','g','h','i']
In [3]: x = np.array(x)
In [4]: x.reshape(-1, 3)
Out[4]: 
array([['a', 'b', 'c'],
       ['d', 'e', 'f'],
       ['g', 'h', 'i']], 
      dtype='|S1')

If the data is large enough, this approach is more efficient than the list comprehension.
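Two practical notes, sketched below assuming Python 3 and a recent NumPy: reshape returns a NumPy array, so call .tolist() if you need plain nested lists, and the reverse direction is .ravel() (or .reshape(-1)). Also, reshape(-1, 3) raises ValueError when the length is not a multiple of 3, whereas the slicing approach just leaves a shorter last group.

```python
import numpy as np

x = np.array(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i'])
grid = x.reshape(-1, 3)        # shape (3, 3); -1 lets NumPy infer the row count
nested = grid.tolist()         # back to plain nested Python lists
flat = grid.ravel().tolist()   # the "other way": flatten to one dimension, then to a list

print(nested)  # [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i']]
print(flat)    # ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']

# reshape(-1, 3) requires the total size to be divisible by 3.
try:
    np.arange(10).reshape(-1, 3)
except ValueError as exc:
    print("cannot reshape:", exc)
```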

Update

Appending cProfile results to show why it is more efficient:

import cProfile
import numpy as np

a = list(range(10000000*3))  # list() needed on Python 3; Python 2's range() already returns a list

def impl_a():
    x = [a[i:i+3] for i in range(0, len(a), 3)]

def impl_b():
    x = np.array(a)
    x = x.reshape(-1, 3)

print("cProfile result of impl_a()")
cProfile.run("impl_a()")
print("cProfile result of impl_b()")
cProfile.run("impl_b()")

Output:

cProfile result of impl_a()
      5 function calls in 15.614 seconds

Ordered by: standard name

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.499    0.499   15.614   15.614 <string>:1(<module>)
     1   14.968   14.968   15.114   15.114 impla.py:6(impl_a)
     1    0.000    0.000    0.000    0.000 {len}
     1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
     1    0.146    0.146    0.146    0.146 {range}


cProfile result of impl_b()
     5 function calls in 3.142 seconds

Ordered by: standard name

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1    0.000    0.000    3.142    3.142 <string>:1(<module>)
     1    0.000    0.000    3.142    3.142 impla.py:9(impl_b)
     1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
     1    0.000    0.000    0.000    0.000 {method 'reshape' of 'numpy.ndarray' objects}
     1    3.142    3.142    3.142    3.142 {numpy.core.multiarray.array}
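In the profile above, almost all of impl_b's time is spent in numpy.core.multiarray.array, i.e. converting the Python list into an array, while the reshape call itself is listed at 0.000 seconds. A rough sketch that times the two steps separately with timeit, which is the standard library's tool for small benchmarks (the size N and the repeat count are arbitrary choices, and absolute timings will vary by machine):

```python
import timeit

import numpy as np

N = 3 * 100000  # arbitrary size; must be a multiple of 3 for reshape(-1, 3)
a = list(range(N))
arr = np.array(a)

def slice_groups():
    return [a[i:i + 3] for i in range(0, len(a), 3)]

def convert_only():
    return np.array(a)  # list -> array conversion, the step that dominated impl_b

def reshape_only():
    return arr.reshape(-1, 3)  # usually a view: no element copying

for fn in (slice_groups, convert_only, reshape_only):
    seconds = timeit.timeit(fn, number=10)
    print(f"{fn.__name__}: {seconds:.4f} s for 10 runs")
```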
answered Oct 21 '22 23:10 by emeth


You can use the grouper recipe from the itertools documentation with a list comprehension:

try:
    from itertools import izip_longest as zip_longest  # Python 2
except ImportError:
    from itertools import zip_longest  # Python 3

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    # n references to the same iterator, so each output tuple consumes n consecutive items
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

in_ = [1, 2, 3, 4, 5, 6, 7, 8, 9]

out = [list(t) for t in grouper(in_, 3)]
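If you would rather drop an incomplete trailing group than pad it, the same shared-iterator trick works with the built-in zip, which stops at the shortest input. On Python 3.12 and later, itertools.batched does the grouping directly (it yields tuples and keeps a shorter final batch). A sketch covering both, assuming Python 3:

```python
import itertools

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# zip() over three references to the same iterator: each tuple takes the next 3 items,
# and the leftover 10 is silently dropped because zip stops at the shortest input.
it = iter(data)
truncated = [list(t) for t in zip(it, it, it)]
print(truncated)  # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# itertools.batched exists only on Python 3.12+; guard so this also runs on older versions.
if hasattr(itertools, "batched"):
    batches = [list(b) for b in itertools.batched(data, 3)]
    print(batches)  # [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]
```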
answered Oct 21 '22 23:10 by jonrsharpe