Python - Split array into multiple arrays

Tags: python, arrays

I have an array containing file names like the ones below:

['001_1.png', '001_2.png', '001_3.png', '002_1.png','002_2.png', '003_1.png', '003_2.png', '003_3.png', '003_4.png', ....]

I want to quickly group these files into multiple arrays like this:

[['001_1.png', '001_2.png', '001_3.png'], ['002_1.png', '002_2.png'], ['003_1.png', '003_2.png', '003_3.png', '003_4.png'], ...]

Could anyone tell me how to do this in a few lines of Python?

564

asked May 04 '18 07:05

eric2323223

2 Answers

If your data is already sorted by the file name, you can use itertools.groupby:

files = ['001_1.png', '001_2.png', '001_3.png', '002_1.png','002_2.png',
        '003_1.png', '003_2.png', '003_3.png']

import itertools

keyfunc = lambda filename: filename[:3]

# this creates an iterator that yields `(group, filenames)` tuples,
# but `filenames` is another iterator
grouper = itertools.groupby(files, keyfunc)

# to get the result as a nested list, we iterate over the grouper to
# discard the groups and turn the `filenames` iterators into lists
result = [list(filenames) for _, filenames in grouper]

print(result)
# [['001_1.png', '001_2.png', '001_3.png'],
#  ['002_1.png', '002_2.png'],
#  ['003_1.png', '003_2.png', '003_3.png']]

Otherwise, you can base your code on this recipe, which makes a single pass over the list and is therefore more efficient than sorting the list and then using groupby.

Input: Your input is a flat list, so use a regular ol' loop to iterate over it:
```
for filename in files:
```
Group identifier: The files are grouped by the first 3 characters:
```
group = filename[:3]
```
Output: The output should be a nested list rather than a dict, which can be done with
```
result = list(groupdict.values())
```

Putting it together:

files = ['001_1.png', '001_2.png', '001_3.png', '002_1.png','002_2.png',
        '003_1.png', '003_2.png', '003_3.png']

import collections

groupdict = collections.defaultdict(list)
for filename in files:
    group = filename[:3]
    groupdict[group].append(filename)

result = list(groupdict.values())

print(result)
# [['001_1.png', '001_2.png', '001_3.png'],
#  ['002_1.png', '002_2.png'],
#  ['003_1.png', '003_2.png', '003_3.png']]

Read the recipe answer for more details.
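
The `filename[:3]` key above only works while every prefix is exactly three characters long. As a rough sketch, assuming the prefix is everything before the first underscore (an assumption about the naming scheme, not something stated in the question), the same loop can derive the group with `str.partition`. The helper name `group_by_prefix` and its `sep` parameter are hypothetical:

```python
import collections


def group_by_prefix(filenames, sep="_"):
    """Group filenames by the text before the first `sep` (hypothetical helper)."""
    groups = collections.defaultdict(list)
    for name in filenames:
        # partition() returns (head, sep, tail); only the head is needed here
        prefix, _, _ = name.partition(sep)
        groups[prefix].append(name)
    # dicts keep insertion order (Python 3.7+), so groups appear in first-seen order
    return list(groups.values())


# unsorted input with prefixes of different lengths
files = ['001_1.png', '0020_1.png', '001_2.png', '0020_2.png']
grouped = group_by_prefix(files)
print(grouped)
# [['001_1.png', '001_2.png'], ['0020_1.png', '0020_2.png']]
```

Because this still builds a dict in one pass, the input does not need to be sorted.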

182

answered Sep 20 '22 03:09

Aran-Fey

Something like this should work:

import itertools


mylist = [...]
[list(v) for k,v in itertools.groupby(mylist, key=lambda x: x[:3])]

If the input list isn't sorted, then use something like this:

import itertools


mylist = [...]
keyfunc = lambda x:x[:3]
mylist = sorted(mylist, key=keyfunc)
[list(v) for k,v in itertools.groupby(mylist, key=keyfunc)]
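
To illustrate why the sort matters: `itertools.groupby` only merges consecutive items with equal keys, so an unsorted list yields fragmented groups. A small sketch with made-up sample data (the `unsorted_files` list is only for illustration):

```python
import itertools


def keyfunc(name):
    # group key: the first three characters of the file name
    return name[:3]


unsorted_files = ['001_1.png', '002_1.png', '001_2.png']

# without sorting, the two '001' files are not adjacent and end up in separate groups
without_sort = [list(v) for _, v in itertools.groupby(unsorted_files, key=keyfunc)]

# sorting by the same key first makes equal keys adjacent
with_sort = [list(v) for _, v in
             itertools.groupby(sorted(unsorted_files, key=keyfunc), key=keyfunc)]

print(without_sort)  # [['001_1.png'], ['002_1.png'], ['001_2.png']]
print(with_sort)     # [['001_1.png', '001_2.png'], ['002_1.png']]
```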

answered Sep 22 '22 03:09

oxyum
