
Create a dataframe of permutations in pandas from list

I have the following lists:

aa = ['aa1', 'aa2', 'aa3', 'aa4', 'aa5']
bb = ['bb1', 'bb2', 'bb3', 'bb4', 'bb5']
cc = ['cc1', 'cc2', 'cc3', 'cc4', 'cc5']

I want to create a pandas dataframe like this:

aa    bb    cc
aa1   bb1   cc1
aa2   bb1   cc1
aa3   bb1   cc1
aa4   bb1   cc1
aa5   bb1   cc1
aa1   bb2   cc1
aa1   bb3   cc1
aa1   bb4   cc1
aa1   bb5   cc1
aa1   bb1   cc2
aa1   bb1   cc3
aa1   bb1   cc4
aa1   bb1   cc5

I'm stuck on how to do this. I've looked at examples such as "How to generate all permutations of a list in Python".

I can do each permutation individually using:

import itertools
itertools.permutations(['aa1','aa2','aa3','aa4','aa5'])

I have a few tens of lists and, ideally, I'd like to process them all automatically.

Appreciate any help!

Kvothe asked Aug 14 '17 10:08




1 Answer

I believe you need itertools.product, not permutations.

In [287]: lists = [aa, bb, cc]

In [288]: pd.DataFrame(list(itertools.product(*lists)), columns=['aa', 'bb', 'cc'])
Out[288]: 
      aa   bb   cc
0    aa1  bb1  cc1
1    aa1  bb1  cc2
2    aa1  bb1  cc3
3    aa1  bb1  cc4
4    aa1  bb1  cc5
5    aa1  bb2  cc1
6    aa1  bb2  cc2
7    aa1  bb2  cc3
8    aa1  bb2  cc4
...

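This gives you the Cartesian product of your lists, i.e. every combination of one element from each list (5 × 5 × 5 = 125 rows here). The column names are hardcoded in this example; with a few tens of lists you would want to generate them instead, for example by passing a computed list to columns= or by renaming afterwards with df.rename.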
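For many lists, one way to avoid hardcoding the column names is to keep the lists in a dict keyed by column name. The following is a minimal sketch, not part of the original answer; the helper names product_frame and one_at_a_time_frame are made up for illustration. The second helper is included because the table in the question, read literally, varies one list at a time while the others stay at their first value, which is a much smaller result than the full Cartesian product. That reading is an assumption about what the asker wants.

```python
import itertools

import pandas as pd


def product_frame(named_lists):
    """Full Cartesian product; column names are taken from the dict keys.

    `named_lists` is a hypothetical dict mapping column name -> list of values.
    """
    rows = list(itertools.product(*named_lists.values()))
    return pd.DataFrame(rows, columns=list(named_lists))


def one_at_a_time_frame(named_lists):
    """Vary one list at a time, holding the others at their first value.

    Assumption: this is what the table in the question literally shows.
    """
    baseline = {name: values[0] for name, values in named_lists.items()}
    rows = [baseline]
    for name, values in named_lists.items():
        for value in values[1:]:
            # Copy the baseline row and change a single column.
            rows.append({**baseline, name: value})
    return pd.DataFrame(rows, columns=list(named_lists))


named_lists = {
    "aa": ["aa1", "aa2", "aa3", "aa4", "aa5"],
    "bb": ["bb1", "bb2", "bb3", "bb4", "bb5"],
    "cc": ["cc1", "cc2", "cc3", "cc4", "cc5"],
}

full = product_frame(named_lists)
sparse = one_at_a_time_frame(named_lists)
print(len(full))    # 125
print(len(sparse))  # 13
```

With the three lists from the question, product_frame returns the same 125 rows as the session above, while one_at_a_time_frame returns the 13 rows shown in the question, in the same order. Note that the number of product rows is the product of the list lengths, so with tens of lists the full product may be too large to hold in memory.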

cs95 answered Sep 19 '22 17:09