cartesian product in pandas

People also ask

How do you do Cartesian product in Pandas?

Step 1: First of all, import the library Pandas. Step 2: Then, obtain the datasets on which you want to perform a cartesian product. Step 3: Further, use a merge function to perform the cartesian product on the datasets obtained. Step 4: Finally, print the cartesian product obtained.

How do you do a cross join in Pandas?

In Pandas, there are parameters to perform left, right, inner or outer merge and join on two DataFrames or Series. However there's no possibility as of now to perform a cross join to merge or join two methods using how="cross" parameter. # merge on that key.

What does .values do in Pandas?

The values property is used to get a Numpy representation of the DataFrame. Only the values in the DataFrame will be returned, the axes labels will be removed. The values of the DataFrame. A DataFrame where all columns are the same type (e.g., int64) results in an array of the same type.

How do I merge two DataFrames in Pandas?

The concat() function can be used to concatenate two Dataframes by adding the rows of one to the other. The merge() function is equivalent to the SQL JOIN clause. 'left', 'right' and 'inner' joins are all possible.

In recent versions of Pandas (>= 1.2) this is built into merge so you can do:

from pandas import DataFrame
df1 = DataFrame({'col1':[1,2],'col2':[3,4]})
df2 = DataFrame({'col3':[5,6]})    

df1.merge(df2, how='cross')

This is equivalent to the previous pandas < 1.2 answer but is easier to read.

For pandas < 1.2:

If you have a key that is repeated for each row, then you can produce a cartesian product using merge (like you would in SQL).

from pandas import DataFrame, merge
df1 = DataFrame({'key':[1,1], 'col1':[1,2],'col2':[3,4]})
df2 = DataFrame({'key':[1,1], 'col3':[5,6]})

merge(df1, df2,on='key')[['col1', 'col2', 'col3']]

Output:

   col1  col2  col3
0     1     3     5
1     1     3     6
2     2     4     5
3     2     4     6

See here for the documentation: http://pandas.pydata.org/pandas-docs/stable/merging.html

Use pd.MultiIndex.from_product as an index in an otherwise empty dataframe, then reset its index, and you're done.

a = [1, 2, 3]
b = ["a", "b", "c"]

index = pd.MultiIndex.from_product([a, b], names = ["a", "b"])

pd.DataFrame(index = index).reset_index()

out:

Minimal code needed for this one. Create a common 'key' to cartesian merge the two:

df1['key'] = 0
df2['key'] = 0

df_cartesian = df1.merge(df2, how='outer')

This won't win a code golf competition, and borrows from the previous answers - but clearly shows how the key is added, and how the join works. This creates 2 new data frames from lists, then adds the key to do the cartesian product on.

My use case was that I needed a list of all store IDs on for each week in my list. So, I created a list of all the weeks I wanted to have, then a list of all the store IDs I wanted to map them against.

The merge I chose left, but would be semantically the same as inner in this setup. You can see this in the documentation on merging, which states it does a Cartesian product if key combination appears more than once in both tables - which is what we set up.

days = pd.DataFrame({'date':list_of_days})
stores = pd.DataFrame({'store_id':list_of_stores})
stores['key'] = 0
days['key'] = 0
days_and_stores = days.merge(stores, how='left', on = 'key')
days_and_stores.drop('key',1, inplace=True)

Related questions
                            
                                Can a dictionary be passed to django models on create?
                            
                                Running bash script from within python
                            
                                Python pandas insert list into a cell
                            
                                how does multiplication differ for NumPy Matrix vs Array classes?
                            
                                Python/postgres/psycopg2: getting ID of row just inserted
                            
                                Doing something before program exit
                            
                                How to plot a histogram using Matplotlib in Python with a list of data?
                            
                                How to convert a dictionary to query string in Python?
                            
                                Inserting a Python datetime.datetime object into MySQL
                            
                                Selecting pandas column by location
                            
                                Replace console output in Python
                            
                                List of tuples to dictionary [duplicate]
                            
                                How to add conda environment to jupyter lab
                            
                                How to generate a random number with a specific amount of digits?
                            
                                Hello World in Python [duplicate]
                            
                                Python: Importing urllib.quote
                            
                                jsonify a SQLAlchemy result set in Flask [duplicate]
                            
                                URL query parameters to dict python
                            
                                How to check BLAS/LAPACK linkage in NumPy and SciPy?
                            
                                Python time measure function

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

cartesian product in pandas

Tags:

python

pandas

People also ask

Recent Activity

Donate For Us