I have two pandas DataFrames / Series containing one column each.
import pandas as pd

df1 = pd.DataFrame([1, 2, 3, 4])
df2 = pd.DataFrame(['one', 'two', 'three', 'four'])
I now want to get all possible combinations into an n*n matrix / DataFrame with values for all cross-products being the output from a custom function.
def my_function(x, y):
    return f"{x}:{y}"
This should therefore result in:
df = pd.DataFrame([['1:one', '2:one', '3:one', '4:one'],
                   ['1:two', '2:two', '3:two', '4:two'],
                   ['1:three', '2:three', '3:three', '4:three'],
                   ['1:four', '2:four', '3:four', '4:four']])
         0        1        2        3
0    1:one    2:one    3:one    4:one
1    1:two    2:two    3:two    4:two
2  1:three  2:three  3:three  4:three
3   1:four   2:four   3:four   4:four
While I can build my own matrix through itertools.product, this seems like a very inefficient way for larger datasets, and I was wondering if there is a more pythonic way. Thank you in advance.
Use the apply() function when you want to update every row in a pandas DataFrame by calling a custom function. To apply a function to every row, pass axis=1 to apply(); the default, axis=0, applies the function to each column instead. Applying a function to each row lets you, for example, create a new column from the row's values or update the row itself.
Step 1: Import the pandas library.
Step 2: Obtain the datasets on which you want to perform a cartesian product.
Step 3: Use the merge function to perform the cartesian product on those datasets.
Step 4: Print the resulting cartesian product.
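The merge steps above can be sketched for the question's data as follows. This is a minimal sketch, not the accepted method: it assumes pandas 1.2 or later (for how="cross"), and the column names "x", "y" and "value" are hypothetical names introduced only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame([1, 2, 3, 4])
df2 = pd.DataFrame(['one', 'two', 'three', 'four'])

def my_function(x, y):
    return f"{x}:{y}"

# Rename the single columns (both are labelled 0) so they do not collide.
# "x" and "y" are illustrative names, not part of the original question.
left = df1.rename(columns={0: "x"})
right = df2.rename(columns={0: "y"})

# how="cross" pairs every row of `left` with every row of `right`.
pairs = left.merge(right, how="cross")

# Call the custom function once per (x, y) pair.
pairs["value"] = [my_function(x, y) for x, y in zip(pairs["x"], pairs["y"])]

# Reshape the long table into a matrix: rows follow df2, columns follow df1.
matrix = pairs.pivot(index="y", columns="x", values="value")
# pivot sorts the labels, so restore the original ordering.
matrix = matrix.reindex(index=right["y"], columns=left["x"])

print(matrix)
```

The long intermediate table has n*n rows, so this trades memory for not writing the loops over both inputs by hand; the custom function is still called once per pair.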
The apply() function is used to apply a function along an axis of the DataFrame. Objects passed to the function are Series objects whose index is either the DataFrame's index (axis=0) or the DataFrame's columns (axis=1).
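To make the axis behaviour concrete, here is a small sketch on a made-up frame (the names frame, "a", "b", "r0" and "r1" are hypothetical and chosen only for this illustration).

```python
import pandas as pd

frame = pd.DataFrame({"a": [1, 2], "b": [10, 20]}, index=["r0", "r1"])

# axis=0 (the default): the function receives each column as a Series
# whose index is the DataFrame's index ("r0", "r1").
col_sums = frame.apply(lambda col: col.sum())

# axis=1: the function receives each row as a Series
# whose index is the DataFrame's columns ("a", "b").
row_sums = frame.apply(lambda row: row["a"] + row["b"], axis=1)

print(col_sums)
print(row_sums)
```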
You can also use the pd.DataFrame constructor with apply:
pd.DataFrame(index=df2.squeeze(), columns=df1.squeeze()).apply(lambda x: x.name.astype(str)+':'+x.index)
Output:
             1        2        3        4
one      1:one    2:one    3:one    4:one
two      1:two    2:two    3:two    4:two
three  1:three  2:three  3:three  4:three
four    1:four   2:four   3:four   4:four
Explanation:
First, the pd.DataFrame constructor builds an empty DataFrame whose index and columns are taken from df2 and df1 respectively. pd.DataFrame.squeeze converts each of those single-column DataFrames into a pd.Series.
Next, pd.DataFrame.apply runs a lambda function on each column of the DataFrame; it concatenates the column name (converted to a string), a colon, and the DataFrame index.
This yields a new DataFrame with the desired index, column labels, and values.
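The one-liner above hard-codes the string concatenation rather than calling my_function from the question. A possible variation, offered as a sketch rather than as part of the original answer, keeps the same empty-DataFrame idea but calls the custom function for every cell; the names empty and result are hypothetical.

```python
import pandas as pd

df1 = pd.DataFrame([1, 2, 3, 4])
df2 = pd.DataFrame(['one', 'two', 'three', 'four'])

def my_function(x, y):
    return f"{x}:{y}"

# Empty frame: rows labelled by df2's values, columns labelled by df1's values.
empty = pd.DataFrame(index=df2.squeeze(), columns=df1.squeeze())

# For each column, col.name is the df1 value and col.index holds the df2 values.
result = empty.apply(
    lambda col: pd.Series(
        [my_function(col.name, y) for y in col.index],
        index=col.index,
    )
)

print(result)
```

This still calls my_function once per cell in Python, so it is a convenience over itertools.product rather than a vectorised speed-up.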