
Apply function to pandas row-row cross product

I have two pandas DataFrames / Series, each containing a single column of values.

import pandas as pd

df1 = pd.DataFrame([1, 2, 3, 4])
df2 = pd.DataFrame(['one', 'two', 'three', 'four'])

I now want to combine every value of df1 with every value of df2 in an n*n matrix / DataFrame, where each cell holds the output of a custom function applied to that pair.

def my_function(x, y):
    return f"{x}:{y}"

This should therefore result in:

df = pd.DataFrame([['1:one', '2:one', '3:one', '4:one'],
                   ['1:two', '2:two', '3:two', '4:two'],
                   ['1:three', '2:three', '3:three', '4:three'],
                   ['1:four', '2:four', '3:four', '4:four']])

         0        1        2        3
0    1:one    2:one    3:one    4:one
1    1:two    2:two    3:two    4:two
2  1:three  2:three  3:three  4:three
3   1:four   2:four   3:four   4:four

While I can build my own matrix through itertools.product, this seems like a very inefficient way for larger datasets and I was wondering if there is a more pythonic way. Thank you in advance.

BBQuercus asked Aug 03 '20 14:08



1 Answer

You can also use the pd.DataFrame constructor with apply:

pd.DataFrame(index=df2.squeeze(), columns=df1.squeeze()).apply(lambda x: x.name.astype(str)+':'+x.index)

Output:

            1        2        3        4
one      1:one    2:one    3:one    4:one
two      1:two    2:two    3:two    4:two
three  1:three  2:three  3:three  4:three
four    1:four   2:four   3:four   4:four

Explanation:

First, with the pd.DataFrame constructor, build an empty dataframe whose index and columns are defined from df2 and df1 respectively. Using pd.DataFrame.squeeze, we convert those single-column dataframes into pd.Series.

Next, using pd.DataFrame.apply, we apply a lambda function to each column of the dataframe. For each column, it joins the column name (converted to a string), a colon, and each label of the dataframe index.

This yields a new dataframe with the desired index, columns, and values.
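The lambda above hard-codes the string concatenation. If an arbitrary my_function(x, y) should be called for every pair, one possible sketch (not part of the original answer) is to wrap it with np.frompyfunc and let NumPy broadcasting form the grid. The names pairwise, x, y, grid and result are illustrative assumptions. Note that the function is still called once per pair in Python, so this removes the explicit loop rather than making the computation truly vectorized.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame([1, 2, 3, 4])
df2 = pd.DataFrame(['one', 'two', 'three', 'four'])

def my_function(x, y):
    return f"{x}:{y}"

# Wrap the Python function as a ufunc taking 2 inputs and returning 1 output.
pairwise = np.frompyfunc(my_function, 2, 1)

x = df1[0].to_numpy()  # values that become the columns
y = df2[0].to_numpy()  # values that become the rows

# Broadcasting a (1, n) array against an (n, 1) array gives an (n, n) grid
# in which cell [i, j] holds my_function(x[j], y[i]).
grid = pairwise(x[np.newaxis, :], y[:, np.newaxis])

result = pd.DataFrame(grid, index=y, columns=x)
print(result)
```

The resulting frame is labeled like the output above: df2's values form the index and df1's values form the columns.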

Scott Boston answered Sep 22 '22 13:09