I want to apply an arbitrary function to each pair of columns in a pandas DataFrame in a fast and neat manner. The return value is generally a scalar, in which case I would like the result to be a new DataFrame analogous to what df.corr() returns. Syntactic convenience is usually a higher priority than computation speed.
With some function f, I would like a DataFrame like this:
             a                    b
--------------------------------------------
a | f(df["a"], df["a"])  f(df["a"], df["b"])
b | f(df["b"], df["a"])  f(df["b"], df["b"])
Ex: (the actual f is arbitrary)
import pandas as pd
df = pd.DataFrame({"a": range(4), "b": range(1, 5)})
df
>>>
a b
------
0 1
1 2
2 3
3 4
def f(c1, c2):
    return max(min(c1), min(c2))
Wished result:
a b
--------
a | 0 1
b | 1 1
I've found a way of doing it, but it's a bit of a hack and I wouldn't recommend it. You can use pandas.DataFrame.corrwith and pass your function as the method parameter:
result = [df.corrwith(df[col], method=f) for col in df]
result_df = pd.DataFrame(result, index = df.columns)
As I said, I wouldn't recommend this. Instead, I'd do something like:
result_df = pd.DataFrame(columns=df.columns, index=df.columns)
for col1 in df:
    for col2 in df:
        # .loc[row, column] avoids chained assignment; row col1, column col2 holds f(col1, col2)
        result_df.loc[col1, col2] = f(df[col1], df[col2])
I think this version is much clearer. In both cases, printing result_df gives you:
   a  b
a  0  1
b  1  1
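If you need this in more than one place, the nested loop above can be wrapped in a small helper. This is only a sketch: the name pairwise_apply is made up for illustration and is not part of pandas. It builds the values with a nested list comprehension, so the row label is the first argument to the function and the column label is the second.

```python
import pandas as pd


def pairwise_apply(df, func):
    """Return a square DataFrame whose (row, col) entry is func(df[row], df[col]).

    Hypothetical helper for illustration; not a pandas API.
    """
    cols = df.columns
    # Outer loop walks the rows, inner loop walks the columns of the result.
    values = [[func(df[r], df[c]) for c in cols] for r in cols]
    return pd.DataFrame(values, index=cols, columns=cols)


def f(c1, c2):
    return max(min(c1), min(c2))


df = pd.DataFrame({"a": range(4), "b": range(1, 5)})
result_df = pairwise_apply(df, f)
print(result_df)
```

Unlike the corrwith trick, func here receives the original Series objects, so it can rely on the index and column names if it needs them.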
By the way, it's not possible to use pandas.DataFrame.corr for this, even though it also accepts a callable as method: the returned DataFrame always has 1s along the diagonal (and is symmetric) regardless of what your function returns, following the rule that "the correlation of x with x is always 1".
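To see the diagonal issue concretely, here is a quick check using the example f and df from the question. The variable names via_corr and direct_aa are just for illustration.

```python
import pandas as pd


def f(c1, c2):
    return max(min(c1), min(c2))


df = pd.DataFrame({"a": range(4), "b": range(1, 5)})

# corr accepts a callable, but it fills the diagonal itself instead of calling f.
via_corr = df.corr(method=f)

# Calling f directly on column "a" paired with itself gives the value we actually want.
direct_aa = f(df["a"], df["a"])

print(via_corr)
print(direct_aa)
```

The entry for ("a", "a") comes back as 1 from corr, while calling f directly gives 0, so the corr result cannot be trusted for anything that is not a correlation-like measure.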