
Creating a pandas DataFrame based on the cell contents of two other DataFrames

I have two dataframes with the same number of rows and columns. I would like to create a third dataframe with the same dimensions, in which each cell is the result of a function applied to the corresponding cell values in df1 and df2.

For example, if I have

df1 = | 1 | 2 |
      | 3 | 4 |

df2 = | 5 | 6 |
      | 7 | 8 |

then df3 should be like this

df3 = | func(1, 5) | func(2, 6) |
      | func(3, 7) | func(4, 8) |

I have a way to do this, but I do not think it is very pythonic or suitable for large dataframes. Is there an efficient way to do this?

The function I wish to apply is:

import numpy as np

def smape3(y, yhat, axis=0):
    all_zeros = not (np.any(y) and np.any(yhat))
    if all_zeros:
        return 0.0
    return np.sum(np.abs(yhat - y), axis) / np.sum(np.abs(yhat + y), axis)

It can be used to produce a single scalar value or an array of values. In my use case above, the inputs to the function would be two scalar values, so smape3(1, 5) = 4/6 ≈ 0.67.
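As a small illustration of the scalar and array behaviour described above, the sketch below calls the function both ways. `smape3` is copied unchanged from the question; the variable names `scalar_result` and `column_result` are only for this example. Note that with 2-D inputs and the default `axis=0`, the function sums down each column before dividing, so it yields one value per column rather than one per cell.

```python
import numpy as np

# Copied unchanged from the question.
def smape3(y, yhat, axis=0):
    all_zeros = not (np.any(y) and np.any(yhat))
    if all_zeros:
        return 0.0
    return np.sum(np.abs(yhat - y), axis) / np.sum(np.abs(yhat + y), axis)

# Scalar inputs: |5 - 1| / |5 + 1| = 4 / 6
scalar_result = smape3(1, 5)

# 2-D inputs with the default axis=0: numerator and denominator are
# summed down each column, giving one value per column.
y = np.array([[1, 2], [3, 4]])
yhat = np.array([[5, 6], [7, 8]])
column_result = smape3(y, yhat)  # (4 + 4) / (6 + 10) and (4 + 4) / (8 + 12)
```

So calling the function directly on the two full arrays does not produce the cell-by-cell result asked for.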

asked Dec 20 '25 00:12 by Aesir


1 Answer

You can use a vectorised approach:

import numpy as np
import pandas as pd

df1 = pd.DataFrame([[1, 2], [3, 4]])
df2 = pd.DataFrame([[5, 6], [7, 8]])

arr = np.where(df1.eq(0) & df2.eq(0), 0, (df2 - df1).abs() / (df2 + df1).abs())

df = pd.DataFrame(arr)

print(df)

          0         1
0  0.666667  0.500000
1  0.400000  0.333333
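One way to sanity-check the vectorised expression is to compare it with a slow cell-by-cell application of the question's function. The sketch below does that; `smape3` is copied from the question, while the helper names `vectorised` and `cell_by_cell` are hypothetical and exist only for this comparison. On the sample data the two agree. They differ when exactly one of the two cells is zero, because the question's check returns 0.0 if either input is zero, whereas the expression here returns 0 only when both are zero.

```python
import numpy as np
import pandas as pd

# Copied unchanged from the question.
def smape3(y, yhat, axis=0):
    all_zeros = not (np.any(y) and np.any(yhat))
    if all_zeros:
        return 0.0
    return np.sum(np.abs(yhat - y), axis) / np.sum(np.abs(yhat + y), axis)

def vectorised(df1, df2):
    # Same expression as in the answer, wrapped in a function.
    both_zero = df1.eq(0) & df2.eq(0)
    return pd.DataFrame(np.where(both_zero, 0, (df2 - df1).abs() / (df2 + df1).abs()))

def cell_by_cell(df1, df2):
    # Slow reference: one smape3 call per pair of corresponding cells.
    rows = zip(df1.to_numpy(), df2.to_numpy())
    return pd.DataFrame([[smape3(a, b) for a, b in zip(r1, r2)] for r1, r2 in rows])

df1 = pd.DataFrame([[1, 2], [3, 4]])
df2 = pd.DataFrame([[5, 6], [7, 8]])
same_on_sample = np.allclose(vectorised(df1, df2), cell_by_cell(df1, df2))

# Exactly one zero cell: the two approaches disagree.
z1, z2 = pd.DataFrame([[0]]), pd.DataFrame([[5]])
vec_value = vectorised(z1, z2).iloc[0, 0]      # |5 - 0| / |5 + 0| = 1.0
ref_value = cell_by_cell(z1, z2).iloc[0, 0]    # question's check gives 0.0
```

Which zero handling is wanted depends on the use case; if the question's behaviour is required, the condition could be changed to `df1.eq(0) | df2.eq(0)`.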

Or if you want to separate some of the logic in a function:

def smape3(df1, df2):
    return (df2 - df1).abs() / (df2 + df1).abs()

df = pd.DataFrame(np.where(df1.eq(0) & df2.eq(0), 0, smape3(df1, df2)))
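Wrapping the `np.where` result in `pd.DataFrame(...)` gives a default integer index and column labels. If the inputs carry meaningful labels, a possible variant is to stay in pandas and use `DataFrame.mask`, which keeps the index and columns of the inputs. This is a sketch under that assumption, not part of the original answer; the function name `smape_cells` and the labels `r1`, `r2`, `a`, `b` are made up for illustration.

```python
import pandas as pd

def smape_cells(df1, df2):
    # Element-wise ratio; 0/0 cells come out as NaN at this point.
    ratio = (df2 - df1).abs() / (df2 + df1).abs()
    # Replace cells where both inputs are zero with 0.0, keeping labels.
    return ratio.mask(df1.eq(0) & df2.eq(0), 0.0)

df1 = pd.DataFrame([[1, 2], [3, 4]], index=["r1", "r2"], columns=["a", "b"])
df2 = pd.DataFrame([[5, 6], [7, 8]], index=["r1", "r2"], columns=["a", "b"])
df3 = smape_cells(df1, df2)  # same index and columns as df1 and df2
```

Because pandas aligns on labels during arithmetic, this assumes both dataframes share the same index and columns.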
answered Dec 22 '25 17:12 by jpp