What's the most effective way to solve the following pandas problem?
Here's a simplified example with some data in a data frame:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(0,10,size=(10, 4)), columns=['a','b','c','d'],
                  index=np.random.randint(0,10,size=10))
This data looks like this:
a b c d
1 0 0 9 9
0 2 2 1 7
3 9 3 4 0
2 5 0 9 4
1 7 7 7 2
6 4 4 6 4
1 1 6 0 0
7 8 0 9 3
5 0 0 8 3
4 5 0 2 4
Now I want to apply some function f to each value in the data frame (the function below, for example) and get a data frame back as the output. The tricky part is that the function I'm applying depends on the index value of the row I am currently in.
def f(cell_val, row_val):
    """some function which needs to know row_val to use it"""
    try:
        return cell_val/row_val
    except ZeroDivisionError:
        return -1
Normally, if I wanted to apply a function to each individual cell in the data frame, I would just call .applymap() on f. Even if I had to pass in a second argument (row_val, in this case), if that argument were a fixed number I could just write a lambda expression such as lambda x: f(x, i), where i is the fixed number I wanted. However, my second argument varies with the row the function is being called on, which means that I can't just use .applymap().
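For reference, the per-row dependency can be expressed directly, if slowly, with DataFrame.apply and axis=1: each row arrives as a Series whose .name is that row's index label, so the label can be forwarded to f. This is a minimal sketch on a small hypothetical frame. It swaps the question's try/except for an explicit zero check, because dividing by a numpy zero scalar returns inf (with a RuntimeWarning) instead of raising ZeroDivisionError.

```python
import pandas as pd

def f(cell_val, row_val):
    """Divide cell_val by the row label; return -1 when the label is zero."""
    # Explicit check instead of try/except: numpy scalars return inf
    # (with a RuntimeWarning) rather than raising ZeroDivisionError
    if row_val == 0:
        return -1
    return cell_val / row_val

# Hypothetical data: the row labelled 0 exercises the zero branch
df = pd.DataFrame({"a": [0, 2, 9], "b": [9, 2, 3]}, index=[1, 0, 3])

# axis=1 passes each row to the outer lambda as a Series whose .name is the
# row's index label; row.map then calls f on every cell with that label
result = df.apply(lambda row: row.map(lambda v: f(v, row.name)), axis=1)

print(result)
```

This calls a Python function once per cell, so it scales poorly; it is mostly useful as a baseline to check a vectorized version against.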
How would I go about solving a problem like this efficiently? I can think of a few ways to do this, but none of them feel "right". I could:
applymap() on my tuple data frame. But that seems pretty hacky, and I'm also creating a completely separate data frame as an extra step.

IIUC, you can use div with axis=0; you also need to convert the Index object to a Series object using to_series:
In [121]:
df.div(df.index.to_series(), axis=0).replace(np.inf, -1)
Out[121]:
a b c d
1 0.000000 0.000000 9.000000 9.000000
0 -1.000000 -1.000000 -1.000000 -1.000000
3 3.000000 1.000000 1.333333 0.000000
2 2.500000 0.000000 4.500000 2.000000
1 7.000000 7.000000 7.000000 2.000000
6 0.666667 0.666667 1.000000 0.666667
1 1.000000 6.000000 0.000000 0.000000
7 1.142857 0.000000 1.285714 0.428571
5 0.000000 0.000000 1.600000 0.600000
4 1.250000 0.000000 0.500000 1.000000
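Additionally, because division by zero results in inf, you need to call replace to replace those values with -1. Note that this only catches positive numerators: 0/0 gives NaN and a negative number divided by zero gives -inf, so neither of those would be replaced.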
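Since f returns -1 for every zero divisor, a variant that keys on the divisor rather than on the division result covers the inf, -inf and NaN cases at once. A minimal sketch on a small hypothetical frame (the data is made up for illustration):

```python
import pandas as pd

# Hypothetical data: two rows are labelled 0, one holding a 0 and a negative value
df = pd.DataFrame({"a": [0, 2, 9, 0], "b": [9, 2, 3, -4]}, index=[1, 0, 3, 0])

# axis=0 matches the divisor Series against the row index and
# broadcasts it across all columns
result = df.div(df.index.to_series(), axis=0)

# Rows labelled 0 may now hold inf, -inf or NaN; overwrite them all with -1,
# which is what f returns for a zero row_val
result.loc[df.index == 0] = -1

print(result)
```

Masking on df.index == 0 also works with duplicate index labels, since the boolean array selects rows by position.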