
Using Pandas to "applymap" with access to index/column?

What's the most effective way to solve the following pandas problem?

Here's a simplified example with some data in a data frame:

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(0,10,size=(10, 4)), columns=['a','b','c','d'], 
                  index=np.random.randint(0,10,size=10))

This data looks like this:

   a  b  c  d
1  0  0  9  9
0  2  2  1  7
3  9  3  4  0
2  5  0  9  4
1  7  7  7  2
6  4  4  6  4
1  1  6  0  0
7  8  0  9  3
5  0  0  8  3
4  5  0  2  4

Now I want to apply some function f to each value in the data frame (the function below, for example) and get a data frame back. The tricky part is that the function depends on the index label of the row the value sits in.

def f(cell_val, row_val):
    """some function which needs to know row_val to use it"""
    try:
        return cell_val/row_val
    except ZeroDivisionError:
        return -1

Normally, if I wanted to apply a function to each individual cell in the data frame, I would just pass f to .applymap(). Even if I had to pass in a second argument ('row_val', in this case), if that argument were a fixed number I could just write a lambda expression such as lambda x: f(x, i), where i is the fixed number I wanted. However, my second argument varies depending on the row of the data frame the function is being called on, which means that I can't just use .applymap().

How would I go about solving a problem like this efficiently? I can think of a few ways to do this, but none of them feel "right". I could:

  • loop through each individual value and replace them one by one, but that seems really awkward and slow.
  • create a completely separate data frame containing (cell value, row value) tuples and use the builtin pandas applymap() on my tuple data frame. But that seems pretty hacky, and I'm also creating a completely separate data frame as an extra step.

There must be a better solution than these. A fast one would be appreciated, because my data frame could get very large.
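For reference, the first option above (visiting every value one by one) might look like the sketch below. It is only a slow baseline for an arbitrary f, not a recommended solution. The helper name apply_with_row_label and the small example frame are made up for illustration. The values are converted with .tolist() on purpose: with plain Python ints a division by zero raises ZeroDivisionError, whereas NumPy integer scalars would give inf with a warning and f's except branch would never run.

```python
import pandas as pd


def f(cell_val, row_val):
    """The function from the question: divide, falling back to -1."""
    try:
        return cell_val / row_val
    except ZeroDivisionError:
        return -1


def apply_with_row_label(df, func):
    """Hypothetical helper: call func(cell, row_label) for every cell."""
    # .tolist() yields plain Python scalars, so func sees ordinary ints/floats.
    labels = df.index.tolist()
    rows = df.to_numpy().tolist()
    data = [[func(v, label) for v in row] for label, row in zip(labels, rows)]
    # Reuse the original index and columns so duplicate labels are preserved.
    return pd.DataFrame(data, index=df.index, columns=df.columns)


# Small hand-made frame (not the random data from the question).
example = pd.DataFrame({"a": [4, 2, 5], "b": [0, 6, 1]}, index=[2, 0, 2])
baseline = apply_with_row_label(example, f)
```

Every cell goes through a Python-level function call here, so this will not scale well to a very large data frame.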
K. Mao asked Oct 30 '22 19:10



1 Answer

IIUC you can use div with axis=0; you also need to convert the Index object to a Series object using to_series:

In [121]:
df.div(df.index.to_series(), axis=0).replace(np.inf, -1)

Out[121]:
          a         b         c         d
1  0.000000  0.000000  9.000000  9.000000
0 -1.000000 -1.000000 -1.000000 -1.000000
3  3.000000  1.000000  1.333333  0.000000
2  2.500000  0.000000  4.500000  2.000000
1  7.000000  7.000000  7.000000  2.000000
6  0.666667  0.666667  1.000000  0.666667
1  1.000000  6.000000  0.000000  0.000000
7  1.142857  0.000000  1.285714  0.428571
5  0.000000  0.000000  1.600000  0.600000
4  1.250000  0.000000  0.500000  1.000000

Additionally, as division by zero results in inf here, you need to call replace to replace those values with -1.
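One caveat: replacing np.inf only catches positive values divided by zero. A 0 divided by 0 gives NaN, and a negative value divided by 0 gives -inf. If the goal is to mirror f exactly (every cell in a row labelled 0 becomes -1), a possible variant is to divide the underlying NumPy arrays with broadcasting and then overwrite those rows. This is a sketch under that assumption; divide_by_index and the example frame are hypothetical names and data, not part of the question.

```python
import numpy as np
import pandas as pd


def divide_by_index(df):
    """Divide every cell by its row's index label; rows labelled 0 become -1."""
    labels = df.index.to_numpy()
    # Silence divide-by-zero warnings; the affected rows are overwritten below.
    with np.errstate(divide="ignore", invalid="ignore"):
        # labels[:, None] has shape (n_rows, 1), so it broadcasts across columns.
        out = df.to_numpy() / labels[:, None]
    # Boolean row mask: covers inf, -inf and NaN in one step.
    out[labels == 0] = -1
    return pd.DataFrame(out, index=df.index, columns=df.columns)


# Small hand-made frame with 0/0 and a negative value over 0.
example = pd.DataFrame({"a": [0, -3, 6], "b": [2, 4, 9]}, index=[0, 0, 3])
result = divide_by_index(example)
```

Because the mask is built from the index labels rather than from the computed values, an inf or NaN that comes from the data itself (in a row whose label is not 0) is left alone.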

EdChum answered Nov 15 '22 07:11
