Run function exactly once for each row in a Pandas dataframe

If I have a function

def do_irreversible_thing(a, b):
    print(a, b)

And a dataframe, say

df = pd.DataFrame([(0, 1), (2, 3), (4, 5)], columns=['a', 'b'])

What's the best way to run the function exactly once for each row in a pandas dataframe? As pointed out in other questions, with something like df.apply, pandas will call the function twice for the first row. Even using numpy

np.vectorize(do_irreversible_thing)(df.a, df.b)

causes the function to be called twice on the first row, as will df.T.apply() or df.apply(..., axis=1).
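For np.vectorize specifically, the extra call appears to come from a trial run on the first element, which numpy uses to infer the output type when otypes is not given. A minimal sketch, assuming the return value can be stored as a generic Python object; the calls list is a hypothetical stand-in for the irreversible side effect and is only there to count invocations:

```python
import numpy as np
import pandas as pd

calls = []  # hypothetical call log, used only to count invocations

def do_irreversible_thing(a, b):
    calls.append((a, b))

df = pd.DataFrame([(0, 1), (2, 3), (4, 5)], columns=['a', 'b'])

# With otypes supplied, np.vectorize has no need for a trial call on the
# first element to work out the output dtype.
vectorized = np.vectorize(do_irreversible_thing, otypes=[object])
vectorized(df['a'].to_numpy(), df['b'].to_numpy())

print(len(calls))  # number of times the function actually ran
```

This is still a Python-level loop under the hood, so it is a convenience rather than a speed-up.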

Is there a faster or cleaner way to call the function with every row than this explicit loop?

for idx, a, b in df.itertuples():
    do_irreversible_thing(a, b)
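For reference, the same loop can be written without the unused index. A sketch of two spellings that call the function once per row by construction; the calls list is a hypothetical replacement for the real side effect so the invocations can be inspected:

```python
import pandas as pd

calls = []  # hypothetical call log standing in for the side effect

def do_irreversible_thing(a, b):
    calls.append((a, b))

df = pd.DataFrame([(0, 1), (2, 3), (4, 5)], columns=['a', 'b'])

# index=False drops the index; name=None yields plain tuples
# instead of namedtuples.
for a, b in df.itertuples(index=False, name=None):
    do_irreversible_thing(a, b)

# Alternatively, iterate over the two columns in lockstep.
for a, b in zip(df['a'], df['b']):
    do_irreversible_thing(a, b)

print(calls)
```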

asked Apr 13 '16 20:04

David Nehme

2 Answers

The way I do it (because I also don't like the idea of looping with df.itertuples) is:

df.apply(do_irreversible_thing, axis=1)

and then your function should be like:

def do_irreversible_thing(x):
    print(x.a, x.b)

This way the function runs over each row.

If you can't modify your function, you can apply it like this:

df.apply(lambda x: do_irreversible_thing(x['a'], x['b']), axis=1)
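Since the question is about how many times the function runs, it may be worth checking this on your own pandas version before relying on it. A small sketch that records each call; the calls list is hypothetical and replaces the real side effect. As far as I know, pandas 1.1 and later evaluate the first row only once, while older releases could call the function twice on it:

```python
import pandas as pd

calls = []  # hypothetical call log, not part of the original function

def do_irreversible_thing(a, b):
    # Record the arguments instead of doing anything irreversible.
    calls.append((int(a), int(b)))

df = pd.DataFrame([(0, 1), (2, 3), (4, 5)], columns=['a', 'b'])
df.apply(lambda x: do_irreversible_thing(x['a'], x['b']), axis=1)

# More entries than rows would mean apply called the function
# more than once for some row.
print(len(calls), len(df))
```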

answered Sep 28 '22 12:09

Rosa Alejandra

It's unclear what your function is doing, but to apply a function to each row you can pass axis=1 to apply, which runs it row-wise, and hand it the column elements of interest:

In [155]:
def foo(a,b):
    return a*b

df = pd.DataFrame([(0, 1), (2, 3), (4, 5)], columns=['a', 'b'])
df.apply(lambda x: foo(x['a'], x['b']), axis=1)

Out[155]:
0     0
1     6
2    20
dtype: int64

However, so long as your function does not depend on the df mutating on each row, then you can just use a vectorised method to operate on the entire column:

In [156]:
df['a'] * df['b']

Out[156]:
0     0
1     6
2    20
dtype: int64

The reason is that vectorised operations work on whole columns at once and so scale better, whilst apply is just syntactic sugar for iterating over your df, so it is essentially a for loop.
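The difference can be made visible by counting how often foo is entered. A rough sketch, where the counter dictionary is a hypothetical addition for illustration: with apply, foo runs at least once per row, whereas passing the columns directly runs it a single time on whole Series.

```python
import pandas as pd

counter = {'n': 0}  # hypothetical call counter, for illustration only

def foo(a, b):
    counter['n'] += 1
    return a * b

df = pd.DataFrame([(0, 1), (2, 3), (4, 5)], columns=['a', 'b'])

# Row-wise: foo receives one pair of scalars per call.
row_wise = df.apply(lambda x: foo(x['a'], x['b']), axis=1)
calls_row_wise = counter['n']

# Vectorised: foo receives the two columns and multiplies them in one go.
counter['n'] = 0
vectorised = foo(df['a'], df['b'])
calls_vectorised = counter['n']

print(calls_row_wise, calls_vectorised)
print(row_wise.tolist() == vectorised.tolist())
```

This only works because foo is built from operations that pandas already supports on whole columns; a function with per-row side effects, like the one in the question, does not have that property.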

answered Sep 28 '22 14:09

EdChum

Tags: python, function, pandas, numpy
