
pandas row specific apply

Tags: python, pandas

Similar to this R question, I'd like to apply a function to each item in a Series (or each row in a DataFrame) using pandas, but I want to pass the index or id of that row to the function as an argument. As a trivial example, suppose one wants to create a list of tuples of the form [(index_1, value_1), ..., (index_n, value_n)]. Using a simple Python for loop, I can do:

In [1]: from pandas import Series
In [2]: L = []
In [3]: s = Series(['six', 'seven', 'six', 'seven', 'six'],
   ...:            index=['a', 'b', 'c', 'd', 'e'])
In [4]: for i, item in enumerate(s):
   ...:     L.append((i, item))
In [5]: L
Out[5]: [(0, 'six'), (1, 'seven'), (2, 'six'), (3, 'seven'), (4, 'six')]

But surely there is a more efficient way to do this? Perhaps something more pandas-like, such as Series.apply? In this case I'm not really concerned with returning anything meaningful; I'm more interested in the efficiency of something like apply. Any ideas?

Carson Farmer asked Jun 23 '12 15:06



2 Answers

When you call the apply method with a function, that function is applied to every item in the Series individually. E.g.

>>> s.apply(enumerate)
a    <enumerate object at 0x13cf910>
b    <enumerate object at 0x13cf870>
c    <enumerate object at 0x13cf820>
d    <enumerate object at 0x13cf7d0>
e    <enumerate object at 0x13ecdc0>

What you want to do is simply to enumerate the series itself.

>>> list(enumerate(s))
[(0, 'six'), (1, 'seven'), (2, 'six'), (3, 'seven'), (4, 'six')]
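
Note that enumerate pairs each value with its position (0, 1, ...), not with its index label. If the labels are what the function needs, a minimal sketch, assuming a pandas version where Series.items() is available, could look like this:

```python
import pandas as pd

s = pd.Series(['six', 'seven', 'six', 'seven', 'six'],
              index=['a', 'b', 'c', 'd', 'e'])

# Series.items() lazily yields (index label, value) pairs.
pairs = list(s.items())
print(pairs)
# [('a', 'six'), ('b', 'seven'), ('c', 'six'), ('d', 'seven'), ('e', 'six')]
```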

What if, for example, you wanted to concatenate all the strings in the Series?

>>> ",".join(s)
'six,seven,six,seven,six'

A more complex usage of apply would be this one:

>>> # On Python 3, map() returns a lazy iterator, so build each list explicitly
>>> s.apply(lambda item: [ch * 2 for ch in item])
a                ['ss', 'ii', 'xx']
b    ['ss', 'ee', 'vv', 'ee', 'nn']
c                ['ss', 'ii', 'xx']
d    ['ss', 'ee', 'vv', 'ee', 'nn']
e                ['ss', 'ii', 'xx']

[Edit]

Following the OP's request for clarification: don't confuse Series (1D) with DataFrames (2D) (http://pandas.pydata.org/pandas-docs/stable/dsintro.html#dataframe); a Series does not really have rows. However, you can include the index in your function by creating a new Series (apply won't give you any information about the current index):

>>> Series([s[x] + " index  " + x for x in s.keys()], index=s.keys())
a      six index  a
b    seven index  b
c      six index  c
d    seven index  d
e      six index  e
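
For an actual DataFrame, one possible approach (a sketch, not the only way) relies on the fact that apply with axis=1 passes each row as a Series whose name attribute is that row's index label. The column name word and the helper describe below are made up for illustration:

```python
import pandas as pd

s = pd.Series(['six', 'seven', 'six', 'seven', 'six'],
              index=['a', 'b', 'c', 'd', 'e'])

# Hypothetical helper that needs both the index label and the value.
def describe(label, value):
    return f"{value} index  {label}"

# Wrap the Series in a one-column DataFrame ('word' is an illustrative name).
df = pd.DataFrame({'word': s})

# With axis=1, each row arrives as a Series; row.name is the row's index label.
tagged_rows = df.apply(lambda row: describe(row.name, row['word']), axis=1)
print(tagged_rows.tolist())
# ['six index  a', 'seven index  b', 'six index  c', 'seven index  d', 'six index  e']
```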

Anyhow, I would suggest switching to other data types to avoid heavy memory overhead.

luke14free answered Oct 22 '22 09:10



Here's a neat way, using itertools' count together with the built-in zip:

import pandas as pd
from itertools import count

s = pd.Series(['six', 'seven', 'six', 'seven', 'six'],
                  index=['a', 'b', 'c', 'd', 'e'])

In [4]: list(zip(count(), s))
Out[4]: [(0, 'six'), (1, 'seven'), (2, 'six'), (3, 'seven'), (4, 'six')]

Unfortunately, this is only as efficient as enumerate(list(s))!
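
To check the efficiency claim on your own machine, a rough timing sketch with the standard timeit module might look like the following. The function names are invented for illustration, and no timings are shown because they depend on hardware and the pandas version:

```python
import timeit
from itertools import count

import pandas as pd

s = pd.Series(['six', 'seven', 'six', 'seven', 'six'],
              index=['a', 'b', 'c', 'd', 'e'])

def with_enumerate():
    # Pair each value with its position using the built-in enumerate.
    return list(enumerate(s))

def with_zip_count():
    # Same pairing, built from itertools.count and zip.
    return list(zip(count(), s))

# Both approaches produce the same list of (position, value) tuples.
assert with_enumerate() == with_zip_count()

# Time 1000 calls of each; results vary by machine.
for fn in (with_enumerate, with_zip_count):
    seconds = timeit.timeit(fn, number=1000)
    print(f"{fn.__name__}: {seconds:.4f} s per 1000 calls")
```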

Andy Hayden answered Oct 22 '22 10:10
