Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Equivalent of Series.map for DataFrame?

Tags:

python

pandas

Using Series.map with a Series argument, I can take the elements of a Series and use them as indices into another Series. I want to do the same thing with some columns of a DataFrame, using each row as a set of index levels into a MultiIndex-ed Series. Here is an example:

>>> d = pandas.DataFrame([["A", 1], ["B", 2], ["C", 3]], columns=["X", "Y"])
>>> d
   X  Y
0  A  1
1  B  2
2  C  3

[3 rows x 2 columns]
>>> s = pandas.Series(np.arange(9), index=pandas.MultiIndex.from_product([["A", "B", "C"], [1, 2, 3]]))
>>> s
A  1    0
   2    1
   3    2
B  1    3
   2    4
   3    5
C  1    6
   2    7
   3    8
dtype: int32

What I would like is to be able to do d.map(s), so that each row of d should be taken as a tuple to use to index into the MultiIndex of s. That is, I want the same result as this:

>>> s.ix[[("A", 1), ("B", 2), ("C", 3)]]
A  1    0
B  2    4
C  3    8
dtype: int32

However, DataFrame, unlike Series, has no map method. The other obvious alternative, s.ix[d], gives me the error "Cannot index with multidimensional key", so this is apparently not supported either.

I know I can do it by converting the DataFrame to a list of lists, or by using a row-wise apply to grab each item one by one, but isn't there any way to do it without that amount of overhead? How can I do the equivalent of Series.map on multiple columns at once?

like image 826
BrenBarn Avatar asked Nov 02 '22 03:11

BrenBarn


1 Answers

You could create a MultiIndex from the DataFrame and the ix/loc using that:

In [11]: mi = pd.MultiIndex.from_arrays(d.values.T)

In [12]: s.loc[mi]  # can use ix too
Out[12]:
A  1    0
B  2    4
C  3    8
dtype: int64

This is pretty efficient:

In [21]: s = pandas.Series(np.arange(1000*1000), index=pandas.MultiIndex.from_product([range(1000), range(1000)]))

In [22]: d = pandas.DataFrame(zip(range(1000), range(1000)), columns=["X", "Y"])

In [23]: %timeit mi = pd.MultiIndex.from_arrays(d.values.T); s.loc[mi]
100 loops, best of 3: 2.77 ms per loop

In [24]: %timeit s.apply(lambda x: x + 1)  # at least compared to apply
1 loops, best of 3: 3.14 s per loop
like image 122
Andy Hayden Avatar answered Nov 05 '22 00:11

Andy Hayden