
pandas data frame - reduce with initial value

I'm moving some of my R stuff to Python, hence I have to use pandas.DataFrames. There are several things I'd like to optimise.

Suppose we've got a table

key value
abc 1
abc 2
abd 1

and we want to get a dictionary of the form {key -> list[values]}. Here is how I do it right now.

from functools import reduce  # reduce is no longer a built-in in Python 3
from io import StringIO  # Python 3 home of StringIO

from pandas import DataFrame


def get_dict(df):
    """
    :param df:
    :type df: DataFrame
    """
    def f(accum, row):
        """
        :param accum:
        :type accum: dict
        """
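        key, value = row[1]  # iterrows() yields (index, row) pairs
        # list.append returns None, so `or accum` hands the dict back to reduce
        return accum.setdefault(key, []).append(value) or accum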
    return reduce(f, df.iterrows(), {})


table = StringIO("key\tvalue\nabc\t1\nabc\t2\nabd\t1")
parsed_table = [row.rstrip().split("\t") for row in table]
df = DataFrame(parsed_table[1:], columns=parsed_table[0])
result = get_dict(df)  # -> {'abc': ['1', '2'], 'abd': ['1']}

Two things I don't like about it:

  1. The built-in reduce goes through the standard Python iteration protocol, which kills the speed of NumPy-based data structures like DataFrame. I know that DataFrame.apply has a reduce mode, but it doesn't take a starting value such as a dict.
  2. (a minor drawback) I have to use positional indexing to get specific values from rows. I wish I could access fields in a row by name like in R, i.e. row$key instead of row[1][0].
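
On the second point, here is a minimal sketch of by-name access. It assumes a reasonably recent pandas and rebuilds the example table with integer values (the frame parsed above holds strings). `itertuples` yields namedtuples whose fields are the column names, and the rows produced by `iterrows` are Series that can be indexed by column label. It still loops in Python, so it only addresses readability, not speed.

```python
from collections import defaultdict

import pandas as pd

# Hypothetical stand-in for the example table (integer values assumed).
df = pd.DataFrame({"key": ["abc", "abc", "abd"], "value": [1, 2, 1]})

# itertuples yields namedtuples, so columns are reachable as attributes.
by_key = defaultdict(list)
for row in df.itertuples(index=False):
    by_key[row.key].append(row.value)
result = dict(by_key)

# iterrows yields (index, Series) pairs; the Series is indexed by column name.
_, first_row = next(df.iterrows())
first_key = first_row["key"]
```

Attribute access through `itertuples` only works for column names that are valid Python identifiers; label indexing on the `iterrows` Series has no such restriction.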

Thank you in advance

Eli Korvigo asked Oct 31 '22 21:10


1 Answer

One option is to use groupby and apply, which leaves you with a pandas Series of lists indexed by key:

In [2]: df
Out[2]:
   key  value
0  abc      1
1  abc      2
2  abd      1

In [3]: df.groupby("key").value.apply(list)
Out[3]:
key
abc    [1, 2]
abd       [1]
Name: value, dtype: object

In [4]: _3.loc['abc']
Out[4]: [1, 2]
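
If a plain dict is what's needed in the end, the resulting Series can be converted with `to_dict()`. A small sketch, assuming the frame is rebuilt with integer values as in the session above (the frame parsed in the question holds strings instead):

```python
import pandas as pd

# Stand-in for the df shown in the session above (integer values assumed).
df = pd.DataFrame({"key": ["abc", "abc", "abd"], "value": [1, 2, 1]})

# Group the values by key, collect each group into a list, then turn the
# resulting Series (index = key, values = lists) into a regular dict.
grouped = df.groupby("key")["value"].apply(list)
result = grouped.to_dict()
```

This keeps the grouping inside pandas instead of folding over rows one at a time with reduce.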

Randy answered Nov 09 '22 13:11