
How do I create pandas DataFrame (with index or multiindex) from list of namedtuple instances?

Tags: python, pandas

Simple example:

>>> from collections import namedtuple
>>> import pandas

>>> Price = namedtuple('Price', 'ticker date price')
>>> a = Price('GE', '2010-01-01', 30.00)
>>> b = Price('GE', '2010-01-02', 31.00)
>>> l = [a, b]
>>> df = pandas.DataFrame.from_records(l, index='ticker')
Traceback (most recent call last):
...
KeyError: 'ticker'

Harder example:

>>> df2 = pandas.DataFrame.from_records(l, index=['ticker', 'date'])
>>> df2

         0           1   2
ticker  GE  2010-01-01  30
date    GE  2010-01-02  31

Now it treats ['ticker', 'date'] as the index labels themselves, rather than as the names of the columns I want to use as the index.

Is there a way to do this without resorting to an intermediate numpy ndarray or using set_index after the fact?

MikeRand asked Jun 08 '13 23:06



1 Answer

To get a Series from a namedtuple you could use the _fields attribute (here pd is pandas, imported with import pandas as pd):

In [11]: pd.Series(a, a._fields)
Out[11]:
ticker            GE
date      2010-01-01
price             30
dtype: object

Similarly you can create a DataFrame like this:

In [12]: df = pd.DataFrame(l, columns=l[0]._fields)

In [13]: df
Out[13]:
  ticker        date  price
0     GE  2010-01-01     30
1     GE  2010-01-02     31
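
As a side note, recent pandas versions appear to pick up the field names from the namedtuples on their own, so the columns argument may be optional. The following is a minimal sketch under that assumption; the variable name records is only illustrative and stands in for the list l above.

```python
from collections import namedtuple

import pandas as pd

Price = namedtuple('Price', 'ticker date price')

# Illustrative data, same shape as the list `l` in the question.
records = [
    Price('GE', '2010-01-01', 30.00),
    Price('GE', '2010-01-02', 31.00),
]

# Assumption: the installed pandas infers column labels from the
# namedtuple's _fields when no columns argument is given.
df = pd.DataFrame(records)
print(list(df.columns))
```

If the printed labels come back as integers instead of field names on your version, passing columns=l[0]._fields explicitly, as shown above, is the safer choice.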

You can then call set_index after the fact, and you can do this in place:

In [14]: df.set_index(['ticker', 'date'], inplace=True)

In [15]: df
Out[15]:
                   price
ticker date
GE     2010-01-01     30
       2010-01-02     31
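
If you would rather avoid the separate set_index step, one thing that may work is giving from_records the column names up front, so that the names passed to index have something to refer to. This is a sketch, assuming a pandas version whose from_records accepts a list of column names for index once columns is supplied; records is an illustrative name for the list l from the question.

```python
from collections import namedtuple

import pandas as pd

Price = namedtuple('Price', 'ticker date price')

# Illustrative data, same shape as the list `l` in the question.
records = [
    Price('GE', '2010-01-01', 30.00),
    Price('GE', '2010-01-02', 31.00),
]

# Supplying columns names the fields, so index=['ticker', 'date'] is
# resolved against those names and the two fields become index levels.
df = pd.DataFrame.from_records(
    records,
    columns=list(Price._fields),
    index=['ticker', 'date'],
)
print(df)
```

The KeyError in the question is consistent with from_records not knowing any column called 'ticker' at the point where it resolves index, which is why naming the columns in the same call can help.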
Andy Hayden answered Oct 26 '22 18:10