 

pandas version 0.16.0 after changing dataframe index all values become NaN

I am using IPython Notebook and following the pandas cookbook examples for release 0.16.0. I ran into trouble on page 237. I made a dataframe like this:

from pandas import *
data1=DataFrame({'AAA':[4,5,6,7],'BBB':[10,20,30,40],'CCC':[100,50,-30,-50]})

Then I did this, trying to change the index:

df=DataFrame(data=data1,index=(['a','b','c','d']))

But what I get is a dataframe with all values being NaN! Does anyone know why, and how to fix it? I also tried the set_index function, and it gave me errors.

Thank you very much!

Yiying Wang asked Apr 17 '15 18:04


2 Answers

If you want to change the index then either use reindex or assign directly to the index:

In [5]:

import pandas as pd

data1=pd.DataFrame({'AAA':[4,5,6,7],'BBB':[10,20,30,40],'CCC':[100,50,-30,-50]})
print(data1)
df=pd.DataFrame(data=data1)
df.index = ['a','b','c','d']
df
   AAA  BBB  CCC
0    4   10  100
1    5   20   50
2    6   30  -30
3    7   40  -50
Out[5]:
   AAA  BBB  CCC
a    4   10  100
b    5   20   50
c    6   30  -30
d    7   40  -50
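
On the set_index errors mentioned in the question: the exact error was not shown, so the cause suggested here is an assumption. In recent pandas versions, set_index treats a plain list of strings as column names, so passing a list of new labels raises a KeyError. A minimal sketch of that failure and of two ways to attach the labels without modifying data1 (the variable names labels, df_a and df_b are illustrative only):

```python
import pandas as pd

data1 = pd.DataFrame({'AAA': [4, 5, 6, 7],
                      'BBB': [10, 20, 30, 40],
                      'CCC': [100, 50, -30, -50]})
labels = ['a', 'b', 'c', 'd']

# A list of strings is looked up as column names, so this raises KeyError.
try:
    data1.set_index(labels)
    set_index_failed = False
except KeyError:
    set_index_failed = True

# Wrapping the labels in a pd.Index makes set_index use them as index values.
df_a = data1.set_index(pd.Index(labels))

# set_axis returns a relabelled copy; axis=0 targets the row index.
df_b = data1.set_axis(labels, axis=0)
```

Both calls return a new DataFrame and leave data1 with its original 0..3 index, which may be preferable to assigning to df.index when the original is still needed.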

I don't know if it is a bug or not but if you did the following then it would work:

In [7]:

df=pd.DataFrame(data=data1.values,index=(['a','b','c','d']))
df
Out[7]:
   0   1    2
a  4  10  100
b  5  20   50
c  6  30  -30
d  7  40  -50

So if you pass the underlying values as the data rather than the df itself, the constructor does not try to align the data to the passed-in index.

EDIT

After stepping through the code, the issue is that the constructor uses the passed index to reindex the df. We can reproduce this behaviour by doing the following:

In [46]:

data1 = pd.DataFrame({'AAA':[4,5,6,7],'BBB':[10,20,30,40],'CCC':[100,50,-30,-50]})
data1.reindex_axis(list('abcd'))
Out[46]:
   AAA  BBB  CCC
a  NaN  NaN  NaN
b  NaN  NaN  NaN
c  NaN  NaN  NaN
d  NaN  NaN  NaN
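
Note that reindex_axis was removed in pandas 1.0, so the cell above only runs on older releases such as the 0.16.0 used in the question. A minimal sketch of the same reproduction with reindex, assuming a recent pandas (the names new_labels, via_reindex and via_constructor are illustrative only):

```python
import pandas as pd

data1 = pd.DataFrame({'AAA': [4, 5, 6, 7],
                      'BBB': [10, 20, 30, 40],
                      'CCC': [100, 50, -30, -50]})
new_labels = list('abcd')

# None of 'a'..'d' exist in the current index (0..3),
# so every requested row is filled with NaN.
via_reindex = data1.reindex(new_labels)

# Passing a DataFrame together with index= goes through the same alignment.
via_constructor = pd.DataFrame(data=data1, index=new_labels)

print(via_reindex)
print(via_constructor)
```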

This happens because the df constructor detects that the data is held in a BlockManager and tries to construct a df from it:

Stepping through the code I see that it reaches here in frame.py:

        if isinstance(data, BlockManager):
            mgr = self._init_mgr(data, axes=dict(index=index, columns=columns),
                                 dtype=dtype, copy=copy)

and then ends up here in generic.py:

    def _init_mgr(self, mgr, axes=None, dtype=None, copy=False):
        """ passed a manager and a axes dict """
        for a, axe in axes.items():
            if axe is not None:
                mgr = mgr.reindex_axis(
                    axe, axis=self._get_block_manager_axis(a), copy=False)

An issue has now been posted about this.

Update: this is expected behaviour. If you pass an index, the constructor uses it to reindex the passed-in df. From @Jeff:

This is the defined behavior, to reindex the provided input to the passed index and/or columns.

See related Issue
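
To illustrate why this reindexing counts as defined behaviour rather than a bug, here is a minimal sketch, assuming a recent pandas, in which the passed labels partly overlap the existing index (the names subset and extended are illustrative only). Labels that exist select the matching rows, and labels that do not exist produce NaN rows:

```python
import pandas as pd

data1 = pd.DataFrame({'AAA': [4, 5, 6, 7],
                      'BBB': [10, 20, 30, 40],
                      'CCC': [100, 50, -30, -50]})

# Labels 0 and 2 exist in data1, so the constructor selects those rows.
subset = pd.DataFrame(data=data1, index=[0, 2])

# Label 4 does not exist, so that row is all NaN;
# columns= is aligned the same way and keeps only the named columns.
extended = pd.DataFrame(data=data1, index=[2, 3, 4], columns=['AAA', 'CCC'])

print(subset)
print(extended)
```

Seen this way, the all-NaN result in the question is the limiting case where none of the requested labels match.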

EdChum answered Sep 28 '22 18:09


EdChum is absolutely right with the suggestion to use reindex, but I think what's happening here is that when you pass a DataFrame as the data argument, the constructor uses the whole existing DataFrame, index included, when creating the new DataFrame.

If you want to accomplish what you're getting at, you need to explicitly feed the DataFrame class the actual data (not the data wrapped up in another DataFrame). You do this by using data1.values. You also have to explicitly give the class the column names, so it all comes out like so:

In [1]: pd.DataFrame(data=data1.values,columns=data1.columns,index=(['a','b','c','d']))

Out[1]: 
   AAA  BBB  CCC
a    4   10  100
b    5   20   50
c    6   30  -30
d    7   40  -50
andrewgcross answered Sep 28 '22 19:09