I am using IPython Notebook and following the pandas cookbook examples for release 0.16.0. I ran into trouble on page 237. I made a DataFrame like this:
from pandas import *
data1=DataFrame({'AAA':[4,5,6,7],'BBB':[10,20,30,40],'CCC':[100,50,-30,-50]})
Then I did this, trying to change the index:
df=DataFrame(data=data1,index=(['a','b','c','d']))
But what I get is a DataFrame with all values being NaN! Does anyone know why, and how to fix it? I also tried the set_index function, and it gave me errors.
Thank you very much!
If you want to change the index then either use reindex
or assign directly to the index:
In [5]:
data1=pd.DataFrame({'AAA':[4,5,6,7],'BBB':[10,20,30,40],'CCC':[100,50,-30,-50]})
print(data1)
df=pd.DataFrame(data=data1)
df.index = ['a','b','c','d']
df
AAA BBB CCC
0 4 10 100
1 5 20 50
2 6 30 -30
3 7 40 -50
Out[5]:
AAA BBB CCC
a 4 10 100
b 5 20 50
c 6 30 -30
d 7 40 -50
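If it helps, here is a minimal sketch of the same relabelling done on an explicit copy, so that data1 keeps its original 0..3 index. The names relabelled, mapping and renamed are just illustrative, and rename with a dict is an alternative I am assuming fits here, not something from the cookbook.

```python
import pandas as pd

data1 = pd.DataFrame({'AAA': [4, 5, 6, 7],
                      'BBB': [10, 20, 30, 40],
                      'CCC': [100, 50, -30, -50]})

# Option 1: copy first, then overwrite the labels on the copy only.
relabelled = data1.copy()
relabelled.index = ['a', 'b', 'c', 'd']

# Option 2: rename maps old labels to new ones and returns a new frame.
mapping = dict(zip(data1.index, ['a', 'b', 'c', 'd']))
renamed = data1.rename(index=mapping)

print(relabelled)
print(renamed)
```

Either way the values stay attached to their rows; only the labels change.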
I don't know if it is a bug or not but if you did the following then it would work:
In [7]:
df=pd.DataFrame(data=data1.values,index=(['a','b','c','d']))
df
Out[7]:
0 1 2
a 4 10 100
b 5 20 50
c 6 30 -30
d 7 40 -50
So if you pass the underlying values rather than the df itself, the constructor does not try to align the data to the passed-in index. Note that the column names are lost this way (they become 0, 1, 2).
EDIT
After stepping through the code here, the issue is that the constructor uses the passed index to reindex the df. We can reproduce this behaviour by doing the following:
In [46]:
data1 = pd.DataFrame({'AAA':[4,5,6,7],'BBB':[10,20,30,40],'CCC':[100,50,-30,-50]})
data1.reindex_axis(list('abcd'))
Out[46]:
AAA BBB CCC
a NaN NaN NaN
b NaN NaN NaN
c NaN NaN NaN
d NaN NaN NaN
This is because the df constructor detects that the data is an instance of BlockManager and tries to construct a df from it.
Stepping through the code, I see that it reaches here in frame.py:
if isinstance(data, BlockManager):
    mgr = self._init_mgr(data, axes=dict(index=index, columns=columns),
                         dtype=dtype, copy=copy)
and then ends up here in generic.py:
def _init_mgr(self, mgr, axes=None, dtype=None, copy=False):
    """ passed a manager and a axes dict """
    for a, axe in axes.items():
        if axe is not None:
            mgr = mgr.reindex_axis(
                axe, axis=self._get_block_manager_axis(a), copy=False)
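reindex_axis was deprecated and then removed in later pandas releases, so on a newer install the same alignment can be sketched with plain reindex. This is only an illustration of the label matching; the labels 0, 1 and 10 are made up for the example.

```python
import pandas as pd

data1 = pd.DataFrame({'AAA': [4, 5, 6, 7],
                      'BBB': [10, 20, 30, 40],
                      'CCC': [100, 50, -30, -50]})

# None of 'a'..'d' exist in the original 0..3 index, so every row is NaN.
no_match = data1.reindex(list('abcd'))

# Passing the frame plus an index to the constructor aligns the same way.
via_constructor = pd.DataFrame(data=data1, index=list('abcd'))

# Labels that do exist keep their rows; only the unknown label 10 is NaN.
partial = data1.reindex([0, 1, 10])

print(no_match)
print(partial)
```

So the NaNs are not lost data, they are rows for labels that were never in data1.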
An issue has now been posted about this
Update: this is expected behaviour. If you pass an index, the constructor uses it to reindex the passed-in df. From @Jeff:
This is the defined behavior, to reindex the provided input to the passed index and/or columns.
See related Issue
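The "and/or columns" part works the same way. A small sketch, where ZZZ is a made-up column name that does not exist in data1:

```python
import pandas as pd

data1 = pd.DataFrame({'AAA': [4, 5, 6, 7],
                      'BBB': [10, 20, 30, 40],
                      'CCC': [100, 50, -30, -50]})

# Existing column names are selected and reordered;
# the unknown name 'ZZZ' comes back as an all-NaN column.
subset = pd.DataFrame(data=data1, columns=['CCC', 'AAA', 'ZZZ'])

print(subset)
```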
EdChum is absolutely right with the suggestion to use reindex, but I think what's happening here is that when you use a DataFrame as the argument for the data parameter, it uses the whole existing DataFrame when creating the new DataFrame.
If you want to accomplish what you're getting at, you need to explicitly feed the DataFrame class that actual data (not the data wrapped up in another DataFrame). You do this by using data1.values. You also have to explicitly give the class the column names, too, so it all comes out like so:
In [1]: pd.DataFrame(data=data1.values,columns=data1.columns,index=(['a','b','c','d']))
Out[1]:
AAA BBB CCC
a 4 10 100
b 5 20 50
c 6 30 -30
d 7 40 -50
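One thing to watch with the .values route, as far as I can tell: .values returns a single NumPy array, so a frame with mixed column types gets flattened to a common dtype first. The frame mixed below is made up for illustration and is not from the question, where all columns are integers and this does not matter.

```python
import pandas as pd

# Hypothetical mixed-dtype frame (one integer column, one string column).
mixed = pd.DataFrame({'num': [1, 2], 'txt': ['x', 'y']})

# Going through .values: the integers travel inside an object array.
via_values = pd.DataFrame(data=mixed.values, columns=mixed.columns,
                          index=['a', 'b'])

# Copying and relabelling keeps each column's own dtype.
via_copy = mixed.copy()
via_copy.index = ['a', 'b']

print(via_values.dtypes)
print(via_copy.dtypes)
```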