
Reindexing a DataFrame replaces all my data with NaNs, why?

So I was investigating how some pandas commands work, and I ran into this issue: when I use the reindex command, my data is replaced by NaN values. Below is my code:

>>> import pandas as pd
>>> import numpy as np
>>> frame1 = pd.DataFrame(np.arange(365))

then, I give it an index of dates:

>>> frame1.index = pd.date_range(pd.Timestamp(2017, 4, 6), pd.Timestamp(2018, 4, 5))

then I reindex:

>>> broken_frame = frame1.reindex(np.arange(365))

aaaand all my values are erased. This example isn't particularly useful, but it happens any and every time I use the reindex command, seemingly regardless of context. Similarly, when I try to join two dataframes:

>>> big_frame = frame1.join(pd.DataFrame(np.arange(365)), lsuffix='_frame1')

all of the values in the frame being attached (np.arange(365)) are replaced with NaNs before the frames are joined. If I had to guess, I would say this is because the second frame is reindexed as part of the joining process, and reindexing erases my values.

What's going on here?

Jacob asked Sep 10 '25 17:09


2 Answers

From the Docs

Conform DataFrame to new index with optional filling logic, placing NA/NaN in locations having no value in the previous index. A new object is produced unless the new index is equivalent to the current one and copy=False

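Emphasis my own: the part that matters here is "placing NA/NaN in locations having no value in the previous index". reindex does not relabel rows. It looks up the labels you pass in the existing index, and the integers 0 to 364 are not labels in your date index, so every row comes back as NaN.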
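To make the quoted sentence concrete, here is a minimal sketch on a 5-row frame instead of 365 rows. The names `frame`, `by_int` and `by_date` are made up for illustration and are not from the question. It shows that reindex looks rows up by label: labels that exist keep their data, labels that do not exist get NaN.

```python
import numpy as np
import pandas as pd

# Small stand-in for frame1: 5 rows labelled by consecutive dates.
frame = pd.DataFrame(np.arange(5), index=pd.date_range("2017-04-06", periods=5))

# None of the labels 0..4 exist in the date index, so every row is NaN.
by_int = frame.reindex(np.arange(5))

# Two of these three dates exist in the old index; only the missing one is NaN.
by_date = frame.reindex(pd.date_range("2017-04-09", periods=3))

print(by_int)
print(by_date)
```

So reindex is the right tool for selecting or padding rows by label, not for swapping one set of labels for another.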

You want either set_index

frame1.set_index(np.arange(365))

Or do what you did in the first place

frame1.index = np.arange(365)
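
The join in the question behaves the same way, since join also aligns rows by index label. A rough sketch, assuming the frames from the question (the names `other`, `mismatched` and `aligned` are mine, purely for illustration): once both frames carry the same labels, the joined values survive.

```python
import numpy as np
import pandas as pd

frame1 = pd.DataFrame(np.arange(365))
frame1.index = pd.date_range("2017-04-06", "2018-04-05")
other = pd.DataFrame(np.arange(365))  # default integer index 0..364

# Labels differ (dates vs. integers), so nothing lines up in the join.
mismatched = frame1.join(other, lsuffix="_frame1")

# Give `other` the same date labels first; now the rows align.
other.index = frame1.index
aligned = frame1.join(other, lsuffix="_frame1")

print(mismatched.head())
print(aligned.head())
```

Whether you relabel `other` with dates or relabel `frame1` with integers is up to you; the only requirement is that both indexes hold the same labels before joining.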

piRSquared answered Sep 13 '25 10:09


I did not find the existing answer helpful for what I think the question is getting at, so I am adding to it.

The key is that the index of the initial dataframe must match the index you are reindexing on for this to work. Even the level names must be the same! So if your new MultiIndex has no names, the index of your initial dataframe must also have no names.

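# Build every (col, month-end) combination; the new MultiIndex has no level names.
# ('M' is the month-end alias; pandas 2.2+ spells it 'ME'.)
m = pd.MultiIndex.from_product([df['col'].unique(),
                                pd.date_range(df.date.min(),
                                              df.date.max() +
                                              pd.offsets.MonthEnd(1),
                                              freq='M')])
# Index df by the same two levels and drop the level names so they match m.
df = df.set_index(['col','date']).rename_axis([None,None])
df.reindex(m)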

Then you will preserve your initial data values and reindex the dataframe.
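
For anyone who wants to try this, here is a self-contained sketch of the same recipe on made-up data. The frame `df`, its `value` column and the names `months`, `indexed` and `full` are hypothetical, and I pass `pd.offsets.MonthEnd()` as the frequency only to sidestep the 'M' vs 'ME' alias difference between pandas versions.

```python
import pandas as pd

# Hypothetical data: two groups, each missing some month-end dates.
df = pd.DataFrame({
    "col": ["a", "a", "b"],
    "date": pd.to_datetime(["2020-01-31", "2020-03-31", "2020-02-29"]),
    "value": [1.0, 2.0, 3.0],
})

# Full grid of (col, month-end) pairs covering the observed date range.
months = pd.date_range(df["date"].min(), df["date"].max(),
                       freq=pd.offsets.MonthEnd())
m = pd.MultiIndex.from_product([df["col"].unique(), months])

# Same two levels as m, with the level names dropped as in the answer above.
indexed = df.set_index(["col", "date"]).rename_axis([None, None])

# Existing (col, date) pairs keep their values; the new pairs are NaN.
full = indexed.reindex(m)
print(full)
```

The three original values stay attached to their (col, date) labels, and the three combinations that were not in the data show up as NaN rows.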

conv3d answered Sep 13 '25 10:09