I have data like the following:
import pandas as pd
from datetime import datetime
x = pd.Series([1, 2, 4], [datetime(2013,11,1), datetime(2013,11, 2), datetime(2013, 11, 4)])
The missing index at November 3rd corresponds to a zero value, and I want it to look like this:
y = pd.Series([1,2,0,4], pd.date_range('2013-11-01', periods = 4))
What's the best way to convert x to y? I've tried
y = pd.Series(x, pd.date_range('2013-11-1', periods = 4)).fillna(0)
This sometimes throws an index error which I can't interpret ("Index length did not match values"), even though the index and data have the same length. Is there a better way to do this?
I think I would use a resample (note that in older pandas versions, if there are dupes it takes the mean by default; current versions need the aggregation spelled out):
In [11]: x.resample('D').mean() # you could use .first() instead
Out[11]:
2013-11-01 1.0
2013-11-02 2.0
2013-11-03 NaN
2013-11-04 4.0
Freq: D, dtype: float64
In [12]: x.resample('D').mean().fillna(0)
Out[12]:
2013-11-01 1.0
2013-11-02 2.0
2013-11-03 0.0
2013-11-04 4.0
Freq: D, dtype: float64
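To make the remark about dupes concrete, here is a minimal sketch assuming a current pandas version, where the aggregation is called explicitly on the resampled object. The series `x_dupes` and its extra November 2nd entry are made up for illustration and are not part of the question's data.

```python
from datetime import datetime

import pandas as pd

# Hypothetical variant of the question's data with a duplicate entry for Nov 2.
x_dupes = pd.Series(
    [1, 2, 6, 4],
    index=[
        datetime(2013, 11, 1),
        datetime(2013, 11, 2),
        datetime(2013, 11, 2),  # duplicate timestamp
        datetime(2013, 11, 4),
    ],
)

# Mean per day: the two Nov 2 values are averaged, the empty Nov 3 bin
# becomes NaN and is then replaced by 0.
daily_mean = x_dupes.resample("D").mean().fillna(0)

# First value per day instead of the mean.
daily_first = x_dupes.resample("D").first().fillna(0)

print(daily_mean)
print(daily_first)
```

Both results are float64, because the empty day passes through NaN before it is filled.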
If you preferred dupes to raise, then use reindex:
In [13]: x.reindex(pd.date_range('2013-11-1', periods=4), fill_value=0)
Out[13]:
2013-11-01 1
2013-11-02 2
2013-11-03 0
2013-11-04 4
Freq: D, dtype: int64
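A self-contained sketch of the reindex approach follows. Building the target range from the series' own first and last timestamps is an assumption added here, as is the `x_dupes` series used to show the raising behaviour.

```python
from datetime import datetime

import pandas as pd

x = pd.Series(
    [1, 2, 4],
    index=[datetime(2013, 11, 1), datetime(2013, 11, 2), datetime(2013, 11, 4)],
)

# Build the full daily range from the data itself instead of hard-coding it.
full_range = pd.date_range(x.index.min(), x.index.max(), freq="D")

# Missing dates are inserted with the fill value; no NaN is created on the way.
y = x.reindex(full_range, fill_value=0)
print(y)

# Hypothetical series with a duplicate timestamp: reindex refuses to guess
# which value to keep and raises instead.
x_dupes = pd.Series(
    [1, 2, 6],
    index=[datetime(2013, 11, 1), datetime(2013, 11, 2), datetime(2013, 11, 2)],
)
try:
    x_dupes.reindex(full_range, fill_value=0)
    raised = False
except ValueError:
    raised = True
print("duplicates raised:", raised)
```

Since the fill value is supplied directly, the integer values are never converted to float.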
You can use pandas.Series.resample() for this (in current pandas, call an aggregation such as mean() before fillna()):
>>> x.resample('D').mean().fillna(0)
2013-11-01 1.0
2013-11-02 2.0
2013-11-03 0.0
2013-11-04 4.0
Older versions of the resample() function have a fill_method parameter, but I don't know if it's possible to use it to replace NaN during resampling. It looks like you can pass your own aggregation function to take care of it, though (originally via the how parameter, which has since been removed; apply() does the same job now), like:
>>> x.resample('D').apply(lambda s: s.mean() if len(s) > 0 else 0)
2013-11-01 1.0
2013-11-02 2.0
2013-11-03 0.0
2013-11-04 4.0
I don't know which method is the preferred one. Please also take a look at @AndyHayden's answer: reindex() with fill_value=0 would probably be the most efficient way to do this, but you have to run your own tests.
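As a starting point for such tests, here is a small sketch that checks the candidates against each other on the question's data. The `asfreq` and `resample(...).sum()` variants are not from the answers; they are added here as further options, assuming a pandas version where `asfreq` accepts `fill_value`.

```python
from datetime import datetime

import pandas as pd

x = pd.Series(
    [1, 2, 4],
    index=[datetime(2013, 11, 1), datetime(2013, 11, 2), datetime(2013, 11, 4)],
)

# Candidate 1: reindex against an explicit daily range.
reindexed = x.reindex(pd.date_range("2013-11-01", periods=4), fill_value=0)

# Candidate 2: resample and sum; an empty day sums to 0.
# Note that duplicate timestamps would be added together here.
resampled = x.resample("D").sum()

# Candidate 3: asfreq with a fill value for the dates it inserts.
as_freq = x.asfreq("D", fill_value=0)

for name, result in [
    ("reindex", reindexed),
    ("resample.sum", resampled),
    ("asfreq", as_freq),
]:
    print(name, result.tolist())
```

For timing, each candidate can be wrapped in `timeit` on a series of realistic size; the results will depend on the data and the pandas version.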