
fill missing indices in pandas

Tags: python, pandas

I have data like follows:

import pandas as pd
from datetime import datetime

x = pd.Series([1, 2, 4], [datetime(2013,11,1), datetime(2013,11, 2), datetime(2013, 11, 4)])

The missing index at November 3rd corresponds to a zero value, and I want it to look like this:

y = pd.Series([1,2,0,4], pd.date_range('2013-11-01', periods = 4))

What's the best way to convert x to y? I've tried

y = pd.Series(x, pd.date_range('2013-11-1', periods = 4)).fillna(0)

This sometimes throws an index error that I can't interpret (Index length did not match values), even though the index and the data have the same length. Is there a better way to do this?

asked Dec 05 '13 05:12 by qua



2 Answers

I think I would use resample (note that mean() averages any duplicates):

In [11]: x.resample('D').mean()  # or .first() to keep the first of any duplicates
Out[11]: 
2013-11-01    1.0
2013-11-02    2.0
2013-11-03    NaN
2013-11-04    4.0
Freq: D, dtype: float64

In [12]: x.resample('D').mean().fillna(0)
Out[12]: 
2013-11-01    1.0
2013-11-02    2.0
2013-11-03    0.0
2013-11-04    4.0
Freq: D, dtype: float64

If you would prefer duplicates to raise an error, use reindex:

In [13]: x.reindex(pd.date_range('2013-11-1', periods=4), fill_value=0)
Out[13]: 
2013-11-01    1
2013-11-02    2
2013-11-03    0
2013-11-04    4
Freq: D, dtype: int64
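
To illustrate how the two approaches differ when the index has duplicates, here is a minimal sketch. The series `dup` and its values are made up for the example and are not from the question; the resample call assumes a pandas version where the aggregation is spelled out explicitly.

```python
import pandas as pd

# Hypothetical data: 2013-11-02 appears twice, 2013-11-03 is missing.
idx = pd.to_datetime(['2013-11-01', '2013-11-02', '2013-11-02', '2013-11-04'])
dup = pd.Series([1, 2, 4, 8], index=idx)

# resample aggregates the duplicated day (mean of 2 and 4) and leaves
# the missing day as NaN, which fillna then turns into 0.
filled = dup.resample('D').mean().fillna(0)

# reindex refuses to work on an index with duplicate labels.
full_range = pd.date_range('2013-11-01', periods=4)
try:
    dup.reindex(full_range, fill_value=0)
    reindex_raised = False
except ValueError:
    reindex_raised = True

print(filled)
print('reindex raised:', reindex_raised)
```

Which behaviour you want depends on whether a duplicated timestamp is valid data to be combined or a sign that something went wrong upstream.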
answered Oct 13 '22 22:10 by Andy Hayden


You can use pandas.Series.resample() for this:

>>> x.resample('D').mean().fillna(0)
2013-11-01    1.0
2013-11-02    2.0
2013-11-03    0.0
2013-11-04    4.0
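
A related shortcut that this answer does not use, sketched here on the assumption of a reasonably recent pandas: `Series.asfreq` accepts a `fill_value` argument, so the missing day can be filled while the index is expanded, without an aggregation step. The series `x` is rebuilt from the question so the snippet is self-contained.

```python
from datetime import datetime

import pandas as pd

# Same data as in the question: November 3rd is missing.
x = pd.Series(
    [1, 2, 4],
    [datetime(2013, 11, 1), datetime(2013, 11, 2), datetime(2013, 11, 4)],
)

# Expand to daily frequency between the first and last timestamps,
# using 0 for the days that were not present.
y = x.asfreq('D', fill_value=0)
print(y)
```

Because nothing is averaged, the integer values are not turned into floats. Note that `fill_value` only applies to rows created by the expansion, not to NaN values that were already in the data.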

Older versions of resample() had a fill_method parameter, but I don't know whether it could be used to replace NaN during resampling. It looks like you can handle it with a custom aggregation function instead, like:

>>> x.resample('D').apply(lambda g: g.mean() if len(g) > 0 else 0)
2013-11-01    1.0
2013-11-02    2.0
2013-11-03    0.0
2013-11-04    4.0

I don't know which method is preferred. Please also take a look at @AndyHayden's answer: reindex() with fill_value=0 is probably the most efficient way to do this, but you should run your own tests.
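
One way to run such a test is a rough timing comparison with `timeit`. This is only a sketch: the series `s`, its size, and the random gaps are invented for the benchmark, and the timings will vary with the machine, the pandas version, and the shape of the data.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical benchmark data: 10,000 days with about 30% removed at random.
full = pd.date_range('2013-01-01', periods=10_000, freq='D')
rng = np.random.default_rng(0)
keep = np.sort(rng.choice(len(full), size=7_000, replace=False))
s = pd.Series(np.arange(len(keep)), index=full[keep])


def with_reindex():
    # Build the complete daily range from the data's own first and last day.
    target = pd.date_range(s.index.min(), s.index.max(), freq='D')
    return s.reindex(target, fill_value=0)


def with_resample():
    # Aggregate per day, then replace the NaN rows of the empty days.
    return s.resample('D').mean().fillna(0)


t_reindex = timeit.timeit(with_reindex, number=20)
t_resample = timeit.timeit(with_resample, number=20)
print(f'reindex:  {t_reindex:.4f} s for 20 runs')
print(f'resample: {t_resample:.4f} s for 20 runs')
```

Both functions produce the same values on duplicate-free data, so the comparison is like for like; only the dtype differs (integers for reindex, floats for resample).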

answered Oct 13 '22 23:10 by Roman Pekar