fill missing indices in pandas

Tags:

python

pandas

I have data like the following:

import pandas as pd
from datetime import datetime

x = pd.Series([1, 2, 4], [datetime(2013,11,1), datetime(2013,11, 2), datetime(2013, 11, 4)])

The missing index at November 3rd corresponds to a zero value, and I want it to look like this:

y = pd.Series([1,2,0,4], pd.date_range('2013-11-01', periods = 4))

What's the best way to convert x to y? I've tried

y = pd.Series(x, pd.date_range('2013-11-1', periods = 4)).fillna(0)

This sometimes throws an index error that I can't interpret ("Index length did not match values"), even though the index and the data have the same length. Is there a better way to do this?

asked Dec 05 '13 05:12

qua

2 Answers

I think I would use resample (note that if there are duplicate timestamps, this older resample API takes their mean by default):

In [11]: x.resample('D')  # you could use how='first'
Out[11]: 
2013-11-01     1
2013-11-02     2
2013-11-03   NaN
2013-11-04     4
Freq: D, dtype: float64

In [12]: x.resample('D').fillna(0)
Out[12]: 
2013-11-01    1
2013-11-02    2
2013-11-03    0
2013-11-04    4
Freq: D, dtype: float64
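
The calls above rely on an older resample API. In recent pandas releases, resample() returns a Resampler object and the how argument is no longer available, so an explicit aggregation is needed first. A minimal sketch of the same idea, assuming a current pandas version (the names daily_mean, filled and daily_sum are only illustrative):

```python
import pandas as pd
from datetime import datetime

x = pd.Series([1, 2, 4],
              [datetime(2013, 11, 1), datetime(2013, 11, 2), datetime(2013, 11, 4)])

# resample() now returns a Resampler, so aggregate explicitly.
# mean() averages duplicate timestamps and leaves NaN on empty days.
daily_mean = x.resample('D').mean()

# Replace the NaN on 2013-11-03 with 0 (the result is float because of the NaN step).
filled = daily_mean.fillna(0)

# sum() gives 0 for an empty day directly, but it adds duplicate
# timestamps together instead of averaging them.
daily_sum = x.resample('D').sum()

print(filled)
print(daily_sum)
```

Which aggregation fits depends on what duplicates should mean in your data; with no duplicates, both variants give the same values.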

If you would prefer duplicates to raise an error, use reindex:

In [13]: x.reindex(pd.date_range('2013-11-1', periods=4), fill_value=0)
Out[13]: 
2013-11-01   1
2013-11-02   2
2013-11-03   0
2013-11-04   4
Freq: D, dtype: float64
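
The reindex approach can also build the target range from the data itself instead of hard-coding the start date and number of periods. The following is a sketch under that assumption; full_index and y_asfreq are illustrative names, and asfreq() with fill_value is shown as what should be an equivalent shorthand for a series indexed by dates:

```python
import pandas as pd
from datetime import datetime

x = pd.Series([1, 2, 4],
              [datetime(2013, 11, 1), datetime(2013, 11, 2), datetime(2013, 11, 4)])

# Derive the complete daily range from the first and last timestamps in x.
full_index = pd.date_range(x.index.min(), x.index.max(), freq='D')

# Days absent from x are added with the value 0.
y = x.reindex(full_index, fill_value=0)

# asfreq() performs the same kind of reindexing to a fixed frequency.
y_asfreq = x.asfreq('D', fill_value=0)

print(y)
```

Both variants only fill gaps between the first and last observed dates; if the series should extend beyond them, pass an explicit date_range to reindex() as in the answer above.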

answered Oct 13 '22 22:10

Andy Hayden

You can use pandas.Series.resample() for this:

>>> x.resample('D').fillna(0)
2013-11-01    1
2013-11-02    2
2013-11-03    0
2013-11-04    4

There's a fill_method parameter in the resample() function, but I don't know whether it can be used to replace NaN during resampling. It does look like you can use the how parameter to take care of it, though:

>>> x.resample('D', how=lambda x: x.mean() if len(x) > 0 else 0)
2013-11-01    1
2013-11-02    2
2013-11-03    0
2013-11-04    4

I don't know which method is preferred. Please also take a look at @AndyHayden's answer: reindex() with fill_value=0 is probably the most efficient way to do this, but you should run your own tests.
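
One way to run such a comparison is with timeit. The sketch below assumes a current pandas version and uses made-up test data (ten years of daily values with roughly 10% of the days dropped); the helper names via_reindex and via_resample are hypothetical, and the timings will vary with data size and pandas version:

```python
import timeit

import numpy as np
import pandas as pd

# Made-up test data: 3650 daily values with about 10% of the days removed.
rng = np.random.default_rng(0)
days = pd.date_range('2004-01-01', periods=3650, freq='D')
keep = rng.random(len(days)) > 0.1
keep[0] = keep[-1] = True  # keep both endpoints so both methods span the same range
x = pd.Series(np.arange(len(days))[keep], index=days[keep])


def via_reindex():
    # Fill missing days with 0 by reindexing onto the full range.
    return x.reindex(days, fill_value=0)


def via_resample():
    # sum() over an empty day yields 0; there are no duplicates in this data.
    return x.resample('D').sum()


# Time 100 calls of each approach and report the average per call.
for fn in (via_reindex, via_resample):
    seconds = timeit.timeit(fn, number=100)
    print(f"{fn.__name__}: {seconds / 100 * 1e3:.3f} ms per call")
```

No expected timings are given here on purpose; the point is only to compare the two on data that resembles your own.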

answered Oct 13 '22 23:10

Roman Pekar
