Fill missing timeseries data using pandas or numpy

Question

I have a list of dictionaries that looks like this:

L=[
{
"timeline": "2014-10", 
"total_prescriptions": 17
}, 
{
"timeline": "2014-11", 
"total_prescriptions": 14
}, 
{
"timeline": "2014-12", 
"total_prescriptions": 8
},
{
"timeline": "2015-1", 
"total_prescriptions": 4
}, 
{
"timeline": "2015-3", 
"total_prescriptions": 10
}, 
{
"timeline": "2015-4", 
"total_prescriptions": 3
} 
]

This is the result of a SQL query which, given a start date and an end date, returns the count of total prescriptions for each month from the start month through the end month. However, any month with a prescription count of 0 (here, Feb 2015) is skipped completely. Is it possible, using pandas or numpy, to alter this list so that it contains an entry for each missing month with 0 as the total prescriptions, as follows:

[
{
"timeline": "2014-10", 
"total_prescriptions": 17
}, 
{
"timeline": "2014-11", 
"total_prescriptions": 14
}, 
{
"timeline": "2014-12", 
"total_prescriptions": 8
},
{
"timeline": "2015-1", 
"total_prescriptions": 4
}, 
{
"timeline": "2015-2",   # 2015-2 to be inserted for missing month
"total_prescriptions": 0 # 0 to be inserted for total prescription
}, 
{
"timeline": "2015-3", 
"total_prescriptions": 10
}, 
{
"timeline": "2015-4", 
"total_prescriptions": 3
} 
]

maxymoo · Accepted Answer

What you are describing is called "resampling" in pandas. First convert your time strings to datetimes and set them as the index:

import pandas as pd

df = pd.DataFrame(L)
df.index=pd.to_datetime(df.timeline,format='%Y-%m')
df
           timeline  total_prescriptions
timeline                                
2014-10-01  2014-10                   17
2014-11-01  2014-11                   14
2014-12-01  2014-12                    8
2015-01-01   2015-1                    4
2015-03-01   2015-3                   10
2015-04-01   2015-4                    3

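Then you can add the missing months with resample('MS'), where 'MS' is the month-start frequency alias. In current pandas, resample returns a Resampler object rather than a DataFrame, so select the numeric column and call asfreq() to get a NaN row for each skipped month, then use fillna(0) to convert those nulls to zero as you require. Because NaN forces a float column, the counts come back as floats.

df = df[['total_prescriptions']].resample('MS').asfreq().fillna(0)
df
            total_prescriptions
timeline                       
2014-10-01                 17.0
2014-11-01                 14.0
2014-12-01                  8.0
2015-01-01                  4.0
2015-02-01                  0.0
2015-03-01                 10.0
2015-04-01                  3.0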
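As an alternative to resampling (not part of the original answer), the gap can also be filled by building the full monthly index explicitly and reindexing against it. The sketch below assumes the counts are already indexed by month-start timestamps, as in the frame above; the names counts, full_index and filled are illustrative only.

```python
import pandas as pd

# Counts indexed by month start, mirroring the frame above (February 2015 is absent).
counts = pd.Series(
    [17, 14, 8, 4, 10, 3],
    index=pd.to_datetime(
        ["2014-10-01", "2014-11-01", "2014-12-01",
         "2015-01-01", "2015-03-01", "2015-04-01"]
    ),
    name="total_prescriptions",
)

# Every month start between the first and last observed month.
full_index = pd.date_range(counts.index.min(), counts.index.max(), freq="MS")

# reindex adds the missing rows; fill_value=0 means no NaN is ever created,
# so the values keep their integer dtype.
filled = counts.reindex(full_index, fill_value=0)
print(filled)
```

Since no NaN is introduced along the way, this variant avoids the integer-to-float conversion that comes with fillna(0).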

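To convert back to a list of dictionaries, turn the datetime index back into strings with strftime (the older to_native_types method is no longer available in recent pandas releases), and then export using to_dict('records'):

df['timeline']=df.index.strftime('%Y-%m-%d')
df.to_dict('records')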
df.to_dict('records')
[{'timeline': '2014-10-01', 'total_prescriptions': 17.0},
 {'timeline': '2014-11-01', 'total_prescriptions': 14.0},
 {'timeline': '2014-12-01', 'total_prescriptions': 8.0},
 {'timeline': '2015-01-01', 'total_prescriptions': 4.0},
 {'timeline': '2015-02-01', 'total_prescriptions': 0.0},
 {'timeline': '2015-03-01', 'total_prescriptions': 10.0},
 {'timeline': '2015-04-01', 'total_prescriptions': 3.0}]
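The records above carry full dates and float counts, whereas the question asked for labels like "2015-2" and integer counts. A minimal end-to-end sketch of the same resampling idea that restores that shape might look like the following. The helper name fill_missing_months is hypothetical, and it assumes every "timeline" value has the form "YYYY-M" with an unpadded month, so the labels are parsed and rebuilt by hand.

```python
import pandas as pd


def fill_missing_months(records):
    """Return new records with a zero-count entry for every skipped month.

    Hypothetical helper; assumes 'timeline' looks like 'YYYY-M' (unpadded month).
    """
    # Parse each label by hand into a month-start timestamp.
    index = pd.DatetimeIndex(
        [
            pd.Timestamp(year=int(year), month=int(month), day=1)
            for year, month in (r["timeline"].split("-") for r in records)
        ]
    )
    counts = pd.Series([r["total_prescriptions"] for r in records], index=index)

    # asfreq() leaves NaN for the skipped months; fill them and cast back to int.
    filled = counts.resample("MS").asfreq().fillna(0).astype(int)

    # Rebuild the original label style: four-digit year, unpadded month.
    return [
        {"timeline": f"{ts.year}-{ts.month}", "total_prescriptions": int(n)}
        for ts, n in filled.items()
    ]


L = [
    {"timeline": "2014-10", "total_prescriptions": 17},
    {"timeline": "2014-11", "total_prescriptions": 14},
    {"timeline": "2014-12", "total_prescriptions": 8},
    {"timeline": "2015-1", "total_prescriptions": 4},
    {"timeline": "2015-3", "total_prescriptions": 10},
    {"timeline": "2015-4", "total_prescriptions": 3},
]

result = fill_missing_months(L)
print(result)
```

The explicit int(n) conversion turns numpy integers into plain Python ints, which matters if the result is later serialized, for example with json.dumps.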

Tags: python, dictionary, list, pandas, numpy

Amistad