I have a list of dictionaries that looks like this:
L=[
{
"timeline": "2014-10",
"total_prescriptions": 17
},
{
"timeline": "2014-11",
"total_prescriptions": 14
},
{
"timeline": "2014-12",
"total_prescriptions": 8
},
{
"timeline": "2015-1",
"total_prescriptions": 4
},
{
"timeline": "2015-3",
"total_prescriptions": 10
},
{
"timeline": "2015-4",
"total_prescriptions": 3
}
]
This is the result of a SQL query which, given a start date and an end date, returns the count of total prescriptions for each month from the start month through the end month. However, for months where the prescription count is 0 (Feb 2015 here), it skips that month completely. Is it possible, using pandas or numpy, to alter this list so that it adds an entry for each missing month with 0 as the total prescriptions, as follows:
[
{
"timeline": "2014-10",
"total_prescriptions": 17
},
{
"timeline": "2014-11",
"total_prescriptions": 14
},
{
"timeline": "2014-12",
"total_prescriptions": 8
},
{
"timeline": "2015-1",
"total_prescriptions": 4
},
{
"timeline": "2015-2", # 2015-2 to be inserted for missing month
"total_prescriptions": 0 # 0 to be inserted for total prescription
},
{
"timeline": "2015-3",
"total_prescriptions": 10
},
{
"timeline": "2015-4",
"total_prescriptions": 3
}
]
What you are describing is called "resampling" in pandas. First convert your timeline column to datetimes and set it as the index:
import pandas as pd
df = pd.DataFrame(L)
df.index = pd.to_datetime(df.timeline, format='%Y-%m')
df
           timeline  total_prescriptions
timeline
2014-10-01  2014-10                   17
2014-11-01  2014-11                   14
2014-12-01  2014-12                    8
2015-01-01   2015-1                    4
2015-03-01   2015-3                   10
2015-04-01   2015-4                    3
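Then you can add the missing months with resample('MS') ('MS' is the pandas offset alias for "month start") and use fillna(0) to convert the resulting null values to zero, as you require. In recent pandas versions resample returns a Resampler object rather than a DataFrame, so call asfreq() on it before filling:
df = df[['total_prescriptions']].resample('MS').asfreq().fillna(0)
df
            total_prescriptions
timeline
2014-10-01                 17.0
2014-11-01                 14.0
2014-12-01                  8.0
2015-01-01                  4.0
2015-02-01                  0.0
2015-03-01                 10.0
2015-04-01                  3.0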
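As a variation on the resampling step, the sketch below reindexes against an explicit monthly range instead. This is not the method used in this answer, only an alternative under the same assumptions about the data; the names full_range and filled are made up for illustration. Passing fill_value=0 means no NaN is ever introduced, so the counts should stay integers rather than becoming floats.

```python
import pandas as pd

L = [
    {"timeline": "2014-10", "total_prescriptions": 17},
    {"timeline": "2014-11", "total_prescriptions": 14},
    {"timeline": "2014-12", "total_prescriptions": 8},
    {"timeline": "2015-1", "total_prescriptions": 4},
    {"timeline": "2015-3", "total_prescriptions": 10},
    {"timeline": "2015-4", "total_prescriptions": 3},
]

df = pd.DataFrame(L)
df.index = pd.to_datetime(df["timeline"], format="%Y-%m")

# Every month start between the first and last observed month (hypothetical name).
full_range = pd.date_range(df.index.min(), df.index.max(), freq="MS")

# reindex inserts rows for months absent from the data; fill_value=0 sets their count.
filled = df[["total_prescriptions"]].reindex(full_range, fill_value=0)
print(filled)
```

This form also makes it easy to pad out to the query's own start and end dates: pass those to pd.date_range instead of the observed minimum and maximum.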
To convert back to your original format, turn the datetime index back into strings with strftime (older pandas versions offered to_native_types for this, but it has since been deprecated and removed), and then export using to_dict('records'):
df['timeline'] = df.index.strftime('%Y-%m-%d')
df.to_dict('records')
[{'timeline': '2014-10-01', 'total_prescriptions': 17.0},
{'timeline': '2014-11-01', 'total_prescriptions': 14.0},
{'timeline': '2014-12-01', 'total_prescriptions': 8.0},
{'timeline': '2015-01-01', 'total_prescriptions': 4.0},
{'timeline': '2015-02-01', 'total_prescriptions': 0.0},
{'timeline': '2015-03-01', 'total_prescriptions': 10.0},
{'timeline': '2015-04-01', 'total_prescriptions': 3.0}]
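The output above uses full dates and float counts, whereas the question asks for "2015-2" style labels and integer counts. The following is a minimal end-to-end sketch of one way to get that shape. The helper name fill_missing_months is hypothetical, and using sum() on the resampler (which yields 0 for months with no rows) is a swapped-in variation on the asfreq/fillna approach, chosen here so the counts stay integers.

```python
import pandas as pd

L = [
    {"timeline": "2014-10", "total_prescriptions": 17},
    {"timeline": "2014-11", "total_prescriptions": 14},
    {"timeline": "2014-12", "total_prescriptions": 8},
    {"timeline": "2015-1", "total_prescriptions": 4},
    {"timeline": "2015-3", "total_prescriptions": 10},
    {"timeline": "2015-4", "total_prescriptions": 3},
]


def fill_missing_months(records):
    """Return records with zero-count entries added for skipped months.

    Hypothetical helper: assumes each record has a 'timeline' key in
    year-month form and an integer 'total_prescriptions' key.
    """
    df = pd.DataFrame(records)
    df.index = pd.to_datetime(df["timeline"], format="%Y-%m")

    # Summing per month-start bin gives 0 for bins with no rows.
    monthly = df[["total_prescriptions"]].resample("MS").sum()

    # Rebuild the labels without zero padding, e.g. "2015-2".
    return [
        {"timeline": f"{ts.year}-{ts.month}", "total_prescriptions": int(count)}
        for ts, count in monthly["total_prescriptions"].items()
    ]


result = fill_missing_months(L)
for row in result:
    print(row)
```

Building the label from ts.year and ts.month sidesteps platform-specific strftime flags for unpadded months.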