I have a df, where the data looks like this:
Time Value
60.8
Jul 2019 58.1
58.8
56.9
Oct 2019 51.8
54.6
56.8
Jan 2020 58.8
54.2
51.3
Apr 2020 52.2
I want to fill in the blank cells in the Time
variable according to the calendar year. So:
Time Value
Jun 2019 60.8
Jul 2019 58.1
Aug 2019 58.8
Sep 2019 56.9
Oct 2019 51.8
Nov 2019 54.6
Dec 2019 56.8
Jan 2020 58.8
Feb 2020 54.2
Mar 2020 51.3
Apr 2020 52.2
I saw a post where pandas could be used to fill in numeric values, but since my variable isn't necessarily defined in a numeric way, I'm not entirely sure how to apply it in this situation.
There seem to me to be two ways of approaching this: 1) modifying the list before writing to df. 2) Modifying the df.
I prefer the first solution, but not sure if it is possible.
Thanks.
My script:
totalmonth=['', 'Jul 2019', '', '', 'Oct 2019', '', '', 'Jan 2020', '', '', 'Apr 2020', '']
totalvalue=['60.8', '58.1', '58.8', '56.9', '51.8', '54.6', '56.8', '58.8', '54.2', '51.3', '52.2', '48.7']
df = pd.DataFrame({'Time': totalmonth,
'Value': totalvalue})
Ok, this took me longer than I would like to admit. I solved for your first answer
Output:
***********************BEFORE********************************
['', 'Jul 2019', '', '', 'Oct 2019', '', '', 'Jan 2020', '', '', 'Apr 2020', '']
Time Value
0 60.8
1 Jul 2019 58.1
2 58.8
3 56.9
4 Oct 2019 51.8
5 54.6
6 56.8
7 Jan 2020 58.8
8 54.2
9 51.3
10 Apr 2020 52.2
11 48.7
***********************AFTER********************************
['Jun 2019', 'Jul 2019', 'Aug 2019', 'Sep 2019', 'Oct 2019', 'Nov 2019', 'Dec 2019', 'Jan 2020', 'Feb 2020', 'Mar 2020', 'Apr 2020', 'May 2020']
Time Value
0 Jun 2019 60.8
1 Jul 2019 58.1
2 Aug 2019 58.8
3 Sep 2019 56.9
4 Oct 2019 51.8
5 Nov 2019 54.6
6 Dec 2019 56.8
7 Jan 2020 58.8
8 Feb 2020 54.2
9 Mar 2020 51.3
10 Apr 2020 52.2
11 May 2020 48.7
Code:
from datetime import datetime
from dateutil.relativedelta import relativedelta
totalmonth=['', 'Jul 2019', '', '', 'Oct 2019', '', '', 'Jan 2020', '', '', 'Apr 2020', '']
new_totalmonth = [datetime.strptime(x,'%b %Y') for x in totalmonth if x != '' ]
index = totalmonth.index(min(new_totalmonth).strftime('%b %Y'))
new_totalmonth = [(min(new_totalmonth) + relativedelta(months=x)).strftime('%b %Y') for x in range(-index,len(totalmonth) - index)]
print(new_totalmonth)
Breakdown
This line of code creates a list of all the valid dates and puts them in a format that I can run the min() function on.
new_totalmonth = [datetime.strptime(x,'%b %Y') for x in totalmonth if x != '' ]
What this prints out
print(new_totalmonth)
[datetime.datetime(2019, 7, 1, 0, 0), datetime.datetime(2019, 10, 1, 0, 0), datetime.datetime(2020, 1, 1, 0, 0), datetime.datetime(2020, 4, 1, 0, 0)]
This is creating the variable index and assigning it the index of the minimum date in totalmonth
index = totalmonth.index(min(new_totalmonth).strftime('%b %Y'))
min(new_totalmonth) # this is finding the minimum date in new_totalmonth
print(min(new_totalmonth))
2019-07-01 00:00:00
min(new_totalmonth).strftime('%b %Y') # This is putting that minimum in a format that matches what is in totalmonth so the function totalmonth.index() can get the correct index
print(min(new_totalmonth).strftime('%b %Y'))
Jul 2019
This is using list comprehension.
new_totalmonth = [(min(new_totalmonth) + relativedelta(months=x)).strftime('%b %Y') for x in range(-index,len(totalmonth) - index)]
I am using the index of the minimum date in totalmonth to manipulate the range of values (how many months) I am going to add to the minimum month in totalmonth
range(-index,len(totalmonth) - index)
print(list(range(-index,len(totalmonth) - index)))
[-1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Since the minimum month (Jul 2019) is at index 1 I need to add -1 months to it to get the month that comes before it which is Jun 2019
So it can be broken out to:
(min(new_totalmonth) + relativedelta(months=-1)).strftime('%b %Y') = Jun 2019
(min(new_totalmonth) + relativedelta(months=0)).strftime('%b %Y') = Ju1 2019
(min(new_totalmonth) + relativedelta(months=1)).strftime('%b %Y') = Aug 2019
...
(min(new_totalmonth) + relativedelta(months=10)).strftime('%b %Y') = May 2019
Take all those values and put them in the list new_totalmonth
print(new_totalmonth)
['Jun 2019', 'Jul 2019', 'Aug 2019', 'Sep 2019', 'Oct 2019', 'Nov 2019', 'Dec 2019', 'Jan 2020', 'Feb 2020', 'Mar 2020', 'Apr 2020', 'May 2020']
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With