Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to fill in blank cells in df based on string in sequential row, Pandas

Tags:

python

pandas

I have a df, where the data looks like this:

  Time      Value
            60.8
  Jul 2019  58.1
            58.8
            56.9
  Oct 2019  51.8
            54.6
            56.8
  Jan 2020  58.8
            54.2
            51.3
  Apr 2020  52.2

I want to fill in the blank cells in the Time variable according to the calendar year. So:

  Time      Value
  Jun 2019  60.8
  Jul 2019  58.1
  Aug 2019  58.8
  Sep 2019  56.9
  Oct 2019  51.8
  Nov 2019  54.6
  Dec 2019  56.8
  Jan 2020  58.8
  Feb 2020  54.2
  Mar 2020  51.3
  Apr 2020  52.2

I saw a post where pandas could be used to fill in numeric values, but since my variable isn't necessarily defined in a numeric way, I'm not entirely sure how to apply it in this situation.

There seem to me to be two ways of approaching this: 1) modifying the list before writing to df. 2) Modifying the df.

I prefer the first solution, but not sure if it is possible.

Thanks.

My script:

totalmonth=['', 'Jul 2019', '', '', 'Oct 2019', '', '', 'Jan 2020', '', '', 'Apr 2020', '']
totalvalue=['60.8', '58.1', '58.8', '56.9', '51.8', '54.6', '56.8', '58.8', '54.2', '51.3', '52.2', '48.7']

df = pd.DataFrame({'Time': totalmonth,
                   'Value': totalvalue})
like image 711
Yu Na Avatar asked Oct 15 '22 02:10

Yu Na


1 Answers

Ok, this took me longer than I would like to admit. I solved for your first answer

Output:

***********************BEFORE********************************

['', 'Jul 2019', '', '', 'Oct 2019', '', '', 'Jan 2020', '', '', 'Apr 2020', '']
        Time Value
0             60.8
1   Jul 2019  58.1
2             58.8
3             56.9
4   Oct 2019  51.8
5             54.6
6             56.8
7   Jan 2020  58.8
8             54.2
9             51.3
10  Apr 2020  52.2
11            48.7

***********************AFTER********************************

['Jun 2019', 'Jul 2019', 'Aug 2019', 'Sep 2019', 'Oct 2019', 'Nov 2019', 'Dec 2019', 'Jan 2020', 'Feb 2020', 'Mar 2020', 'Apr 2020', 'May 2020']
        Time Value
0   Jun 2019  60.8
1   Jul 2019  58.1
2   Aug 2019  58.8
3   Sep 2019  56.9
4   Oct 2019  51.8
5   Nov 2019  54.6
6   Dec 2019  56.8
7   Jan 2020  58.8
8   Feb 2020  54.2
9   Mar 2020  51.3
10  Apr 2020  52.2
11  May 2020  48.7

Code:

from datetime import datetime
from dateutil.relativedelta import relativedelta
totalmonth=['', 'Jul 2019', '', '', 'Oct 2019', '', '', 'Jan 2020', '', '', 'Apr 2020', '']

new_totalmonth = [datetime.strptime(x,'%b %Y') for x in totalmonth if x != '' ]
index = totalmonth.index(min(new_totalmonth).strftime('%b %Y'))
new_totalmonth = [(min(new_totalmonth) + relativedelta(months=x)).strftime('%b %Y') for x in range(-index,len(totalmonth) - index)]
print(new_totalmonth)

Breakdown

This line of code creates a list of all the valid dates and puts them in a format that I can run the min() function on.

new_totalmonth = [datetime.strptime(x,'%b %Y') for x in totalmonth if x != '' ]

What this prints out

print(new_totalmonth)

[datetime.datetime(2019, 7, 1, 0, 0), datetime.datetime(2019, 10, 1, 0, 0), datetime.datetime(2020, 1, 1, 0, 0), datetime.datetime(2020, 4, 1, 0, 0)]

This is creating the variable index and assigning it the index of the minimum date in totalmonth

index = totalmonth.index(min(new_totalmonth).strftime('%b %Y'))

min(new_totalmonth)  # this is finding the minimum date in new_totalmonth
print(min(new_totalmonth))

2019-07-01 00:00:00
min(new_totalmonth).strftime('%b %Y')  # This is putting that minimum in a format that matches what is in totalmonth so the function totalmonth.index() can get the correct index

print(min(new_totalmonth).strftime('%b %Y'))

Jul 2019

This is using list comprehension.

new_totalmonth = [(min(new_totalmonth) + relativedelta(months=x)).strftime('%b %Y') for x in range(-index,len(totalmonth) - index)]

I am using the index of the minimum date in totalmonth to manipulate the range of values (how many months) I am going to add to the minimum month in totalmonth

range(-index,len(totalmonth) - index)

print(list(range(-index,len(totalmonth) - index)))
[-1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

Since the minimum month (Jul 2019) is at index 1 I need to add -1 months to it to get the month that comes before it which is Jun 2019

So it can be broken out to:

(min(new_totalmonth) + relativedelta(months=-1)).strftime('%b %Y') = Jun 2019
(min(new_totalmonth) + relativedelta(months=0)).strftime('%b %Y') = Ju1 2019
(min(new_totalmonth) + relativedelta(months=1)).strftime('%b %Y') = Aug 2019
...
(min(new_totalmonth) + relativedelta(months=10)).strftime('%b %Y') = May 2019

Take all those values and put them in the list new_totalmonth

print(new_totalmonth)
['Jun 2019', 'Jul 2019', 'Aug 2019', 'Sep 2019', 'Oct 2019', 'Nov 2019', 'Dec 2019', 'Jan 2020', 'Feb 2020', 'Mar 2020', 'Apr 2020', 'May 2020']
like image 159
marphlap Avatar answered Nov 01 '22 10:11

marphlap