
Python Pandas Sum Values in Columns If date between 2 dates

I have a dataframe df which can be created with this:

import datetime
import pandas as pd

data={'id':[1,1,1,1,2,2,2,2],
      'date1':[datetime.date(2016,1,1),datetime.date(2016,1,2),datetime.date(2016,1,3),datetime.date(2016,1,4),
               datetime.date(2016,1,2),datetime.date(2016,1,4),datetime.date(2016,1,3),datetime.date(2016,1,1)],
      'date2':[datetime.date(2016,1,5),datetime.date(2016,1,3),datetime.date(2016,1,5),datetime.date(2016,1,5),
               datetime.date(2016,1,4),datetime.date(2016,1,5),datetime.date(2016,1,4),datetime.date(2016,1,1)],
      'score1':[5,7,3,2,9,3,8,3],
      'score2':[1,3,0,5,2,20,7,7]}
df=pd.DataFrame.from_dict(data)

And looks like this:
   id       date1       date2  score1  score2
0   1  2016-01-01  2016-01-05       5       1
1   1  2016-01-02  2016-01-03       7       3
2   1  2016-01-03  2016-01-05       3       0
3   1  2016-01-04  2016-01-05       2       5
4   2  2016-01-02  2016-01-04       9       2
5   2  2016-01-04  2016-01-05       3      20
6   2  2016-01-03  2016-01-04       8       7
7   2  2016-01-01  2016-01-01       3       7

What I need is a new dataframe with one row per usedate and two columns, score1sum and score2sum, which hold the SUM of score1 and score2 respectively over all rows of df where usedate falls between date1 and date2 (inclusive). usedate covers every date from the minimum date1 through the maximum date2, inclusive. I used this to create the date range:

drange=pd.date_range(df.date1.min(),df.date2.max())    

The resulting dataframe newdf should look like:

     usedate  score1sum  score2sum
0 2016-01-01          8          8
1 2016-01-02         21          6
2 2016-01-03         32         13
3 2016-01-04         30         35
4 2016-01-05         13         26

For clarification, on usedate 2016-01-01, score1sum is 8, which is calculated by looking at the rows in df where 2016-01-01 is between date1 and date2 (inclusive) and summing row0(5) and row7(3). On usedate 2016-01-04, score2sum is 35, which is calculated by looking at the rows in df where 2016-01-04 is between date1 and date2 (inclusive) and summing row0(1), row2(0), row3(5), row4(2), row5(20), row6(7).
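
The rule for a single usedate can be checked with a boolean mask. This is only a minimal sketch: it rebuilds just the date and score2 columns from the sample data, and the names usedate, mask, matching_rows and score2sum are chosen here for illustration. Row labels are zero-based, as in the printed dataframe.

```python
import pandas as pd

# Only the columns needed for the check, copied from the sample data above.
df = pd.DataFrame({
    'date1': pd.to_datetime(['2016-01-01', '2016-01-02', '2016-01-03', '2016-01-04',
                             '2016-01-02', '2016-01-04', '2016-01-03', '2016-01-01']),
    'date2': pd.to_datetime(['2016-01-05', '2016-01-03', '2016-01-05', '2016-01-05',
                             '2016-01-04', '2016-01-05', '2016-01-04', '2016-01-01']),
    'score2': [1, 3, 0, 5, 2, 20, 7, 7],
})

usedate = pd.Timestamp('2016-01-04')

# True for rows whose [date1, date2] interval contains usedate (both ends inclusive).
mask = (df['date1'] <= usedate) & (usedate <= df['date2'])

matching_rows = df.index[mask].tolist()   # [0, 2, 3, 4, 5, 6]
score2sum = df.loc[mask, 'score2'].sum()  # 1 + 0 + 5 + 2 + 20 + 7 = 35
```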

Maybe some kind of groupby, or melt then groupby?

clg4 asked Jan 04 '18 21:01


1 Answer

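You can use apply with a lambda function that, for each date in the range, selects the rows whose date1/date2 interval contains that date and sums their scores: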

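# Convert the date columns to datetime64 so they compare cleanly with the Timestamp index
df['date1'] = pd.to_datetime(df['date1'])
df['date2'] = pd.to_datetime(df['date2'])

# One row per usedate, covering the full date range
df1 = pd.DataFrame(index=pd.date_range(df.date1.min(), df.date2.max()), columns = ['score1sum', 'score2sum'])

# x.name is the usedate of the current row; sum the scores of the rows in df that contain it
df1[['score1sum','score2sum']] = df1.apply(lambda x: df.loc[(df.date1 <= x.name) &
                                                            (x.name <= df.date2),
                                                            ['score1','score2']].sum(), axis=1)

# Move the date index into a regular usedate column
newdf = df1.rename_axis('usedate').reset_index()
print(newdf)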

Output:

     usedate  score1sum  score2sum
0 2016-01-01          8          8
1 2016-01-02         21          6
2 2016-01-03         32         13
3 2016-01-04         30         35
4 2016-01-05         13         26
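
The row-wise apply above runs one filter per date, which may become slow on a long date range. As an alternative, here is a hedged sketch that uses NumPy broadcasting instead of apply. It is not part of the original answer; the names days, mask and sums are chosen for illustration, and it builds a (dates x rows) boolean matrix in memory, so it assumes that matrix is small enough to fit.

```python
import datetime

import numpy as np
import pandas as pd

# Same sample data as in the question.
data = {'id': [1, 1, 1, 1, 2, 2, 2, 2],
        'date1': [datetime.date(2016, 1, 1), datetime.date(2016, 1, 2),
                  datetime.date(2016, 1, 3), datetime.date(2016, 1, 4),
                  datetime.date(2016, 1, 2), datetime.date(2016, 1, 4),
                  datetime.date(2016, 1, 3), datetime.date(2016, 1, 1)],
        'date2': [datetime.date(2016, 1, 5), datetime.date(2016, 1, 3),
                  datetime.date(2016, 1, 5), datetime.date(2016, 1, 5),
                  datetime.date(2016, 1, 4), datetime.date(2016, 1, 5),
                  datetime.date(2016, 1, 4), datetime.date(2016, 1, 1)],
        'score1': [5, 7, 3, 2, 9, 3, 8, 3],
        'score2': [1, 3, 0, 5, 2, 20, 7, 7]}
df = pd.DataFrame(data)
df['date1'] = pd.to_datetime(df['date1'])
df['date2'] = pd.to_datetime(df['date2'])

drange = pd.date_range(df['date1'].min(), df['date2'].max())

# Column vector of dates, so comparisons broadcast to shape (n_dates, n_rows).
days = drange.values[:, None]

# mask[i, j] is True when drange[i] lies within [date1[j], date2[j]], inclusive.
mask = (df['date1'].values <= days) & (days <= df['date2'].values)

# The matrix product adds up the scores of the matching rows for each date.
sums = mask.astype(int) @ df[['score1', 'score2']].to_numpy()

newdf = pd.DataFrame(sums, columns=['score1sum', 'score2sum'])
newdf.insert(0, 'usedate', drange)
print(newdf)
```

On the sample data this should give the same sums as the apply version. The trade-off is memory: the mask has one entry per (date, row) pair.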
Scott Boston answered Oct 08 '22 19:10