Calculate MRR in Python Pandas dataframe

I have a pandas DataFrame with the following columns:

date | months | price

I am calculating some basic BI metrics. I got the Net Revenue by grouping the dataframe on date and summing the price:

df = df.groupby('date')['price'].sum().reset_index()

Now I want to find the MRR, which is similar to the Net Revenue, except that when the months column is greater than 1, the price should be spread equally over that many months, starting with the row's own month. It is also grouped by month rather than by day.

For example, a row dated January 2016 with 3 months and a price of 30$ should add 10$ to January, 10$ to February and 10$ to March.
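To make the rule concrete, here is a minimal sketch of spreading a single payment over its months. The names `start` and `schedule` are illustrative and not part of the dataframe; the sketch assumes monthly `Period` values are an acceptable way to label months.

```python
import pandas as pd

# Illustrative values taken from the example above: 3 months, 30$ in total.
start = pd.Period('2016-01', freq='M')
months, price = 3, 30

# One entry per covered month, each carrying an equal share of the price.
schedule = pd.Series(
    price / months,
    index=pd.period_range(start, periods=months, freq='M'),
)
print(schedule)
```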

My first idea was to iterate through the dataframe, keep track of the months and of the amount I should "move" to the following months, and build a new dataframe manually.

But first, is there a more Pythonic way to do this in pandas?

Data to reproduce a dataframe:

import pandas as pd
df = pd.DataFrame({'date': ['01-01-2016', '05-01-2016', '10-01-2016','04-02-2016'], 
                   'months': [1, 3, 1, 6],
                   'price': [40, 60, 20, 60]})

Desired result:

Date          | MRR
January 2016  | 80
February 2016 | 30
March 2016    | 30
April 2016    | 10
May 2016      | 10
June 2016     | 10
July 2016     | 10

The results are calculated like this, with one term per row:

January 2016 = 40 + 20 + 20 + 0
February 2016 = 0 + 20 + 0 + 10
March 2016 = 0 + 20 + 0 + 10
April 2016 = 0 + 0 + 0 + 10
May 2016 = 0 + 0 + 0 + 10
June 2016 = 0 + 0 + 0 + 10
July 2016 = 0 + 0 + 0 + 10
Tasos asked Apr 09 '26 11:04


1 Answer

I don't know any way around using a loop. However, I can suggest a way to make the code pretty clean and efficient.

First, let's load the example data you supplied in the question text:

df = pd.DataFrame({'date': ['01-01-2016', '05-01-2016', '10-01-2016','04-02-2016'], 
                   'months': [1, 3, 1, 6],
                   'price': [40, 60, 20, 60]})

In order to use pandas' date functionality (e.g. grouping by month), we will use the date column as the index, which makes it a DatetimeIndex:

df['date'] = pd.to_datetime(df['date'], format='%d-%m-%Y')
df = df.set_index('date')

Now it's easy to view, for example, a month-by-month summary using the resample function, which works like the groupby function you already know but groups by time period:

df.resample('M').sum()
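As a rough illustration of that month-by-month summary, the sketch below groups the example rows by calendar month. It swaps `resample` for a `groupby` on monthly periods, because the month-end alias passed to `resample` differs between pandas versions (`'M'` in older releases, `'ME'` from pandas 2.2 on). The name `net_revenue` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'date': ['01-01-2016', '05-01-2016', '10-01-2016', '04-02-2016'],
                   'months': [1, 3, 1, 6],
                   'price': [40, 60, 20, 60]})
df['date'] = pd.to_datetime(df['date'], format='%d-%m-%Y')
df = df.set_index('date')

# Group the rows by the calendar month of their timestamp and sum the price.
net_revenue = df.groupby(df.index.to_period('M'))['price'].sum()
print(net_revenue)
```

Note that this is the plain per-month revenue, before any price has been spread over later months.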

Next, we "spread out" the rows whose months value is greater than 1 over multiple months. My approach here is to generate a new DataFrame for each row:

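dfs = []
for date, values in df.iterrows():
    # Cast to int: iterrows() yields floats when the column dtypes are mixed
    months = int(values['months'])
    price = values['price']
    dfs.append(
        pd.DataFrame(
            # Compute the price for each month, and repeat this value
            data={'price': [price / months] * months},
            # The index is a range of month-end dates for the requested number of months
            # (on pandas 2.2 and later, use freq='ME'; 'M' is deprecated there)
            index=pd.date_range(date, periods=months, freq='M')
        )
    )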

Now we can just concatenate the list of DataFrames, resample to months and take the sum:

output = pd.concat(dfs).resample('M').sum()
print(output)

Output:

            price
2016-01-31     80
2016-02-29     30
2016-03-31     30
2016-04-30     10
2016-05-31     10
2016-06-30     10
2016-07-31     10
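If the loop becomes a bottleneck on larger data, one possible loop-free variant is sketched below. This is not the approach used above: it repeats each row once per covered month with `Index.repeat`, numbers the copies with `cumcount`, and shifts a monthly `Period` forward by that number. The names `expanded`, `row`, `offset`, `share`, `month` and `mrr` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'date': ['01-01-2016', '05-01-2016', '10-01-2016', '04-02-2016'],
                   'months': [1, 3, 1, 6],
                   'price': [40, 60, 20, 60]})
df['date'] = pd.to_datetime(df['date'], format='%d-%m-%Y')

# Repeat each row once for every month it covers, remembering the
# original row label in a 'row' column.
expanded = (
    df.loc[df.index.repeat(df['months'].to_numpy())]
      .rename_axis('row')
      .reset_index()
)

# Number the copies of each original row 0, 1, 2, ... (the month offset).
expanded['offset'] = expanded.groupby('row').cumcount()

# Each copy carries an equal share of the original price.
expanded['share'] = expanded['price'] / expanded['months']

# Start month as a monthly Period, shifted forward by the offset.
expanded['month'] = expanded['date'].dt.to_period('M') + expanded['offset']

# Sum the shares that land in each month.
mrr = expanded.groupby('month')['share'].sum()
print(mrr)
```

The result is indexed by monthly periods rather than month-end timestamps, so it does not depend on the `'M'`/`'ME'` alias.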

See http://pandas.pydata.org/pandas-docs/stable/timeseries.html for everything pandas can do with time series. For example, to reproduce your desired output exactly, you could do this:

output.index = output.index.strftime('%B %Y')

Which results in this:

               price
January 2016      80
February 2016     30
March 2016        30
April 2016        10
May 2016          10
June 2016         10
July 2016         10
Marijn van Vliet answered Apr 11 '26 01:04


