
Pandas backward fill increment by 12 months

I have a dataframe with course names for each year. For each row I need the duration in months, counted back from the latest year (2016).

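from io import StringIO

import pandas as pd

u_cols = ['year_id', 'web_id']
audit_trail = StringIO('''
year_id | web_id
2012|efg
2013|abc
2014| xyz
2015| pqr
2016| mnp
''')

# header=0 discards the header line in the text in favour of u_cols;
# skipinitialspace drops the blanks that follow each "|"
df11 = pd.read_csv(audit_trail, sep="|", names=u_cols, header=0, skipinitialspace=True)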

How do I add a new column of months that counts up in steps of 12 starting from the latest year (i.e. from the bottom row, like bfill)?

The final dataframe should look like this:

u_cols = ['year_id', 'web_id', 'months']
audit_trail = StringIO('''
year_id | web_id | months
2012|efg|60
2013|abc|48
2014|xyz|36
2015|pqr|24
2016|mnp|12
''')

# header=0 discards the header line in the text in favour of u_cols
df12 = pd.read_csv(audit_trail, sep="|", names=u_cols, header=0)

Some of the answers do not account for multiple courses, so here is updated sample data:

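from io import StringIO

import pandas as pd

u_cols = ['course_name', 'year_id', 'web_id']
audit_trail = StringIO('''
course_name| year_id | web_id
a|2012|efg
a|2013|abc
a|2014| xyz
a|2015| pqr
a|2016| mnp
b|2014| xyz
b|2015| pqr
b|2016| mnp

''')

# header=0 discards the header line in the text in favour of u_cols;
# skipinitialspace drops the blanks that follow each "|"
df11 = pd.read_csv(audit_trail, sep="|", names=u_cols, header=0, skipinitialspace=True)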

shantanuo asked Aug 18 '17 06:08


3 Answers

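import numpy as np

# Per course: 12, 24, ... accumulated and then reversed.
# .values drops the index that apply builds, so the column is assigned by position.
df11.assign(
    months=df11.groupby('course_name').apply(
        lambda x: pd.Series(np.repeat([12], len(x)).cumsum()[::-1])
    ).values
)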

  course_name  year_id web_id  months
0           a     2012    efg      60
1           a     2013    abc      48
2           a     2014    xyz      36
3           a     2015    pqr      24
4           a     2016    mnp      12
5           b     2014    xyz      36
6           b     2015    pqr      24
7           b     2016    mnp      12

All credit to @Alexander and @jezrael for reminding us of a cool characteristic of transform: its result is aligned with the original index, so no .values is needed.
Considering that, I can change my answer to:

df11.assign(months=df11.groupby('course_name').year_id.transform(
    lambda x: np.repeat([12], len(x)).cumsum()[::-1]
))

  course_name  year_id web_id  months
0           a     2012    efg      60
1           a     2013    abc      48
2           a     2014    xyz      36
3           a     2015    pqr      24
4           a     2016    mnp      12
5           b     2014    xyz      36
6           b     2015    pqr      24
7           b     2016    mnp      12
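
A minimal sketch of the alignment property that makes the transform version work, assuming the sample data from the updated question rebuilt by hand. The helper name months_desc and the variable via_transform are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt by hand from the updated question.
df = pd.DataFrame({
    "course_name": list("aaaaabbb"),
    "year_id": [2012, 2013, 2014, 2015, 2016, 2014, 2015, 2016],
    "web_id": ["efg", "abc", "xyz", "pqr", "mnp", "xyz", "pqr", "mnp"],
})


def months_desc(years):
    # Hypothetical helper: 12 repeated once per row, accumulated, then reversed.
    return np.repeat([12], len(years)).cumsum()[::-1]


# transform hands each group's year_id values to the helper and stitches
# the pieces back together on the original row index.
via_transform = df.groupby("course_name")["year_id"].transform(months_desc)

print(via_transform.index.equals(df.index))
print(via_transform.tolist())
```

Because the result shares the index of df, it can be passed straight to assign or stored as a new column.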

piRSquared answered Nov 08 '22 06:11


>>> df11.assign(months=df11.groupby('course_name').year_id.transform(
        lambda years: range(len(years) * 12, 0, -12)))
  course_name  year_id web_id  months
0           a     2012    efg      60
1           a     2013    abc      48
2           a     2014    xyz      36
3           a     2015    pqr      24
4           a     2016    mnp      12
5           b     2014    xyz      36
6           b     2015    pqr      24
7           b     2016    mnp      12
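
The range-based transform above relies on row position, so it assumes each course has one row per year in ascending order. If that is not guaranteed, one possible variant derives the months from the year values themselves. This is a sketch under that different assumption, not the answerer's method, and the rows below are deliberately shuffled by hand.

```python
import pandas as pd

# Same courses and years as the question, with rows deliberately out of order.
df = pd.DataFrame({
    "course_name": ["a", "a", "a", "a", "a", "b", "b", "b"],
    "year_id": [2016, 2012, 2014, 2013, 2015, 2015, 2016, 2014],
})

# Latest year per course, broadcast back to every row of that course.
latest = df.groupby("course_name")["year_id"].transform("max")

# The latest year counts as 12 months; each earlier year adds another 12.
df["months"] = (latest - df["year_id"] + 1) * 12

print(df)
```

With sorted, gap-free data this yields the same column as the positional approach.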

Alexander answered Nov 08 '22 08:11


You can use transform with arange:

import numpy as np

# Per course: count down from 12 * (number of rows) to 12 in steps of 12
df11['months'] = df11.groupby('course_name')['year_id'] \
                     .transform(lambda x: np.arange(len(x)*12, 0, -12))
print(df11)
  course_name  year_id  web_id  months
0           a     2012     efg      60
1           a     2013     abc      48
2           a     2014     xyz      36
3           a     2015     pqr      24
4           a     2016     mnp      12
5           b     2014     xyz      36
6           b     2015     pqr      24
7           b     2016     mnp      12
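
A lambda-free alternative that may be worth considering is groupby cumcount with ascending=False, which numbers the rows of each group from the group length minus one down to zero. This is a sketch of a different technique from the arange transform above, and it makes the same assumption that rows are sorted by year within each course. The dataframe is rebuilt by hand from the question's sample.

```python
import pandas as pd

# Sample data rebuilt by hand from the updated question.
df = pd.DataFrame({
    "course_name": list("aaaaabbb"),
    "year_id": [2012, 2013, 2014, 2015, 2016, 2014, 2015, 2016],
    "web_id": ["efg", "abc", "xyz", "pqr", "mnp", "xyz", "pqr", "mnp"],
})

# Reverse position within each course: 4, 3, 2, 1, 0 for "a" and 2, 1, 0 for "b".
reverse_pos = df.groupby("course_name").cumcount(ascending=False)

# Shift to start at 1, then scale to months.
df["months"] = (reverse_pos + 1) * 12

print(df)
```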

jezrael answered Nov 08 '22 06:11