Multiplying a pandas column by a yearly coefficient in a fast way

I have a DataFrame with a datetime index:

import pandas as pd

df = pd.DataFrame(
    {'test': [1, 1, 1, 1, 1, 1]},
    index=[
        '2018-01-01', '2018-01-02', '2018-01-03',
        '2019-01-03', '2019-01-02', '2020-01-02'
    ]
)
df.index = pd.to_datetime(df.index)

I have a yearly parameter:

yearly_parameter = [1, 2, 3]

I would like to multiply the column 'test' by its corresponding yearly parameter from the list yearly_parameter (the first value is for 2018, the second for 2019 and the third for 2020). How can I do that efficiently, ideally in a vectorized way? Is a list a good way to store those yearly parameters for this calculation?

I expect the following result in a column, say 'answer':

df['answer'] = [1, 1, 1, 2, 2, 3]

print(df)

              test  answer
2018-01-01     1       1
2018-01-02     1       1
2018-01-03     1       1
2019-01-03     1       2
2019-01-02     1       2
2020-01-02     1       3

Thank you very much for your help,

Pierre

Peslier53 asked Oct 12 '25 11:10


1 Answer

pd.factorize

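Use pd.factorize to turn each row's year into an integer code (0 for the first year observed, 1 for the second, and so on). These codes should line up with the positions in yearly_parameter. Indexing yearly_parameter with the array of codes then yields one coefficient per row, so the multiplication is fully vectorized.

This requires yearly_parameter to have at least as many elements as there are unique years in df.index.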

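import numpy as np

# f: integer code per row; y: unique years in order of first appearance
f, y = pd.factorize(df.index.year)

yearly_parameter = np.array([1, 2, 3])

# yearly_parameter[f] picks one coefficient per row
df.assign(answer=df.test.values * yearly_parameter[f])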

            test  answer
2018-01-01     1       1
2018-01-02     1       1
2018-01-03     1       1
2019-01-03     1       2
2019-01-02     1       2
2020-01-02     1       3
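
To see why the indexing works, it can help to look at what pd.factorize actually returns. The following is a minimal sketch that reuses the question's data. The names codes, uniques and answer, and the length check, are additions for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'test': [1, 1, 1, 1, 1, 1]},
    index=pd.to_datetime([
        '2018-01-01', '2018-01-02', '2018-01-03',
        '2019-01-03', '2019-01-02', '2020-01-02',
    ]),
)
yearly_parameter = np.array([1, 2, 3])

# codes: one integer per row
# uniques: the distinct years, in order of first appearance
codes, uniques = pd.factorize(df.index.year)
print(codes)             # [0 0 0 1 1 2]
print(uniques.tolist())  # [2018, 2019, 2020]

# Illustrative guard (an assumption, not in the original answer):
# fail early if there are more years than coefficients.
if len(yearly_parameter) < len(uniques):
    raise ValueError("yearly_parameter has fewer entries than there are years")

# Integer-array indexing expands the coefficients to one per row.
answer = df['test'].to_numpy() * yearly_parameter[codes]
print(answer)            # [1 1 1 2 2 3]
```

Without such a guard, a too-short yearly_parameter would surface as an IndexError at the indexing step instead.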

np.unique

Note that the factorize approach aligns the first element of yearly_parameter with the first year observed in the index, which is not necessarily the earliest year. If the first element should instead correspond to the minimum year observed, use pd.factorize(df.index.year, sort=True). Or better yet, if you are going to sort anyway, use the equivalent calculation in NumPy:

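# y: sorted unique years; f: integer code per row
# (the return order is swapped relative to pd.factorize)
y, f = np.unique(df.index.year, return_inverse=True)

yearly_parameter = np.array([1, 2, 3])

df.assign(answer=df.test.values * yearly_parameter[f])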

            test  answer
2018-01-01     1       1
2018-01-02     1       1
2018-01-03     1       1
2019-01-03     1       2
2019-01-02     1       2
2020-01-02     1       3
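
The question's index happens to list the years in ascending order, so both approaches give the same result there. The difference only shows up when the first year observed is not the earliest one. The sketch below uses a small made-up index (hypothetical data, not from the question) to compare the codes. It then shows one possible alternative, assumed here and not part of the original answer: storing the coefficients in a dict keyed by year, which removes the dependence on ordering altogether.

```python
import numpy as np
import pandas as pd

# Hypothetical index whose first observed year (2019) is not the earliest.
idx = pd.DatetimeIndex(['2019-05-01', '2018-05-01', '2020-05-01', '2018-06-01'])
years = idx.year

codes_first_seen, _ = pd.factorize(years)         # order of first appearance
codes_sorted, _ = pd.factorize(years, sort=True)  # ascending year order
_, codes_unique = np.unique(np.asarray(years), return_inverse=True)

print(codes_first_seen)  # [0 1 2 1]
print(codes_sorted)      # [1 0 2 0]
print(codes_unique)      # [1 0 2 0]

# Alternative sketch: key the coefficients by year instead of by position.
# coef_by_year is an illustrative name, not something from the original post.
coef_by_year = {2018: 1, 2019: 2, 2020: 3}
df2 = pd.DataFrame({'test': [1, 1, 1, 1]}, index=idx)
df2['answer'] = df2['test'].to_numpy() * np.asarray(years.map(coef_by_year))
print(df2['answer'].tolist())  # [2, 1, 3, 1]
```

With the dict, a year that has no entry maps to NaN rather than silently picking up a neighbouring coefficient, which may be easier to spot.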

piRSquared answered Oct 15 '25 06:10