Multiplying a pandas column by a yearly coefficient in a fast way

Question

I have a dataframe with datetime index:

import pandas as pd

df = pd.DataFrame(
    {'test': [1, 1, 1, 1, 1, 1]},
    index=[
        '2018-01-01', '2018-01-02', '2018-01-03',
        '2019-01-03', '2019-01-02', '2020-01-02'
    ]
)
df.index = pd.to_datetime(df.index)

I have a yearly parameter:

yearly_parameter = [1, 2, 3]

I would like to multiply the column 'test' by its corresponding yearly parameter from the list yearly_parameter (the first value is for 2018, the second for 2019 and the third for 2020). How can I do that efficiently, ideally in a vectorized way? Is a list a good way to store those yearly parameters for this calculation?

I expect the following result in a column, say 'answer':

df['answer'] = [1, 1, 1, 2, 2, 3]

print(df)

            test  answer
2018-01-01     1       1
2018-01-02     1       1
2018-01-03     1       1
2019-01-03     1       2
2019-01-02     1       2
2020-01-02     1       3

Thank you very much for your help,

Pierre

piRSquared · Accepted Answer

`pd.factorize`

Use pd.factorize to turn each row's year into an integer code, numbered in order of first appearance. The codes then serve as positions in yearly_parameter: indexing the parameter array with them yields one coefficient per row, so the multiplication is fully vectorized.

This requires yearly_parameter to have at least as many elements as there are unique years in df.index.

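import numpy as np

# f: integer code per row; y: the unique years in order of first appearance
f, y = pd.factorize(df.index.year)

yearly_parameter = np.array([1, 2, 3])

# Index the parameters by code to get one coefficient per row
df.assign(answer=df.test.values * yearly_parameter[f])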

            test  answer
2018-01-01     1       1
2018-01-02     1       1
2018-01-03     1       1
2019-01-03     1       2
2019-01-02     1       2
2020-01-02     1       3
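
As a self-contained illustration of the steps above, here is a minimal sketch that rebuilds the question's frame and adds an explicit guard for the length requirement. The names codes, years and result, and the choice to raise a ValueError, are assumptions made for this example rather than part of the original answer.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question
df = pd.DataFrame(
    {"test": [1, 1, 1, 1, 1, 1]},
    index=pd.to_datetime([
        "2018-01-01", "2018-01-02", "2018-01-03",
        "2019-01-03", "2019-01-02", "2020-01-02",
    ]),
)
yearly_parameter = np.array([1, 2, 3])

# codes[i] is the position of row i's year within `years`,
# where `years` lists the distinct years in order of first appearance
codes, years = pd.factorize(df.index.year)

# Guard (an assumption of this sketch): fail early if a year has no parameter
if len(yearly_parameter) < len(years):
    raise ValueError("yearly_parameter needs one entry per distinct year")

# Integer-array indexing picks one coefficient per row in a single step
result = df.assign(answer=df["test"].to_numpy() * yearly_parameter[codes])
```

Without the guard, a parameter array that is too short would only fail later with an IndexError from the indexing step, which is harder to trace back to the input.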

`np.unique`

Note that pd.factorize numbers the years in order of first appearance, so the pd.factorize approach assumes that the first element of yearly_parameter belongs to the first year observed in the index. If the first element should instead correspond to the minimum year observed, use pd.factorize(df.index.year, sort=True). Or better yet, if you are going to sort anyway, use the equivalent calculation in NumPy:

# y: the sorted unique years; f: position of each row's year within y
y, f = np.unique(df.index.year, return_inverse=True)

yearly_parameter = np.array([1, 2, 3])

df.assign(answer=df.test.values * yearly_parameter[f])

            test  answer
2018-01-01     1       1
2018-01-02     1       1
2018-01-03     1       1
2019-01-03     1       2
2019-01-02     1       2
2020-01-02     1       3
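
The ordering difference only shows up when the first row is not from the earliest year, which is not the case in the question's data. The sketch below uses a small hypothetical index to make it visible. It then shows one possible alternative that is not part of the answer above: storing the coefficients in a Series keyed by year, so that alignment no longer depends on position. The names idx, coef_by_year and small are invented for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical index whose first row is NOT from the earliest year
idx = pd.to_datetime(["2019-01-02", "2018-01-01", "2020-01-02", "2018-01-03"])
years = idx.year

# pd.factorize: codes follow the order in which years first appear
codes_seen, uniques_seen = pd.factorize(years)
# np.unique: codes follow the sorted order of the years
uniques_sorted, codes_sorted = np.unique(years, return_inverse=True)

# uniques_seen.tolist()   -> [2019, 2018, 2020]
# uniques_sorted.tolist() -> [2018, 2019, 2020]
# The codes therefore differ, and so would the coefficients picked by position.

# Alternative (an assumption of this sketch): key the coefficients by year
coef_by_year = pd.Series({2018: 1, 2019: 2, 2020: 3})

# Look up one coefficient per row by label instead of by position
coef = coef_by_year.reindex(years).to_numpy()

small = pd.DataFrame({"test": [1, 1, 1, 1]}, index=idx)
small = small.assign(answer=small["test"].to_numpy() * coef)
```

A keyed structure also speaks to the question of whether a list is a good container: a list works as long as its order is guaranteed to match the years, while a dict or Series states the year for each coefficient explicitly. With reindex, a year that has no coefficient produces NaN rather than silently borrowing a neighbouring value.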

Tags:

performance

python

pandas

multiplication

Peslier53
