I have a 'Posting Date' column in a dataframe, with values formatted like '2017-03-01'. The dtype is datetime64[ns]. I want to change any value after '2017-03-31' to '2017-03-31' and leave all others unchanged.
When I type df['Posting Date'] > '2017-03-31', it correctly shows all the rows where the condition is met, so I guess the date filtering works.
However, when I used numpy.where to write the condition as this:
df['Posting Date'] = np.where(df['Posting Date']>'2017-03-31','2017-03-31',df['Posting Date'])
it raises an invalid type promotion error.
I also tried df.loc and an error occurs as well:
df.loc[df['Posting Date']>'2017-03-31','Posting Date']='2017-03-31'
ValueError: invalid literal for int() with base 10: '2017-03-31'
I'm wondering why the error occurs. How can I replace the dates correctly? Any method that works is fine.
It's because you are trying to replace datetime values with a string in a datetime dtype column. Pass a datetime to np.where instead, i.e.
df['Posting Date'] = np.where(df['Posting Date']>'2017-03-31',pd.to_datetime(['2017-03-31']),df['Posting Date'])
Example:
df = pd.DataFrame({'Posting Date': pd.to_datetime(['20-4-2017','20-4-2017','20-4-2017','20-3-2017','20-2-2017'], dayfirst=True)})
df['Posting Date'] = np.where(df['Posting Date']>'2017-03-31',pd.to_datetime(['2017-03-31']),df['Posting Date'])
Output:
  Posting Date
0   2017-03-31
1   2017-03-31
2   2017-03-31
3   2017-03-20
4   2017-02-20
A better one, posted by @pirSquared in a comment, uses clip, i.e.
df['Posting Date'] = df['Posting Date'].clip(upper=pd.Timestamp('2017-03-31'))
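For anyone who wants to check this end to end, here is a minimal self-contained sketch. The sample dates mirror the example above; the names cutoff, clipped, masked and assigned are only illustrative. It also tries Series.mask and df.loc with a pd.Timestamp, which are not part of the original answer but should behave the same way, since the replacement value is a datetime and not a string.

```python
import pandas as pd

# Cap value as a real Timestamp, not a string (illustrative name).
cutoff = pd.Timestamp("2017-03-31")

df = pd.DataFrame({
    "Posting Date": pd.to_datetime(
        ["2017-04-20", "2017-04-20", "2017-04-20", "2017-03-20", "2017-02-20"]
    )
})

# Option 1: clip caps every date at the cutoff.
clipped = df["Posting Date"].clip(upper=cutoff)

# Option 2 (assumed equivalent): mask replaces values where the condition is True.
masked = df["Posting Date"].mask(df["Posting Date"] > cutoff, cutoff)

# Option 3 (assumed equivalent): df.loc assignment with a Timestamp instead of a string.
assigned = df.copy()
assigned.loc[assigned["Posting Date"] > cutoff, "Posting Date"] = cutoff

print(clipped)
print(clipped.dtype)
```

All three keep the column as a datetime dtype. clip is probably the shortest to read when the goal is just an upper bound.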