In Python I have an array of dates generated (or read from a CSV-file) using pandas, and I want to add one year to each date. I can get it working using pandas but not using numpy. What am I doing wrong? Or is it a bug in either pandas or numpy?
Thanks!
import numpy as np
import pandas as pd
from pandas.tseries.offsets import DateOffset
# Generate range of dates using pandas.
dates = pd.date_range('1980-01-01', '2015-01-01')
# Add one year using pandas.
dates2 = dates + DateOffset(years=1)
# Convert result to numpy. THIS WORKS!
dates2_np = dates2.values
# Convert original dates to numpy array.
dates_np = dates.values
# Add one year using numpy. THIS FAILS!
dates3 = dates_np + np.timedelta64(1, 'Y')
# TypeError: Cannot get a common metadata divisor for NumPy datetime metadata [ns] and [Y] because they have incompatible nonlinear base time units
Represents a duration, the difference between two dates or times. Timedelta is the pandas equivalent of Python's datetime.timedelta and is interchangeable with it in most cases. Parameters: value : Timedelta, timedelta, np.
Timedeltas are differences in times, expressed in different units, e.g. days, hours, minutes, seconds. They can be both positive and negative. Timedelta is a subclass of datetime.
Datetime Units: The Datetime and Timedelta data types support a large number of time units, as well as generic units which can be coerced into any of the other units based on input data. Datetimes are always stored with an epoch of 1970-01-01T00:00.
Adding np.timedelta64(1, 'Y') to an array of dtype datetime64[ns] does not work because a year does not correspond to a fixed number of nanoseconds. Sometimes a year is 365 days, sometimes 366 days, and sometimes there is even an extra leap second. (Note that leap seconds, such as the one that occurred on 2015-06-30 23:59:60, are not representable as NumPy datetime64s.)
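To make the varying year length concrete, here is a small sketch that measures calendar years with datetime64 arithmetic. The helper name days_in_year is made up for illustration and is not part of NumPy.

```python
import numpy as np

def days_in_year(year):
    """Hypothetical helper: number of days in the given calendar year."""
    start = np.datetime64(f"{year}-01-01")
    end = np.datetime64(f"{year + 1}-01-01")
    # Dividing one timedelta64 by another yields a plain float ratio.
    return int((end - start) / np.timedelta64(1, "D"))

print(days_in_year(1980))  # leap year: 366
print(days_in_year(1981))  # common year: 365
```

Because the two results differ, no single nanosecond count can stand in for "one year", which is why NumPy refuses to mix the [Y] and [ns] units.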
The easiest way I know to add a year to a NumPy datetime64[ns] array is to break it into constituent parts, such as years, months and days, do the computation on integer arrays, and then recompose the datetime64 array:
def year(dates):
    "Return an array of the years given an array of datetime64s"
    return dates.astype('M8[Y]').astype('i8') + 1970

def month(dates):
    "Return an array of the months given an array of datetime64s"
    return dates.astype('M8[M]').astype('i8') % 12 + 1

def day(dates):
    "Return an array of the days of the month given an array of datetime64s"
    return (dates - dates.astype('M8[M]')) / np.timedelta64(1, 'D') + 1

def combine64(years, months=1, days=1, weeks=None, hours=None, minutes=None,
              seconds=None, milliseconds=None, microseconds=None, nanoseconds=None):
    years = np.asarray(years) - 1970
    months = np.asarray(months) - 1
    days = np.asarray(days) - 1
    types = ('<M8[Y]', '<m8[M]', '<m8[D]', '<m8[W]', '<m8[h]',
             '<m8[m]', '<m8[s]', '<m8[ms]', '<m8[us]', '<m8[ns]')
    vals = (years, months, days, weeks, hours, minutes, seconds,
            milliseconds, microseconds, nanoseconds)
    return sum(np.asarray(v, dtype=t) for t, v in zip(types, vals)
               if v is not None)
# break the datetime64 array into constituent parts
years, months, days = [f(dates_np) for f in (year, month, day)]
# recompose the datetime64 array after adding 1 to the years
dates3 = combine64(years+1, months, days)
yields
In [185]: dates3
Out[185]:
array(['1981-01-01', '1981-01-02', '1981-01-03', ..., '2015-12-30',
'2015-12-31', '2016-01-01'], dtype='datetime64[D]')
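The same decomposition idea can be written more compactly by truncating to month precision, shifting by 12 months, and adding back the remainder. This is only a sketch under the assumption that the input is a datetime64 array; the name add_years is hypothetical and not a NumPy function.

```python
import numpy as np

def add_years(dates, n=1):
    """Hypothetical helper: shift a datetime64 array by n calendar years."""
    # Truncate each date to the first of its month (unit [M]).
    months = dates.astype("M8[M]")
    # What is left over: day-of-month offset plus any time of day.
    remainder = dates - months
    # Month arithmetic is exact in unit [M]; the remainder is added back last.
    return months + np.timedelta64(12 * n, "M") + remainder

sample = np.array(["1980-01-01", "1980-12-31", "1980-02-29"], dtype="M8[ns]")
print(add_years(sample))
```

Note one difference from pandas: for 1980-02-29 this sketch (like combine64 above) rolls over to 1981-03-01, because February 1981 has only 28 days, whereas DateOffset(years=1) gives 1981-02-28.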
Although this looks like a lot of code, it is actually much quicker than adding a DateOffset of 1 year:
In [206]: %timeit dates + DateOffset(years=1)
1 loops, best of 3: 285 ms per loop
In [207]: %%timeit
.....: years, months, days = [f(dates_np) for f in (year, month, day)]
.....: combine64(years+1, months, days)
.....:
100 loops, best of 3: 2.65 ms per loop
Of course, pd.tseries.offsets offers a whole panoply of offsets that have no easy counterpart when working with NumPy datetime64s.
Here is what it says in the numpy documentation:
There are two Timedelta units (‘Y’, years and ‘M’, months) which are treated specially, because how much time they represent changes depending on when they are used. While a timedelta day unit is equivalent to 24 hours, there is no way to convert a month unit into days, because different months have different numbers of days.
Days and weeks seem to work though:
dates4 = dates_np + np.timedelta64(1, 'D')
dates5 = dates_np + np.timedelta64(1, 'W')
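The special treatment of 'Y' and 'M' can be checked directly on timedelta64 scalars. The following sketch mirrors the kind of conversions the NumPy documentation describes; the variable names are only illustrative.

```python
import numpy as np

one_year = np.timedelta64(1, "Y")
one_day = np.timedelta64(1, "D")

# Years and months convert into each other, and days convert into hours.
year_as_months = np.timedelta64(one_year, "M")
day_as_hours = np.timedelta64(one_day, "h")
print(year_as_months, day_as_hours)

# Converting a year into days is refused, because the answer depends on the year.
try:
    np.timedelta64(one_year, "D")
    year_to_days_ok = True
except TypeError as exc:
    year_to_days_ok = False
    print("TypeError:", exc)
```

Days and weeks work in the addition above for the same reason: both are fixed multiples of a nanosecond, so they share a common unit with datetime64[ns].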