numpy and pandas timedelta error

In Python I have an array of dates generated (or read from a CSV-file) using pandas, and I want to add one year to each date. I can get it working using pandas but not using numpy. What am I doing wrong? Or is it a bug in either pandas or numpy?

Thanks!

import numpy as np
import pandas as pd
from pandas.tseries.offsets import DateOffset

# Generate range of dates using pandas.
dates = pd.date_range('1980-01-01', '2015-01-01')

# Add one year using pandas.
dates2 = dates + DateOffset(years=1)

# Convert result to numpy. THIS WORKS!
dates2_np = dates2.values

# Convert original dates to numpy array.
dates_np = dates.values

# Add one year using numpy. THIS FAILS!
dates3 = dates_np + np.timedelta64(1, 'Y')

# TypeError: Cannot get a common metadata divisor for NumPy datetime metadata [ns] and [Y] because they have incompatible nonlinear base time units
questiondude asked Jul 14 '15 13:07



2 Answers

Adding np.timedelta64(1, 'Y') to an array of dtype datetime64[ns] does not work because a year does not correspond to a fixed number of nanoseconds. Sometimes a year is 365 days, sometimes 366 days, and sometimes there is even an extra leap second. (Note that leap seconds, such as the one that occurred at 2015-06-30 23:59:60, cannot be represented as NumPy datetime64 values.)
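A minimal sketch of the failure, and of the coarser units that do accept a year offset. It assumes a small hand-made array in place of the question's dates_np; the names dates_ns, failed, years_plus_one and months_plus_year are illustrative only:

```python
import numpy as np

# Two nanosecond-resolution dates standing in for the question's dates_np.
dates_ns = np.array(['1980-01-01', '1980-02-29'], dtype='datetime64[ns]')

# A year has no fixed length in nanoseconds, so NumPy refuses the addition.
try:
    dates_ns + np.timedelta64(1, 'Y')
    failed = False
except TypeError:
    failed = True

# At year or month resolution the same offset is well defined,
# at the cost of dropping the day and time information.
years_plus_one = dates_ns.astype('datetime64[Y]') + np.timedelta64(1, 'Y')
months_plus_year = dates_ns.astype('datetime64[M]') + np.timedelta64(1, 'Y')
```

This is why the approach below first splits the dates into calendar parts: the year arithmetic happens at a resolution where "one year" has a meaning.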

The easiest way I know to add a year to a NumPy datetime64[ns] array is to break it into constituent parts, such as years, months and days, do the computation on integer arrays, and then recompose the datetime64 array:

def year(dates):
    "Return an array of the years given an array of datetime64s"
    return dates.astype('M8[Y]').astype('i8') + 1970

def month(dates):
    "Return an array of the months given an array of datetime64s"
    return dates.astype('M8[M]').astype('i8') % 12 + 1

def day(dates):
    "Return an array of the days of the month given an array of datetime64s"
    # Subtract at day resolution so the result is an integer array, not floats
    return (dates.astype('M8[D]') - dates.astype('M8[M]')).astype('i8') + 1

def combine64(years, months=1, days=1, weeks=None, hours=None, minutes=None,
              seconds=None, milliseconds=None, microseconds=None, nanoseconds=None):
    years = np.asarray(years) - 1970
    months = np.asarray(months) - 1
    days = np.asarray(days) - 1
    types = ('<M8[Y]', '<m8[M]', '<m8[D]', '<m8[W]', '<m8[h]',
             '<m8[m]', '<m8[s]', '<m8[ms]', '<m8[us]', '<m8[ns]')
    vals = (years, months, days, weeks, hours, minutes, seconds,
            milliseconds, microseconds, nanoseconds)
    return sum(np.asarray(v, dtype=t) for t, v in zip(types, vals)
               if v is not None)

# break the datetime64 array into constituent parts
years, months, days = [f(dates_np) for f in (year, month, day)]
# recompose the datetime64 array after adding 1 to the years
dates3 = combine64(years+1, months, days)

yields

In [185]: dates3
Out[185]: 
array(['1981-01-01', '1981-01-02', '1981-01-03', ..., '2015-12-30',
       '2015-12-31', '2016-01-01'], dtype='datetime64[D]')
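One thing to be aware of: keeping the day of the month unchanged means that a leap day rolls over into March, whereas DateOffset(years=1), as far as I know, clamps 1980-02-29 to 1981-02-28. The following is a sketch of that behaviour using a shorter variant of the same idea (month arithmetic plus the offset within the month), not the functions above; dates_ns, month_start, within_month and shifted are illustrative names:

```python
import numpy as np

# Dates around a leap day; the middle one also carries a time of day.
dates_ns = np.array(['1980-02-28', '1980-02-29T12:00', '1980-03-01'],
                    dtype='datetime64[ns]')

# Truncate to the first of the month, and remember how far into the
# month each date is (a timedelta64[ns] array).
month_start = dates_ns.astype('datetime64[M]')
within_month = dates_ns - month_start

# Move the month forward by 12 months, then add the remainder back.
shifted = (month_start + np.timedelta64(12, 'M')) + within_month
# 1980-02-29T12:00 becomes 1981-03-01T12:00, because 1981 has no Feb 29.
```

Whether rolling over or clamping is the right answer depends on the application, so it is worth checking the leap days explicitly before swapping one method for the other.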

Although this looks like a lot of code, in the timings below it is roughly 100 times faster than adding a DateOffset of 1 year:

In [206]: %timeit dates + DateOffset(years=1)
1 loops, best of 3: 285 ms per loop

In [207]: %%timeit
   .....: years, months, days = [f(dates_np) for f in (year, month, day)]
   .....: combine64(years+1, months, days)
   .....: 
100 loops, best of 3: 2.65 ms per loop

Of course, pd.tseries.offsets offers a whole panoply of offsets that have no easy counterpart when working with NumPy datetime64s.

unutbu answered Nov 10 '22 01:11


Here is what it says in the numpy documentation:

There are two Timedelta units (‘Y’, years and ‘M’, months) which are treated specially, because how much time they represent changes depending on when they are used. While a timedelta day unit is equivalent to 24 hours, there is no way to convert a month unit into days, because different months have different numbers of days.

Days and weeks seem to work though:

dates4 = dates_np + np.timedelta64(1, 'D')
dates5 = dates_np + np.timedelta64(1, 'W')
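Fixed-length units are not a substitute for a calendar year, though. A small sketch, assuming two hand-picked start dates (start and plus_365_days are illustrative names), shows that adding 365 days lands one day short when the span contains a leap day:

```python
import numpy as np

# 1980 is a leap year (366 days); 1981 is not (365 days).
start = np.array(['1980-01-01', '1981-01-01'], dtype='datetime64[ns]')

# Days have a fixed length, so this addition is allowed.
plus_365_days = start + np.timedelta64(365, 'D')
# 1980-01-01 -> 1980-12-31 (one day short of a full year)
# 1981-01-01 -> 1982-01-01
```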
jjinking answered Nov 10 '22 01:11