
How do I use a DatetimeIndex as input for a named function in apply() across a Pandas DataFrame?

I have a DatetimeIndex consisting of 15-minute intervals.

I also have the same function written in two ways, and I want to apply it across the whole DataFrame. The point of the function is to tell whether a particular day falls on a weekend or is a working day.

Here they are:

def weekend(datum):
    if (datum.weekday() == 5) or (datum.weekday() == 6):
        return "Weekend"
    else:
        return "Working day"
# written to be fed the DatetimeIndex values directly


def weekendfromnumber(number):
    if (number == 5) or (number == 6):
        return "Weekend"
    else:
        return "Working day"
# written to be fed the integers of the intermediate column weekday

I wanted to apply the first function by feeding it the DatetimeIndex directly, as in:

df15['Type of day'] = df15.index.apply(weekend)

but I get the error:

AttributeError: 'DatetimeIndex' object has no attribute 'apply'

If I use the second function as in:

df15['Type of day'] = df15.weekday.apply(weekendfromnumber)

I get the effect that I want but at the cost of needing to create an intermediate column named weekday with:

df15['weekday'] = df15.index.weekday

Since I do not want an intermediate column I thought that doing something like:

df15['Type of day'] = df15.index.weekday.apply(weekendfromnumber) 

would work, but instead I get the error

AttributeError: 'numpy.ndarray' object has no attribute 'apply'

So, the overarching question is:

How do I take the data that is already in the DatetimeIndex and feed it to a custom function using apply()?

asked May 14 '18 09:05 by rioZg


1 Answer

You could create a temporary pd.Series from your datetime index, but why not just use np.where, which is much faster here:

df15['Type of day'] = np.where(df15.index.weekday >= 5, "Weekend", "Working day")
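
A minimal, self-contained sketch of the np.where approach, assuming a made-up 15-minute index (the dates and the `value` column are illustrative, not taken from the question). Since `weekday` numbers Monday as 0, Saturday and Sunday are 5 and 6, so the weekend test is `>= 5`:

```python
import numpy as np
import pandas as pd

# Hypothetical data: 15-minute intervals starting Friday 2018-05-11,
# covering four full days (Friday through Monday).
idx = pd.date_range(start="2018-05-11", periods=4 * 24 * 4, freq="15min")
df15 = pd.DataFrame({"value": np.arange(len(idx))}, index=idx)

# weekday is 0 for Monday ... 6 for Sunday, so >= 5 selects Saturday and Sunday.
df15["Type of day"] = np.where(df15.index.weekday >= 5, "Weekend", "Working day")

# Every 96th row is the first interval of a day; print one label per day.
print(df15["Type of day"].iloc[::96])
```

No intermediate column is needed, because the comparison is done on the index attribute directly and np.where returns a plain array of the same length as the frame.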

If your function is complicated and you cannot use np.where, call to_series() first:

df15['Type of day'] = df15.index.to_series().apply(weekend)
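
As a sketch of the function-based route, assuming a small made-up daily index: `to_series()` turns the index into a Series whose values are `Timestamp` objects, so `weekend` receives one timestamp per call. `Index.map` is shown as a second option; it is not part of the answer above, only an alternative that skips the temporary Series.

```python
import pandas as pd

def weekend(datum):
    # datum is a pandas Timestamp; weekday() is 5 for Saturday, 6 for Sunday.
    return "Weekend" if datum.weekday() >= 5 else "Working day"

# Hypothetical data: one row per day, Friday 2018-05-11 through Monday 2018-05-14.
idx = pd.date_range(start="2018-05-11", periods=4, freq="D")
df15 = pd.DataFrame({"value": range(4)}, index=idx)

# Option 1: to_series() yields a Series indexed by the same timestamps,
# so the result aligns with df15 on assignment.
df15["Type of day"] = df15.index.to_series().apply(weekend)

# Option 2 (an assumption, not from the answer): Index.map applies the
# function element-wise and returns an Index of labels.
df15["Type of day (map)"] = df15.index.map(weekend)

print(df15)
```

Both columns should hold the same labels; which one to use is mostly a matter of taste when the function cannot be vectorised.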

Timings:

Tested with a dummy dataframe with 100 rows and one column:

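# pd.DatetimeIndex(start=..., periods=..., freq=...) was removed in pandas 1.0;
# pd.date_range builds the same index.
df = pd.DataFrame(np.random.rand(100, 1),
                  index=pd.date_range(start='2017-01-01',
                                      periods=100,
                                      freq='D'))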

In [1]: %timeit df.index.to_series().apply(weekend)
1.11 ms ± 127 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [2]: %timeit np.where(df.index.weekday >= 5, "Weekend", "Working day")
192 µs ± 45.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
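
For anyone who wants to repeat the comparison outside IPython, here is a rough sketch using the standard-library `timeit` module. The repetition count `n` is an arbitrary choice, and the absolute numbers will differ by machine and pandas version, so only the relative gap is of interest:

```python
import timeit

import numpy as np
import pandas as pd

def weekend(datum):
    # Saturday is 5 and Sunday is 6 in Timestamp.weekday().
    return "Weekend" if datum.weekday() >= 5 else "Working day"

# Same shape as the dummy frame above: 100 daily rows, one random column.
df = pd.DataFrame(np.random.rand(100, 1),
                  index=pd.date_range(start="2017-01-01", periods=100, freq="D"))

n = 200  # repetitions per measurement (arbitrary)
t_apply = timeit.timeit(lambda: df.index.to_series().apply(weekend), number=n)
t_where = timeit.timeit(
    lambda: np.where(df.index.weekday >= 5, "Weekend", "Working day"), number=n)

print(f"to_series().apply: {t_apply / n * 1e6:.0f} µs per call")
print(f"np.where:          {t_where / n * 1e6:.0f} µs per call")
```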
answered Oct 19 '22 23:10 by Julien Marrec