I have a DatetimeIndex consisting of 15-minute intervals.
I also have the same function written in two ways that I want to apply across the whole DataFrame. The point of the function is to determine whether a particular day falls on a weekend or is a working day.
Here they are:
def weekend(datum):
    if (datum.weekday() == 5) or (datum.weekday() == 6):
        return "Weekend"
    else:
        return "Working day"
# written to be fed the DatetimeIndex directly
def weekendfromnumber(number):
    if (number == 5) or (number == 6):
        return "Weekend"
    else:
        return "Working day"
# written to be fed the integers of the intermediate weekday column
I wanted to apply the first function by feeding it the DatetimeIndex directly, as in:
df15['Type of day'] = df15.index.apply(weekend)
but I get the error:
AttributeError: 'DatetimeIndex' object has no attribute 'apply'
If I use the second function as in:
df15['Type of day'] = df15.weekday.apply(weekendfromnumber)
I get the effect that I want but at the cost of needing to create an intermediate column named weekday with:
df15['weekday'] = df15.index.weekday
Since I do not want an intermediate column, I thought that doing something like:
df15['Type of day'] = df15.index.weekday.apply(weekendfromnumber)
would work, but instead I get the error:
AttributeError: 'numpy.ndarray' object has no attribute 'apply'
So, the overarching question is:
How do I use the data that is already in the DatetimeIndex and feed it to a custom function using apply()?
You could create a temporary pd.Series for your datetime index, but why not just use np.where, which is much faster here:
df15['Type of day'] = np.where(df15.index.weekday >= 5, "Weekend", "Working day")
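As a self-contained sketch of the np.where approach: the frame below is a made-up stand-in for df15 (the "value" column and the date range are assumptions, not your data). Since weekday is 5 for Saturday and 6 for Sunday, the comparison uses >= 5 so that both days count as weekend.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df15: 15-minute rows from
# Friday 2017-01-06 00:00 through Monday 2017-01-09 23:45.
idx = pd.date_range(start="2017-01-06", end="2017-01-09 23:45", freq="15min")
df15 = pd.DataFrame({"value": np.arange(len(idx))}, index=idx)

# weekday runs from 0 (Monday) to 6 (Sunday), so >= 5 selects Saturday and Sunday.
df15["Type of day"] = np.where(df15.index.weekday >= 5, "Weekend", "Working day")

# Count how many 15-minute rows fall into each category.
print(df15["Type of day"].value_counts())
```

No intermediate weekday column is created; the comparison runs on the whole index at once.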
If your function is complicated and you cannot use np.where, call to_series() first:
df15['Type of day'] = df15.index.to_series().apply(weekend)
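Here is a minimal runnable sketch of the to_series() route, again on a made-up stand-in for df15 (the "value" column and dates are assumptions). to_series() returns a Series whose values are the timestamps, so apply() hands your function one Timestamp at a time. The last lines cross-check the result against a vectorised np.where version.

```python
import numpy as np
import pandas as pd

def weekend(datum):
    # datum is a single pandas Timestamp; weekday() returns 0 (Monday) to 6 (Sunday).
    if (datum.weekday() == 5) or (datum.weekday() == 6):
        return "Weekend"
    else:
        return "Working day"

# Hypothetical stand-in for df15: four days of 15-minute rows starting Friday 2017-01-06.
idx = pd.date_range(start="2017-01-06", periods=4 * 96, freq="15min")
df15 = pd.DataFrame({"value": np.arange(len(idx))}, index=idx)

# The Series from to_series() shares the frame's index, so the assignment aligns row by row.
df15["Type of day"] = df15.index.to_series().apply(weekend)

# Sanity check: the element-wise result should match the vectorised one.
vectorised = np.where(df15.index.weekday >= 5, "Weekend", "Working day")
same = df15["Type of day"].tolist() == vectorised.tolist()
print(same)
```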
Timings:
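Tested with a dummy DataFrame with 100 rows and one column:
# date_range builds the daily index; recent pandas versions no longer
# accept start/periods in the DatetimeIndex constructor
df = pd.DataFrame(np.random.rand(100, 1),
                  index=pd.date_range(start='2017-01-01',
                                      periods=100,
                                      freq='D'))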
In [1]: %timeit df.index.to_series().apply(weekend)
1.11 ms ± 127 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [2]: %timeit np.where(df.index.weekday >= 5, "Weekend", "Working day")
192 µs ± 45.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
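If you want to repeat the comparison outside IPython, something like the following sketch with the standard timeit module should do. The loop count n is an arbitrary choice of mine, and the absolute numbers will differ by machine and pandas version, so treat it as a rough check rather than a benchmark.

```python
import timeit

import numpy as np
import pandas as pd

def weekend(datum):
    # Same check as in the question: Saturday (5) or Sunday (6).
    if (datum.weekday() == 5) or (datum.weekday() == 6):
        return "Weekend"
    else:
        return "Working day"

# Dummy frame: 100 daily rows, one random column.
df = pd.DataFrame(np.random.rand(100, 1),
                  index=pd.date_range(start="2017-01-01", periods=100, freq="D"))

n = 200  # arbitrary number of repetitions per measurement

# Average seconds per call for each approach.
t_apply = timeit.timeit(lambda: df.index.to_series().apply(weekend), number=n) / n
t_where = timeit.timeit(
    lambda: np.where(df.index.weekday >= 5, "Weekend", "Working day"), number=n) / n

print(f"to_series().apply: {t_apply * 1e6:.0f} us per call")
print(f"np.where:          {t_where * 1e6:.0f} us per call")
```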