I recently stumbled upon pendulum, an awesome new library that makes working with datetimes easier.
In pandas, there is the handy to_datetime() function, which converts Series and other objects to datetimes:
raw_data['Mycol'] = pd.to_datetime(raw_data['Mycol'], format='%d%b%Y:%H:%M:%S.%f')
What would be the canonical way to create a custom to_<something> method, in this case a to_pendulum() method that converts a Series of date strings directly to Pendulum objects?
This could give a Series various interesting capabilities, for instance converting a series of date strings to a series of "offsets from now", i.e. human-readable datetime diffs.
After looking through the API a bit, I must say I'm impressed with what they've done. Unfortunately, I don't think Pendulum and pandas can work together (at least not with the current latest version of pandas, v0.21).
The most important reason is that pandas does not natively support Pendulum as a datatype. The natively supported datatypes (np.int64, np.float64 and np.datetime64) all support vectorisation in some form. Pendulum objects, by contrast, can only be stored as generic Python objects, so you are not going to get a shred of performance improvement from using a dataframe over, say, a vanilla loop and a list. If anything, calling apply on a Series of Pendulum objects is going to be slower (because of all the API overhead).
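To make the vectorisation point concrete, here is a minimal sketch using only pandas. It is not part of the original answer; the sample strings and variable names are made up for illustration. It contrasts a native datetime64 column with the same values boxed as Python objects, which is roughly the situation a Series of Pendulum objects would be in.

```python
import pandas as pd

strings = pd.Series(["2017-11-09 18:43:45", "2017-11-09 20:15:27", "2017-11-10 00:09:40"])

# Native datetime64 column: the .dt accessor works on the whole array at once.
native = pd.to_datetime(strings)
native_hours = native.dt.hour

# The same values boxed as Python objects (dtype=object): every operation
# becomes one Python-level call per element via apply.
boxed = native.astype(object)
boxed_hours = boxed.apply(lambda ts: ts.hour)
```

Both routes give the same hours, but only the first one stays in vectorised code; on large Series the apply route should be noticeably slower.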
Another reason is that Pendulum is a subclass of datetime:
import pendulum
from datetime import datetime
isinstance(pendulum.now(), datetime)
True
This is important because, as mentioned above, datetime is a supported datatype, so pandas will attempt to coerce datetime to its native datetime format, Timestamp. Here's an example.
print(s)
0 2017-11-09 18:43:45
1 2017-11-09 20:15:27
2 2017-11-09 22:29:00
3 2017-11-09 23:42:34
4 2017-11-10 00:09:40
5 2017-11-10 00:23:14
6 2017-11-10 03:32:17
7 2017-11-10 10:59:24
8 2017-11-10 11:12:59
9 2017-11-10 13:49:09
s = s.apply(pendulum.parse)
s
0 2017-11-09 18:43:45+00:00
1 2017-11-09 20:15:27+00:00
2 2017-11-09 22:29:00+00:00
3 2017-11-09 23:42:34+00:00
4 2017-11-10 00:09:40+00:00
5 2017-11-10 00:23:14+00:00
6 2017-11-10 03:32:17+00:00
7 2017-11-10 10:59:24+00:00
8 2017-11-10 11:12:59+00:00
9 2017-11-10 13:49:09+00:00
Name: timestamp, dtype: datetime64[ns, <TimezoneInfo [UTC, GMT, +00:00:00, STD]>]
s[0]
Timestamp('2017-11-09 18:43:45+0000', tz='<TimezoneInfo [UTC, GMT, +00:00:00, STD]>')
type(s[0])
pandas._libs.tslib.Timestamp
So, with some difficulty (involving dtype=object), you could load Pendulum objects into dataframes. Here's how you'd do that:
v = np.vectorize(pendulum.parse)
s = pd.Series(v(s), dtype=object)
s
0 2017-11-09T18:43:45+00:00
1 2017-11-09T20:15:27+00:00
2 2017-11-09T22:29:00+00:00
3 2017-11-09T23:42:34+00:00
4 2017-11-10T00:09:40+00:00
5 2017-11-10T00:23:14+00:00
6 2017-11-10T03:32:17+00:00
7 2017-11-10T10:59:24+00:00
8 2017-11-10T11:12:59+00:00
9 2017-11-10T13:49:09+00:00
s[0]
<Pendulum [2017-11-09T18:43:45+00:00]>
However, this is essentially useless, because calling any pendulum method (via apply) will now not only be super slow, but will also end up with the result being coerced to Timestamp again: an exercise in futility.
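For completeness, pandas releases from 0.23 onward (so later than the version discussed here) provide pd.api.extensions.register_series_accessor, which is probably the closest thing to a sanctioned way of bolting a custom to_<something> method onto a Series. The sketch below is an assumption-laden illustration, not something from the answer above: the accessor name pdl, the class name and the parse parameter are all hypothetical. It does not remove the caveats already described, since the result is still an object-dtype Series of boxed Python objects.

```python
import pandas as pd


@pd.api.extensions.register_series_accessor("pdl")  # "pdl" is a made-up accessor name
class PendulumAccessor:
    """Hypothetical accessor exposing Series.pdl.to_pendulum()."""

    def __init__(self, series):
        self._series = series

    def to_pendulum(self, parse=None):
        # `parse` is an illustrative hook, not a pandas or Pendulum convention;
        # when omitted, it falls back to pendulum.parse.
        if parse is None:
            import pendulum
            parse = pendulum.parse
        parsed = [parse(text) for text in self._series]
        # dtype=object stops pandas from coercing the results to Timestamp.
        return pd.Series(parsed, index=self._series.index,
                         name=self._series.name, dtype=object)
```

With Pendulum installed, raw_data['Mycol'].pdl.to_pendulum() would then return an object-dtype Series. Each element is still handled one at a time in Python, so this buys convenience, not speed.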