I've got the following data:
import pandas as pd
from datetime import datetime, timedelta
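dfori = [
    {'id': '1', 'time': '2023-05-06T23:00:02Z'},
    {'id': '2', 'time': '2023-05-05T00:00:23Z'},
    {'id': '3', 'time': '2023-05-04T00:00:30Z'},
]
# column names must match the dict keys, otherwise the column is all NaN
df = pd.DataFrame(dfori, columns=['id', 'time'])
# parse the ISO strings; the trailing Z yields a UTC-aware datetime column
df['date'] = pd.to_datetime(df['time'])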
I want code that goes through the entire dataframe and keeps only the rows whose timestamp falls between X and Y.
Here are the steps I have taken:
start_date = datetime.utcnow().replace(microsecond = 0)
end_date = (datetime.utcnow() - timedelta(hours = 1))
dfpre = df.loc[((df['date'] > end_date) & (df['date'] < start_date))]
dfpost = df.loc[((df['date'] > end_date) & (df['date'] < start_date))]
This code works if I turn my datetime.utcnow() into an object and break it apart. But breaking it apart only works if the date doesn't need to change. With UTC as the time zone, that code breaks after 7 PM.
The error this produces:
TypeError: Invalid comparison between dtype=datetime64[ns, UTC] and datetime
I attempted to follow other posts and tried to slap .tz_convert(None) on my dataframe like so:
df['date'] = df['date'].dt.tz_convert(None)
I didn't think it would work as we're already in UTC time. Anywho, I'm unsure of how to proceed from here.
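Use timezone-aware Timestamps if your dataframe's datetime column is timezone-aware (the trailing Z means UTC). pandas refuses to compare a timezone-aware column with a naive datetime, which is what raises the TypeError. Here's an MRE: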
import pandas as pd
# --> dummy data
df = pd.DataFrame(
    [
        {"id": "1", "time": "2023-05-06T23:00:02Z"},
        {"id": "2", "time": "2023-05-05T00:00:23Z"},
        {"id": "3", "time": "2023-05-04T00:00:30Z"},
    ],
    columns=["id", "time"],
)
df["date"] = pd.to_datetime(df["time"])
# <-- dummy data
# you could also use pd.Timestamp("now", tz="UTC") here:
start_date = pd.Timestamp("2023-05-07", tz="UTC").replace(microsecond=0)
end_date = start_date - pd.Timedelta(days=2)
dfpre = df.loc[((df["date"] > end_date) & (df["date"] < start_date))]
dfpost = df.loc[((df["date"] > end_date) & (df["date"] < start_date))]
print(dfpre)
  id                  time                      date
0  1  2023-05-06T23:00:02Z 2023-05-06 23:00:02+00:00
1  2  2023-05-05T00:00:23Z 2023-05-05 00:00:23+00:00
print(dfpost)
  id                  time                      date
0  1  2023-05-06T23:00:02Z 2023-05-06 23:00:02+00:00
1  2  2023-05-05T00:00:23Z 2023-05-05 00:00:23+00:00
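Applied to the rolling one-hour window from the question, the same idea might look like the sketch below. The helper name `last_hour` and its `now` parameter are made up for illustration; `now` is passed in explicitly here only so the result is reproducible, and it falls back to the current UTC time when omitted.

```python
import pandas as pd


def last_hour(df, now=None, column="date"):
    """Return rows whose timestamp lies strictly within the hour before `now`."""
    if now is None:
        # timezone-aware "now", comparable with a datetime64[ns, UTC] column
        now = pd.Timestamp.now(tz="UTC")
    # drop sub-second precision, as in the question
    now = now.replace(microsecond=0, nanosecond=0)
    window_start = now - pd.Timedelta(hours=1)
    return df.loc[(df[column] > window_start) & (df[column] < now)]


df = pd.DataFrame(
    {
        "id": ["1", "2", "3"],
        "time": [
            "2023-05-06T23:00:02Z",
            "2023-05-05T00:00:23Z",
            "2023-05-04T00:00:30Z",
        ],
    }
)
df["date"] = pd.to_datetime(df["time"])

# fixed reference time (hypothetical) so the example is deterministic
recent = last_hour(df, now=pd.Timestamp("2023-05-06T23:30:00Z"))
print(recent["id"].tolist())
```

Because both bounds are derived from one timezone-aware Timestamp, the window rolls over midnight UTC without any manual date handling.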
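In general, avoid datetime.utcnow(): it returns a naive datetime with no tzinfo attached, so it cannot be compared with timezone-aware values (and it is deprecated as of Python 3.12). pandas has all you need.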
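If you would rather stay with the standard library for "now", a timezone-aware datetime also compares cleanly against the UTC column. The sketch below contrasts that with a naive datetime, which triggers the same TypeError as in the question; the sample value and variable names are only illustrative.

```python
from datetime import datetime, timezone

import pandas as pd

# a UTC-aware column, parsed from an ISO string with a trailing Z
dates = pd.to_datetime(pd.Series(["2023-05-06T23:00:02Z"]))

# aware "now": carries tzinfo, so pandas can compare it with the column
aware_now = datetime.now(timezone.utc)
mask = dates < aware_now

# a naive datetime (what datetime.utcnow() would give you) cannot be compared
naive = datetime(2023, 5, 7)
try:
    dates < naive
    raised = False
except TypeError as exc:
    raised = True
    print(exc)
```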