I'd like to filter out weekend data and only look at data for weekdays (Mon (0) to Fri (4)). I'm new to pandas. What's the best way to accomplish this?
import datetime
from pandas import *
data = read_csv("data.csv")
data.my_dt
Out[52]:
0 2012-10-01 02:00:39
1 2012-10-01 02:00:38
2 2012-10-01 02:01:05
3 2012-10-01 02:01:07
4 2012-10-01 02:02:03
5 2012-10-01 02:02:09
6 2012-10-01 02:02:03
7 2012-10-01 02:02:35
8 2012-10-01 02:02:33
9 2012-10-01 02:03:01
10 2012-10-01 02:08:53
11 2012-10-01 02:09:04
12 2012-10-01 02:09:09
13 2012-10-01 02:10:20
14 2012-10-01 02:10:45
...
I'd like to do something like:
weekdays_only = data[data.my_dt.weekday() < 5]
AttributeError: 'numpy.int64' object has no attribute 'weekday'
but this doesn't work. I haven't quite grasped how datetime objects in a column are accessed.
The eventual goal is to arrange the data hierarchically by weekday and hour range, something like:
monday, 0-6, 7-12, 13-18, 19-23
tuesday, 0-6, 7-12, 13-18, 19-23
Use sort_values(by=column_name) to sort a pandas.DataFrame by the contents of the column named column_name. If that column is stored in another format, such as strings, convert it to datetime first with pandas.to_datetime(arg), where arg is the column of dates.
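As a minimal sketch of that conversion and sort: the sample rows below are made up for illustration, and only the column name my_dt comes from the question.

```python
import pandas as pd

# Hypothetical sample rows; only the column name my_dt is taken from the question.
df = pd.DataFrame({
    "my_dt": [
        "2012-10-01 02:01:05",
        "2012-10-01 02:00:38",
        "2012-10-01 02:00:39",
    ]
})

# Convert the strings to datetime64 values before sorting.
df["my_dt"] = pd.to_datetime(df["my_dt"])

# Sort by the datetime column and renumber the rows.
sorted_df = df.sort_values(by="my_dt").reset_index(drop=True)
print(sorted_df)
```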
The groupby() function splits the data into groups based on some criteria. pandas objects can be split along any of their axes. Abstractly, grouping means providing a mapping of labels to group names. Its sort parameter controls whether the group keys are sorted.
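One way to get the weekday and hour-range layout from the question is to group on both keys. This is only a sketch: the timestamps, the hour_range column name, and the use of pandas.cut for the bins are assumptions for illustration, not something the original posts spell out.

```python
import pandas as pd

# Hypothetical timestamps: two on a Monday, one on a Tuesday, one on a Saturday.
df = pd.DataFrame({
    "my_dt": pd.to_datetime([
        "2012-10-01 02:00:39",
        "2012-10-01 14:30:00",
        "2012-10-02 08:15:00",
        "2012-10-06 20:00:00",
    ])
})

# Monday is 0 and Sunday is 6.
df["weekday"] = df["my_dt"].dt.dayofweek

# Bucket the hour into the ranges from the question.
# With the default right-closed bins, the edges cover 0-6, 7-12, 13-18 and 19-23.
df["hour_range"] = pd.cut(
    df["my_dt"].dt.hour,
    bins=[-1, 6, 12, 18, 23],
    labels=["0-6", "7-12", "13-18", "19-23"],
)

# Count rows per (weekday, hour range); observed=True skips empty combinations.
counts = df.groupby(["weekday", "hour_range"], observed=True).size()
print(counts)
```

Filtering on weekday < 5 before the groupby would drop the Saturday row.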
pandas has a built-in function called to_datetime() that converts dates and times in string format to datetime values. A date column read from a CSV file is typically stored as strings, and to_datetime() converts it to a Series of the appropriate datetime64 dtype.
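A short sketch of both routes, parsing at read time with the parse_dates argument of read_csv or converting afterwards with to_datetime. The in-memory CSV text and the value column are stand-ins for the real data.csv, whose contents are not shown in full.

```python
import io

import pandas as pd

# Stand-in for data.csv; the value column is hypothetical.
csv_text = "my_dt,value\n2012-10-01 02:00:39,1\n2012-10-06 09:15:00,2\n"

# Option 1: ask read_csv to parse the column while loading.
data = pd.read_csv(io.StringIO(csv_text), parse_dates=["my_dt"])

# Option 2: load the column as strings, then convert it explicitly.
raw = pd.read_csv(io.StringIO(csv_text))
raw["my_dt"] = pd.to_datetime(raw["my_dt"])

print(data["my_dt"].dtype)
print(raw["my_dt"].dtype)
```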
Your call to weekday() does not work because it is not applied to each datetime value in the column. It operates on the index of data.my_dt, which is an int64 array (this is where the error message comes from).
You could create a new column in data containing the weekdays, assuming my_dt already holds datetime values, using something like:
data['weekday'] = data['my_dt'].apply(lambda x: x.weekday())
Then you can filter for weekdays with:
weekdays_only = data[data['weekday'] < 5]
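In more recent pandas versions the same filter can be written without apply, using the .dt accessor on a datetime64 column. The sketch below uses made-up rows and assumes my_dt has already been converted to datetime.

```python
import pandas as pd

# Hypothetical rows: Monday, Saturday, Sunday, and the following Monday.
data = pd.DataFrame({
    "my_dt": pd.to_datetime([
        "2012-10-01 02:00:39",
        "2012-10-06 09:15:00",
        "2012-10-07 11:00:00",
        "2012-10-08 08:00:00",
    ])
})

# .dt.dayofweek is vectorized: Monday is 0 and Sunday is 6.
weekdays_only = data[data["my_dt"].dt.dayofweek < 5]
print(weekdays_only)
```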
I hope this helps