 

Selecting Data between Specific hours in a pandas dataframe

My pandas DataFrame looks something like this:

 2013-10-09 09:00:05
 2013-10-09 09:05:00
 2013-10-09 10:00:00
 ...
 2013-10-10 09:00:05
 2013-10-10 09:05:00
 2013-10-10 10:00:00

I want only the rows whose time of day lies between 09:00 and 10:00, on every day in the frame. If anyone has worked on something like this, it would be really helpful.

asked Oct 04 '13 10:10 by itsaruns



3 Answers

 In [7]: index = date_range('20131009 08:30','20131010 10:05',freq='5T')

In [8]: df = DataFrame(randn(len(index),2),columns=list('AB'),index=index)

In [9]: df
Out[9]: 
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 308 entries, 2013-10-09 08:30:00 to 2013-10-10 10:05:00
Freq: 5T
Data columns (total 2 columns):
A    308  non-null values
B    308  non-null values
dtypes: float64(2)

In [10]: df.between_time('9:00','10:00')
Out[10]: 
                            A         B
2013-10-09 09:00:00 -0.664639  1.597453
2013-10-09 09:05:00  1.197290 -0.500621
2013-10-09 09:10:00  1.470186 -0.963553
2013-10-09 09:15:00  0.181314 -0.242415
2013-10-09 09:20:00  0.969427 -1.156609
2013-10-09 09:25:00  0.261473  0.413926
2013-10-09 09:30:00 -0.003698  0.054953
2013-10-09 09:35:00  0.418147 -0.417291
2013-10-09 09:40:00  0.413565 -1.096234
2013-10-09 09:45:00  0.460293  1.200277
2013-10-09 09:50:00 -0.702444 -0.041597
2013-10-09 09:55:00  0.548385 -0.832382
2013-10-09 10:00:00 -0.526582  0.758378
2013-10-10 09:00:00  0.926738  0.178204
2013-10-10 09:05:00 -1.178534  0.184205
2013-10-10 09:10:00  1.408258  0.948526
2013-10-10 09:15:00  0.523318  0.327390
2013-10-10 09:20:00 -0.193174  0.863294
2013-10-10 09:25:00  1.355610 -2.160864
2013-10-10 09:30:00  1.930622  0.174683
2013-10-10 09:35:00  0.273551  0.870682
2013-10-10 09:40:00  0.974756 -0.327763
2013-10-10 09:45:00  1.808285  0.080267
2013-10-10 09:50:00  0.842119  0.368689
2013-10-10 09:55:00  1.065585  0.802003
2013-10-10 10:00:00 -0.324894  0.781885
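A sketch of the same between_time call in current pandas. It assumes pandas 1.4 or later (for the inclusive parameter) and uses numpy's default_rng in place of the old randn star import; the random data is illustrative only. It also shows the case where the timestamps sit in a column rather than the index:

```python
import numpy as np
import pandas as pd

# Rebuild the answer's example with current pandas spelling ("5min" instead of "5T").
index = pd.date_range("2013-10-09 08:30", "2013-10-10 10:05", freq="5min")
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((len(index), 2)), columns=list("AB"), index=index)

# between_time filters on time of day and ignores the date, so both days are kept.
# Both endpoints are included by default.
selected = df.between_time("9:00", "10:00")

# pandas >= 1.4: drop the 10:00:00 endpoint with inclusive="left".
half_open = df.between_time("9:00", "10:00", inclusive="left")

# between_time needs a DatetimeIndex; if the timestamps are in a column, make it the index first.
df_col = df.rename_axis("ts").reset_index()
from_column = df_col.set_index("ts").between_time("9:00", "10:00")

print(len(selected), len(half_open), len(from_column))  # 26 24 26
```

With both endpoints included, each day contributes 13 five-minute rows (09:00 through 10:00), which matches the output shown in the session.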
answered Oct 07 '22 06:10 by Jeff


If your times are stored as strings such as '09:00:05' in a 'Time' column, split them into separate hour, minute, and second columns:

df[['h','m','s']] = df['Time'].astype(str).str.split(':', expand=True).astype(int)

Then select the rows by filtering on the hour column:

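df9to10 = df[df['h'].between(9, 10, inclusive='both')]

(Older pandas accepted inclusive=True here; pandas 1.3 switched to the string values 'both', 'neither', 'left' and 'right', and pandas 2.0 no longer accepts booleans.)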

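The bounds are ordinary arguments, so the same filter works for any other range of hours. Keep in mind that filtering on the hour alone keeps everything from 09:00:00 through 10:59:59; use df['h'] == 9 if you only want the 9 o'clock hour.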
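A runnable sketch of the split approach, using a hypothetical 'Time' column of HH:MM:SS strings (the answer assumes such a column exists). The last step is not from the answer: it is one way to stop exactly at 10:00:00 by comparing seconds since midnight. inclusive='both' assumes pandas 1.3 or later:

```python
import pandas as pd

# Hypothetical sample data: "HH:MM:SS" strings, as the answer assumes.
df = pd.DataFrame({"Time": ["08:59:59", "09:00:05", "09:45:00", "10:00:00", "10:30:00", "11:00:00"]})

# Split the strings into integer hour/minute/second columns.
df[["h", "m", "s"]] = df["Time"].str.split(":", expand=True).astype(int)

# Filtering on the hour keeps everything from 09:00:00 up to 10:59:59.
hours_9_and_10 = df[df["h"].between(9, 10, inclusive="both")]

# To stop exactly at 10:00:00, compare seconds since midnight instead.
secs = df["h"] * 3600 + df["m"] * 60 + df["s"]
nine_to_ten = df[secs.between(9 * 3600, 10 * 3600, inclusive="both")]

print(hours_9_and_10["Time"].tolist())  # ['09:00:05', '09:45:00', '10:00:00', '10:30:00']
print(nine_to_ten["Time"].tolist())     # ['09:00:05', '09:45:00', '10:00:00']
```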

answered Oct 07 '22 05:10 by ak3191


Another method uses DataFrame.query on a time-of-day column. Tested with Python 3.9.

import pandas as pd
from pandas import Timestamp
from datetime import time

df = pd.DataFrame({"timestamp": [
    Timestamp("2017-01-03 09:30:00.049"), Timestamp("2017-01-03 09:30:00.049"),
    Timestamp("2017-12-29 16:12:34.214"), Timestamp("2017-12-29 16:17:19.006"),
]})
# Extract the time of day (datetime.time objects) into its own column.
df["time"] = df.timestamp.dt.time
start_time = time(9, 20, 0)
end_time = time(10, 0, 0)
# Column names are used bare inside query; @ refers to local Python variables.
df_times = df.query("time >= @start_time and time <= @end_time")

In:

              timestamp
2017-01-03 09:30:00.049
2017-01-03 09:30:00.049
2017-12-29 16:12:34.214
2017-12-29 16:17:19.006

Out:

              timestamp             time
2017-01-03 09:30:00.049  09:30:00.049000
2017-01-03 09:30:00.049  09:30:00.049000

As a bonus, query accepts arbitrarily complex boolean expressions, e.g. selecting everything within two separate time ranges in a single call (a single between_time call only covers one range).
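The two-range case mentioned above might look like the following sketch. The sample timestamps and the window bounds (09:00 to 10:00 and 14:00 to 15:00) are made up for illustration:

```python
import pandas as pd
from datetime import time

# Hypothetical sample timestamps spread over one day.
df = pd.DataFrame({"timestamp": pd.to_datetime([
    "2017-01-03 09:30:00", "2017-01-03 11:00:00",
    "2017-01-03 14:15:00", "2017-01-03 16:12:34",
])})
df["time"] = df["timestamp"].dt.time

# Two separate windows: morning and afternoon (both ends inclusive).
am_start, am_end = time(9, 0), time(10, 0)
pm_start, pm_end = time(14, 0), time(15, 0)

selected = df.query(
    "(time >= @am_start and time <= @am_end) or (time >= @pm_start and time <= @pm_end)"
)
print(selected["timestamp"].dt.strftime("%H:%M:%S").tolist())  # ['09:30:00', '14:15:00']
```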

answered Oct 07 '22 07:10 by Contango