Querying timedelta column in pandas, and filtering rows

I have a timedelta column in pandas, displayed in the format x days 00:00:00. I want to filter out and flag the rows whose value is >= 30 minutes. I have no idea how to do that in pandas; I tried boolean conditions and if statements, but they didn't work. Any help would be appreciated.

user1541055 asked Jan 22 '18 07:01



People also ask

How do I filter rows in pandas?

You can use df[df["Courses"] == 'Spark'] to filter the rows of a pandas DataFrame by a condition. Note that this expression returns a new DataFrame containing only the selected rows. You can also store the condition in a variable and index with that variable.
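A minimal sketch of filtering with a stored condition. The column name Courses comes from the snippet above; the rows and the Fee column are made-up sample data.

```python
import pandas as pd

# Hypothetical sample data; only the column name "Courses" comes from the text above
df = pd.DataFrame({"Courses": ["Spark", "PySpark", "Spark"], "Fee": [20000, 25000, 22000]})

# Store the condition in a variable: a boolean Series aligned with df's index
mask = df["Courses"] == "Spark"

# Indexing with the mask returns a new DataFrame; df itself is unchanged
spark_rows = df[mask]
print(spark_rows)
```

Keeping the mask in its own variable makes it easy to reuse, for example to invert it with ~mask.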

How do I filter a date column in a DataFrame?

To filter rows based on dates, first convert the date column to the datetime64 dtype (for example with pd.to_datetime). Then use DataFrame.loc[] or DataFrame.query() to specify the filter condition.
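A minimal sketch of both approaches, assuming a hypothetical column named date that starts out as strings. The query() version refers to local variables with the @ prefix.

```python
import pandas as pd

# Hypothetical data: dates arrive as strings
df = pd.DataFrame({"date": ["2018-01-05", "2018-01-20", "2018-02-03"], "value": [1, 2, 3]})

# Convert to datetime64 so comparisons are chronological rather than string-based
df["date"] = pd.to_datetime(df["date"])

start = pd.Timestamp("2018-01-01")
end = pd.Timestamp("2018-02-01")

# Filter with .loc and a boolean condition
january = df.loc[(df["date"] >= start) & (df["date"] < end)]

# The same filter with query(); @name refers to a local Python variable
january_q = df.query("date >= @start and date < @end")

print(january)
```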

How do you use pandas Timedelta?

You can access the components of a Timedelta or TimedeltaIndex directly through the attributes days, seconds, microseconds and nanoseconds (for a Series, through the .dt accessor). These are identical to the values returned by datetime.timedelta; for example, the .seconds attribute represents the number of seconds >= 0 and < 1 day.
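This distinction matters for the question above: .dt.seconds is only the sub-day component, so a check on it alone would treat 1 day 00:00:10 as 10 seconds. A minimal sketch with made-up durations, comparing it with .dt.total_seconds():

```python
import pandas as pd

# Made-up durations: one longer than a day, one under an hour
s = pd.Series(pd.to_timedelta(["1 days 00:00:10", "00:45:00"]))

# .dt.days is the whole-day component
print(s.dt.days.tolist())             # [1, 0]
# .dt.seconds is only the seconds component (0 <= seconds < 1 day); the day is dropped
print(s.dt.seconds.tolist())          # [10, 2700]
# .dt.total_seconds() converts the whole duration, including days
print(s.dt.total_seconds().tolist())  # [86410.0, 2700.0]
```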


1 Answer

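You can convert the timedeltas to seconds with total_seconds and compare them with a scalar. The snippets here use a threshold of 30 seconds; for the 30 minutes from the question, compare with 30 * 60 instead, or with pd.Timedelta(30, unit='min') in the second form: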

df = df[df['col'].dt.total_seconds() < 30]

Or compare with a Timedelta:

df = df[df['col'] < pd.Timedelta(30, unit='s')]

Sample:

import pandas as pd

df = pd.DataFrame({'col':pd.to_timedelta(['25:10:01','00:01:20','00:00:20'])})
print(df)
              col
0 1 days 01:10:01
1 0 days 00:01:20
2 0 days 00:00:20

df = df[df['col'].dt.total_seconds() < 30]
print(df)
       col
2 00:00:20
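The question also asks to flag rows of 30 minutes or more, not only to drop them. A minimal sketch, assuming a new boolean column is acceptable (the name flag is hypothetical) and using made-up durations:

```python
import pandas as pd

# Made-up durations; the first matches the first row of the sample above
df = pd.DataFrame({"col": pd.to_timedelta(["1 days 01:10:01", "00:01:20", "00:45:00"])})

# 30 minutes, the threshold from the question
threshold = pd.Timedelta(minutes=30)

# Flag rows lasting 30 minutes or longer (hypothetical column name "flag")
df["flag"] = df["col"] >= threshold

# Equivalent condition via total_seconds: 30 minutes = 1800 seconds
flag_seconds = df["col"].dt.total_seconds() >= 30 * 60

# Filter out the flagged rows, keeping only durations under 30 minutes
short = df[~df["flag"]]
print(df)
print(short)
```

Comparing against a Timedelta avoids unit conversions; the total_seconds form is handy when the threshold is already a number of seconds.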
jezrael answered Oct 19 '22 03:10
