
Python Pandas Select Index where index is larger than x


Say I have a DataFrame df with date as index and some values. How can I select the rows where the date is larger than some value x?

I know I can convert the index to a column and then do the select df[df['date']>x], but is that slower than doing the operation on the index?

user3092887 asked Jun 06 '14 18:06




2 Answers

Example of selecting from a DataFrame with the use of index:

    from numpy.random import randn
    from pandas import DataFrame
    from datetime import timedelta as td
    import dateutil.parser

    d = dateutil.parser.parse("2014-01-01")
    df = DataFrame(randn(6,2), columns=list('AB'), index=[d + td(days=x) for x in range(1,7)])

    In [1]: df
    Out[1]:
                       A         B
    2014-01-02 -1.172285  1.706200
    2014-01-03  0.039511 -0.320798
    2014-01-04 -0.192179 -0.539397
    2014-01-05 -0.475917 -0.280055
    2014-01-06  0.163376  1.124602
    2014-01-07 -2.477812  0.656750

    In [2]: df[df.index > dateutil.parser.parse("2014-01-04")]
    Out[2]:
                       A         B
    2014-01-05 -0.475917 -0.280055
    2014-01-06  0.163376  1.124602
    2014-01-07 -2.477812  0.656750
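For comparison with the column-based filter mentioned in the question, here is a minimal self-contained sketch. The frame built with `pd.date_range`, the cutoff `x`, and the helper column name `date` are illustrative assumptions, not part of the original answer. It only checks that masking on the index selects the same rows as masking on a date column; it does not measure speed.

```python
import numpy as np
import pandas as pd

# Illustrative data: six consecutive days, mirroring the dates used above.
idx = pd.date_range("2014-01-02", periods=6, freq="D")
df = pd.DataFrame(np.arange(12).reshape(6, 2), columns=list("AB"), index=idx)

x = pd.Timestamp("2014-01-04")  # hypothetical cutoff

# Boolean mask computed directly on the index.
by_index = df[df.index > x]

# Same filter after copying the index into a column named 'date'.
with_col = df.assign(date=df.index)
by_column = with_col[with_col["date"] > x].drop(columns="date")

print(by_index)
print(by_index.equals(by_column))
```

Both approaches build a boolean mask of the same length, so the index version mainly saves the extra step of materialising the column.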
Datageek answered Sep 22 '22 03:09


The existing answer is correct. However, if we are selecting based on the index, the second method from the linked source (setting the index and slicing with .loc) would be faster:

    import pandas as pd

    # Set index
    df = df.set_index(df['date'])

    # Select observations between two datetimes
    df.loc[pd.Timestamp('2002-1-1 01:00:00'):pd.Timestamp('2002-1-1 04:00:00')]
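Note that label slices with `.loc` include both endpoints, so an open-ended slice answers "on or after x" rather than "strictly larger than x". A small sketch of the difference, assuming a sorted DatetimeIndex; the data, the column name `value`, and the cutoff `x` are made up for illustration:

```python
import pandas as pd

# Illustrative frame with a sorted daily DatetimeIndex.
idx = pd.date_range("2002-01-01", periods=5, freq="D")
df = pd.DataFrame({"value": range(5)}, index=idx)

x = pd.Timestamp("2002-01-03")  # hypothetical cutoff

# Label slice: the row at x itself is included.
on_or_after = df.loc[x:]

# Boolean mask: the row at x is excluded.
strictly_after = df[df.index > x]

print(len(on_or_after))     # 3 rows: Jan 3, 4, 5
print(len(strictly_after))  # 2 rows: Jan 4, 5
```

If the strict comparison from the question is what is needed, the boolean mask is the more direct fit; the slice is convenient when the endpoint should be kept.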
ntg answered Sep 18 '22 03:09