Get first row of dataframe in Python Pandas based on criteria

How do I get the first row of a pandas DataFrame?

Selecting a single row returns a pandas.Series, from which individual values are easy to read. You can get the first row with iloc[0] and the last row with iloc[-1]. To get the value of a single element, use iloc[0]['column_name'] or iloc[-1]['column_name'].

How do you get the value in the first row of a column?

To get the value in the first row of a given column, use the iloc[] property, for example df['column_name'].iloc[0].
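A minimal sketch of both answers above, using a small hypothetical frame (the column names and values are made up for illustration):

```python
import pandas as pd

# Hypothetical example frame
df = pd.DataFrame({"name": ["x", "y", "z"], "score": [10, 20, 30]})

first_row = df.iloc[0]    # first row, returned as a pandas.Series
last_row = df.iloc[-1]    # last row, also a Series

# Value of one element: pick the row by position, then the column by label
first_score = df.iloc[0]["score"]   # 10
last_score = df.iloc[-1]["score"]   # 30

# Value in the first row of a given column: select the column, then the position
first_name = df["name"].iloc[0]     # 'x'

print(type(first_row).__name__, first_score, last_score, first_name)
```

Selecting the column first (df["name"].iloc[0]) avoids building a whole row Series, which for a frame with mixed column types is upcast to object dtype.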

This tutorial is a very good one on pandas slicing; make sure you check it out. On to some snippets. To slice a DataFrame with a condition, use this format:

>>> df[condition]

This returns the rows of your DataFrame for which the condition is True, which you can then index with iloc. Here are your examples:

Get the first row where A > 3 (returns row 2):

>>> df[df.A > 3].iloc[0]
A    4
B    6
C    3
Name: 2, dtype: int64

If what you actually want is the row's index label rather than the row itself, use df[df.A > 3].index[0] instead of iloc. With the default RangeIndex, that label is also the row number.
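The label and the position only coincide for a default 0..n-1 index. A short sketch with a hypothetical string index shows the difference, and uses Index.get_loc to turn the label back into a position (valid as a plain int because the labels are unique):

```python
import pandas as pd

# Hypothetical frame whose index labels are not 0..n-1
df = pd.DataFrame({"A": [1, 4, 5]}, index=["a", "b", "c"])

label = df[df.A > 3].index[0]        # 'b': index label of the first match
position = df.index.get_loc(label)   # 1: its integer position

print(label, position)
```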

Get the first row where A > 4 AND B > 3 (returns row 4):

>>> df[(df.A > 4) & (df.B > 3)].iloc[0]
A    5
B    4
C    5
Name: 4, dtype: int64

Get the first row where A > 3 AND (B > 3 OR C > 2) (returns row 2):

>>> df[(df.A > 3) & ((df.B > 3) | (df.C > 2))].iloc[0]
A    4
B    6
C    3
Name: 2, dtype: int64
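The snippets never show the frame itself. The sketch below assumes a hypothetical five-row frame, chosen only because it reproduces the outputs shown on this page; substitute your own data:

```python
import pandas as pd

# Hypothetical data, consistent with the outputs shown above
df = pd.DataFrame(
    [[1, 2, 1], [1, 3, 2], [4, 6, 3], [4, 3, 4], [5, 4, 5]],
    columns=["A", "B", "C"],
)

# Each comparison yields a boolean Series; & and | combine them element-wise.
# The parentheses are required because & and | bind tighter than >.
r1 = df[df.A > 3].iloc[0]
r2 = df[(df.A > 4) & (df.B > 3)].iloc[0]
r3 = df[(df.A > 3) & ((df.B > 3) | (df.C > 2))].iloc[0]

# .name holds the index label of the selected row
print(r1.name, r2.name, r3.name)  # 2 4 2
```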

For your last case, we can write a function that, when no row meets the condition, falls back to the first row of the frame sorted in descending order by a default column:

>>> def series_or_default(X, condition, default_col, ascending=False):
...     sliced = X[condition]
...     if sliced.shape[0] == 0:
...         return X.sort_values(default_col, ascending=ascending).iloc[0]
...     return sliced.iloc[0]
>>> 
>>> series_or_default(df, df.A > 6, 'A')
A    5
B    4
C    5
Name: 4, dtype: int64

As expected, it returns row 4.
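The fallback sorts the whole frame just to take its top row. As a swapped-in alternative (not part of the answer above), DataFrame.nlargest and DataFrame.nsmallest pick that row directly. A sketch, where first_or_default is a hypothetical name and the frame is the same hypothetical five-row example:

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 1], [1, 3, 2], [4, 6, 3], [4, 3, 4], [5, 4, 5]],
    columns=["A", "B", "C"],
)

def first_or_default(frame: pd.DataFrame, condition: pd.Series, default_col: str,
                     ascending: bool = False) -> pd.Series:
    """Hypothetical helper: first row matching condition, else the row with the
    largest (or, with ascending=True, smallest) value in default_col."""
    sliced = frame[condition]
    if not sliced.empty:
        return sliced.iloc[0]
    # nlargest/nsmallest select the top row without sorting the full frame;
    # ties are resolved by keeping the first occurrence (keep='first' default)
    pick = frame.nsmallest if ascending else frame.nlargest
    return pick(1, default_col).iloc[0]

print(first_or_default(df, df.A > 6, "A").name)  # 4 (no match: largest A)
print(first_or_default(df, df.A > 3, "A").name)  # 2 (first match)
```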

When a matching row exists, query works too:

df.query('A > 3').head(1)
Out[33]: 
   A  B  C
2  4  6  3

df.query('A > 4 and B > 3').head(1)
Out[34]: 
   A  B  C
4  5  4  5

df.query('A > 3 and (B > 3 or C > 2)').head(1)
Out[35]: 
   A  B  C
2  4  6  3
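query can also refer to Python variables by prefixing them with @, which helps when the thresholds are not hard-coded. A minimal sketch on the same hypothetical frame; a_min and b_min are illustrative names:

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 1], [1, 3, 2], [4, 6, 3], [4, 3, 4], [5, 4, 5]],
    columns=["A", "B", "C"],
)

a_min, b_min = 4, 3
# @name refers to a variable in the calling scope
first = df.query("A > @a_min and B > @b_min").head(1)
print(first)
```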

You can take care of the first three cases with slicing and head:

df[df.A > 3].head(1)
df[(df.A > 4) & (df.B > 3)].head(1)
df[(df.A > 3) & ((df.B > 3) | (df.C > 2))].head(1)
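head(1) and iloc[0] differ in what they return, which matters when nothing matches: head(1) gives a one-row DataFrame (empty if there is no match), while iloc[0] gives a Series and raises IndexError on an empty selection. A quick sketch with a small hypothetical frame:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 4, 5], "B": [2, 6, 4]})

as_frame = df[df.A > 3].head(1)      # one-row DataFrame (label 1)
as_series = df[df.A > 3].iloc[0]     # Series for the same row

empty = df[df.A > 10].head(1)        # no match: empty DataFrame, no error
try:
    df[df.A > 10].iloc[0]
except IndexError:
    no_match_raised = True           # iloc[0] on an empty selection raises
else:
    no_match_raised = False

print(type(as_frame).__name__, type(as_series).__name__, empty.empty, no_match_raised)
# DataFrame Series True True
```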

If nothing matches, you can handle that case with a try or an if:

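output = df[df.A >= 6].head(1)
# head(1) on an empty selection returns an empty DataFrame instead of raising,
# so test for that explicitly (an assert would be skipped under python -O)
if output.empty:
    # nothing matched: fall back to the row with the largest A
    output = df.sort_values('A', ascending=False).head(1)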
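An exception-based variant can rely on the IndexError that iloc[0] raises when the selection is empty, catching only that exception. A sketch on the hypothetical five-row frame:

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 1], [1, 3, 2], [4, 6, 3], [4, 3, 4], [5, 4, 5]],
    columns=["A", "B", "C"],
)

try:
    output = df[df.A >= 6].iloc[0]
except IndexError:
    # nothing matched: fall back to the row with the largest A
    output = df.sort_values("A", ascending=False).iloc[0]

print(output.name)  # 4
```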

If you want to return as soon as you find the first row that meets the requirements, without iterating over the remaining rows, the following code works:

def pd_iter_func(df):
    for row in df.itertuples():
        # Define your criteria here
        if row.A > 4 and row.B > 3:
            return row

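It can be faster than boolean indexing on a large DataFrame when the matching row is near the top, because it stops at the first match. If the match is far down or absent, a vectorized boolean mask is usually much faster, since itertuples evaluates the condition in Python one row at a time.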
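The same early-exit loop can also be written as a generator expression passed to the built-in next(), with None as the default for the no-match case. A sketch on a small hypothetical frame:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 4, 5], "B": [2, 6, 4]})

# next() stops at the first row the generator yields, or returns the default
row = next((r for r in df.itertuples() if r.A > 4 and r.B > 3), None)
missing = next((r for r in df.itertuples() if r.A > 10), None)

print(row)      # namedtuple for the row with Index 2
print(missing)  # None
```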

To make the function above more general, pass the criteria in as a function, such as a lambda:

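from typing import Callable, NamedTuple, Optional
from pandas import DataFrame

def pd_iter_func(df: DataFrame, criteria: Callable[[NamedTuple], bool]) -> Optional[NamedTuple]:
    # Return the first row (a namedtuple from itertuples) that satisfies criteria, else None
    for row in df.itertuples():
        if criteria(row):
            return row
    return None

pd_iter_func(df, lambda row: row.A > 4 and row.B > 3)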
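Because pd_iter_func evaluates the criteria in Python for every row it visits, it only pays off when the match is near the top. One middle ground (an assumption of this sketch, not something from the answers here) is to apply the vectorized mask chunk by chunk and stop at the first chunk that contains a match. first_match_chunked and chunk_size are hypothetical names:

```python
from typing import Callable, Optional

import pandas as pd

def first_match_chunked(df: pd.DataFrame,
                        condition: Callable[[pd.DataFrame], pd.Series],
                        chunk_size: int = 10_000) -> Optional[pd.Series]:
    """Hypothetical helper: first row for which condition is True, or None.

    condition receives one chunk of df and must return a boolean Series for it.
    """
    for start in range(0, len(df), chunk_size):
        chunk = df.iloc[start:start + chunk_size]
        hits = chunk[condition(chunk)]   # vectorized within the chunk
        if not hits.empty:
            return hits.iloc[0]          # stop: later chunks are never evaluated
    return None

big = pd.DataFrame({"A": range(100_000), "B": range(100_000)})
row = first_match_chunked(big, lambda c: (c.A > 4) & (c.B > 3), chunk_size=1_000)
print(row.name)  # 5
```

The chunk size trades per-chunk overhead against wasted work past the first match; it would need tuning on real data.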

As mentioned in the answer to the 'mirror' question, pandas.Series.idxmax is also a nice choice: applied to a boolean mask, it returns the index label of the first True value.

def pd_idxmax_func(df, mask):
    return df.loc[mask.idxmax()]

pd_idxmax_func(df, (df.A > 4) & (df.B > 3))
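The idxmax approach has one trap: on an all-False mask it still returns the first label instead of signalling that nothing matched. A guarded sketch, where first_match_idxmax is a hypothetical name and the frame is the hypothetical five-row example:

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 1], [1, 3, 2], [4, 6, 3], [4, 3, 4], [5, 4, 5]],
    columns=["A", "B", "C"],
)

def first_match_idxmax(frame: pd.DataFrame, mask: pd.Series):
    """Hypothetical helper: first row where mask is True, or None if there is none."""
    # idxmax on an all-False mask would return the first label, so check any() first
    if not mask.any():
        return None
    return frame.loc[mask.idxmax()]

print(first_match_idxmax(df, (df.A > 4) & (df.B > 3)).name)  # 4
print(first_match_idxmax(df, df.A > 6))                      # None
print((df.A > 6).idxmax())  # 0: the unguarded call silently picks the first row
```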
