Using .iterrows() with series.nlargest() to get the highest number in a row in a Dataframe

Tags: python, iterator, pandas, dataframe

I am trying to create a function that uses df.iterrows() and Series.nlargest. I want to iterate over each row and find the largest number and then mark it as a 1. This is the data frame:

A   B    C
9   6    5
3   7    2

Here is the output I wish to have:

A    B   C
1    0   0
0    1   0

This is the function I wish to use here:

def get_top_n(df, top_n):
    """


    Parameters
    ----------
    df : DataFrame

    top_n : int
        The top number to get
    Returns
    -------
    top_numbers : DataFrame
    Returns the top number marked with a 1

    """
    # Implement Function
    for row in df.iterrows():
        top_numbers = row.nlargest(top_n).sum()

    return top_numbers

I get the following error: AttributeError: 'tuple' object has no attribute 'nlargest'

Help would be appreciated on how to re-write my function in a neater way and to actually work! Thanks in advance

asked Aug 02 '18 05:08

Deepak M

2 Answers

Add an i variable, because iterrows returns an (index, Series) tuple for each row:

for i, row in df.iterrows():
    top_numbers = row.nlargest(top_n).sum()
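To make the tuple unpacking concrete, here is a minimal sketch on the question's two-row frame. The frame is rebuilt by hand, and the names first and row_maxima are only illustrative.

```python
import pandas as pd

# The question's frame, rebuilt by hand for illustration.
df = pd.DataFrame({"A": [9, 3], "B": [6, 7], "C": [5, 2]})

# Each item produced by iterrows() is a 2-tuple: (index label, row as a Series).
first = next(df.iterrows())
print(type(first))  # a tuple, which is why calling .nlargest on it fails

# Unpacking the tuple gives a Series, which does have nlargest().
row_maxima = {}
for i, row in df.iterrows():
    # column label of the largest value in this row
    row_maxima[i] = row.nlargest(1).index[0]

print(row_maxima)
```

The loop only reads from df here, so the original values are left as they were.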

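General solution with numpy.argsort: applying it twice gives the rank of each value within its row in descending order (a single argsort only returns the sort order, which is not the same as the rank in general). Then compare the ranks with top_n and convert the boolean array to integers:

import pandas as pd

def get_top_n(df, top_n):
    # check the type first, so the comparison below is always between ints
    if not isinstance(top_n, int):
        raise ValueError("Value is not an integer")
    if top_n > len(df.columns):
        raise ValueError("Value is higher than the number of columns")

    # first argsort: column positions ordered from largest to smallest value
    # second argsort: rank of each column (0 = largest) within its row
    ranks = (-df.values).argsort(axis=1).argsort(axis=1)
    arr = (ranks < top_n).astype(int)
    return pd.DataFrame(arr, index=df.index, columns=df.columns)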

df1 = get_top_n(df, 2)
print (df1)
   A  B  C
0  1  1  0
1  1  1  0

df1 = get_top_n(df, 1)
print (df1)
   A  B  C
0  1  0  0
1  0  1  0
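The same 0/1 marking can also be sketched with DataFrame.rank, which returns ranks directly. This is an alternative to the argsort approach, not part of the answer above; the function name get_top_n_rank and the extra third row are made up for illustration. With method="first", ties are broken by column order.

```python
import pandas as pd

def get_top_n_rank(df, top_n):
    # Hypothetical helper: rank each row from largest (rank 1) to smallest.
    # method="first" breaks ties by column order, so exactly top_n cells are marked.
    ranks = df.rank(axis=1, method="first", ascending=False)
    return (ranks <= top_n).astype(int)

# The question's two rows plus one extra row whose largest value is not in column A.
df = pd.DataFrame({"A": [9, 3, 5], "B": [6, 7, 9], "C": [5, 2, 6]})
print(get_top_n_rank(df, 1))
print(get_top_n_rank(df, 2))
```

The extra row (5, 9, 6) is a useful check for any rank-based approach, because its sort order and its ranks are different permutations.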

EDIT:

A solution with iterrows is possible, but not recommended because it is slow. Note that it overwrites df in place:

top_n = 2
for i, row in df.iterrows():
    top = row.nlargest(top_n).index
    df.loc[i] = 0
    df.loc[i, top] = 1

print (df)
   A  B  C
0  1  1  0
1  1  1  0
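If the original numbers are still needed afterwards, the loop can write into a separate frame of zeros instead. A minimal sketch, assuming the question's frame; the function name get_top_n_loop is hypothetical and not from the answer.

```python
import pandas as pd

def get_top_n_loop(df, top_n):
    # Hypothetical wrapper around the loop; the input frame is left untouched.
    marked = pd.DataFrame(0, index=df.index, columns=df.columns)
    for i, row in df.iterrows():
        top = row.nlargest(top_n).index  # column labels of the top_n values
        marked.loc[i, top] = 1
    return marked

df = pd.DataFrame({"A": [9, 3], "B": [6, 7], "C": [5, 2]})
result = get_top_n_loop(df, 2)
print(result)
print(df)  # still holds the original numbers
```

This is still a row-by-row loop, so the speed caveat applies to it as well.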

answered Oct 07 '22 01:10

jezrael

For context, the dataframe consists of stock return data for the S&P 500 over approximately 4 years.

import pandas as pd

def get_top_n(prev_returns, top_n):

    # generate dataframe populated with zeros for merging
    top_stocks = pd.DataFrame(0, columns=prev_returns.columns, index=prev_returns.index)

    # find top_n largest entries by row; every other entry becomes NaN
    df = prev_returns.apply(lambda x: x.nlargest(top_n), axis=1)

    # merge on the shared columns; how='right' keeps the rows of df, and any
    # column that is missing from df comes back filled with NaN
    top_stocks = top_stocks.merge(df, how='right').set_index(df.index)

    # return dataframe with non-null entries replaced by 1 and NaN by 0
    return top_stocks.notnull() * 1
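To see what the apply step produces by itself, here is a small sketch on the question's frame rather than the S&P 500 data, which is not shown. Cells outside a row's top_n come back as NaN, and a column that never makes the cut may be absent from the result, so the sketch reindexes to the original columns before testing for non-null values. That reindex step is a possible simpler route to the same 0/1 frame, not the answer's own method; the names largest and marked are illustrative.

```python
import pandas as pd

# Stand-in data: the question's frame, not real stock returns.
prev_returns = pd.DataFrame({"A": [9, 3], "B": [6, 7], "C": [5, 2]})
top_n = 1

# Keep only each row's top_n values; every other cell becomes NaN.
largest = prev_returns.apply(lambda x: x.nlargest(top_n), axis=1)
print(largest)

# Bring back any column that dropped out, then turn "has a value" into 1/0.
marked = largest.reindex(columns=prev_returns.columns).notnull().astype(int)
print(marked)
```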

answered Oct 07 '22 00:10

Josmoor98
