Convert Int64Index to int

Tags:

pandas

I'm iterating through a DataFrame (called hdf) and applying changes on a row-by-row basis. hdf is sorted by group_id, and within each group the rows are ranked 1 through n on some criteria (the group_rank column).

# groupby creates one sub-DataFrame per distinct group_id.
grouped = hdf.groupby('group_id')

# Iterate through each sub-DataFrame.
for name, group in grouped:

    # This grabs the index of the top-ranked row in each sub-DataFrame
    index1 = group[group['group_rank'] == 1].index

    # If the largest criteria1 in the group is 0, flag all of its rows for removal
    if group['criteria1'].max() == 0:
        for x in range(index1, index1 + group['group_rank'].max()):
            hdf.loc[x, 'remove_row'] = 1

I'm getting the following error:

TypeError: int() argument must be a string or a number, not 'Int64Index'

I get the same error when I try to cast it to an int explicitly:

rank1 = int(group[group['group_rank'] == 1].index)

Can someone explain what is happening and provide an alternative?

asked Oct 13 '15 19:10

Christopher Jenkins

1 Answer

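The answer to your specific question is that index1 is an Int64Index (an array-like collection of row labels, much like a list), even when it contains only one element, and int() cannot convert a collection to a number. To get that one element, you can use index1[0].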
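A minimal sketch of that fix, assuming a small made-up hdf with the group_id, group_rank and criteria1 columns from the question (the values are invented for illustration). In pandas 2.x the type is displayed as Index(..., dtype='int64') rather than Int64Index, but the point is the same: boolean filtering returns a collection of labels, and [0] pulls out the scalar. index1.item() is a stricter alternative that raises if there is not exactly one element.

```python
import pandas as pd

# Hypothetical toy data: two groups, each ranked 1..n within the group.
hdf = pd.DataFrame({
    "group_id":   [1, 1, 1, 2, 2],
    "group_rank": [1, 2, 3, 1, 2],
    "criteria1":  [0, 0, 0, 5, 3],
})

top_labels = {}
for name, group in hdf.groupby("group_id"):
    # Boolean filtering returns an Index of labels, even when only one row matches.
    index1 = group[group["group_rank"] == 1].index
    # index1[0] extracts the single label as a scalar that int()/range() can use.
    top_labels[name] = index1[0]

print(top_labels)
```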

But there are better ways of accomplishing your goal. If you want to remove all of the rows in the "bad" groups, you can use filter:

hdf = hdf.groupby('group_id').filter(lambda group: group['criteria1'].max() != 0)
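To see what filter does, here is a small sketch on hypothetical toy data in which group 1 has criteria1 == 0 on every row and group 2 does not. filter calls the function once per group and keeps all of that group's rows only when it returns True; the surviving rows keep their original labels and order.

```python
import pandas as pd

# Hypothetical toy data: group 1 is "bad" (criteria1 is always 0), group 2 is not.
hdf = pd.DataFrame({
    "group_id":   [1, 1, 1, 2, 2],
    "group_rank": [1, 2, 3, 1, 2],
    "criteria1":  [0, 0, 0, 5, 3],
})

# Keep a group's rows only if its largest criteria1 value is non-zero.
kept = hdf.groupby("group_id").filter(lambda group: group["criteria1"].max() != 0)
print(kept)
```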

If you only want to remove certain rows within matching groups, you can write a function and then use apply:

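def filter_group(group):
    if group['criteria1'].max() != 0:
        return group
    else:
        # Placeholder: 'other_criteria' is a hypothetical column; substitute your own row-level condition
        return group.loc[group['other_criteria'] == 1]

# group_keys=False keeps the original row labels instead of prepending group_id to the index
hdf = hdf.groupby('group_id', group_keys=False).apply(filter_group)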
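A runnable sketch of the apply approach, again on hypothetical toy data. The criteria2 column is invented here to stand in for "other criteria": in a bad group only rows with criteria2 > 0 survive, while good groups are returned whole. group_keys=False makes the result keep the original row labels.

```python
import pandas as pd

# Hypothetical toy data; "criteria2" is an invented column standing in for
# whatever row-level rule decides which rows survive in a "bad" group.
hdf = pd.DataFrame({
    "group_id":   [1, 1, 1, 2, 2],
    "group_rank": [1, 2, 3, 1, 2],
    "criteria1":  [0, 0, 0, 5, 3],
    "criteria2":  [1, 0, 1, 0, 0],
})

def filter_group(group):
    if group["criteria1"].max() != 0:
        return group                          # good group: keep every row
    return group.loc[group["criteria2"] > 0]  # bad group: keep only some rows

# group_keys=False concatenates the pieces without adding group_id to the index.
result = hdf.groupby("group_id", group_keys=False).apply(filter_group)
print(result)
```

Here group 1 keeps rows 0 and 2 (where criteria2 is 1) and group 2 keeps both of its rows.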

(If you really like your current way of doing things, note that loc accepts an Index of labels, not just a single label, so you could also write hdf.loc[group.index, 'remove_row'] = 1.)
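A sketch of the original loop with that change, assuming toy data and a remove_row column initialised to 0 (both are assumptions; the question does not show how remove_row is created). Passing group.index to loc flags every row of the group at once, so no integer conversion is needed. The last line is a swapped-in, loop-free alternative that broadcasts each group's maximum back to its rows with transform.

```python
import pandas as pd

# Hypothetical toy data; remove_row starts at 0 for every row (an assumption).
hdf = pd.DataFrame({
    "group_id":   [1, 1, 1, 2, 2],
    "group_rank": [1, 2, 3, 1, 2],
    "criteria1":  [0, 0, 0, 5, 3],
})
hdf["remove_row"] = 0

for name, group in hdf.groupby("group_id"):
    if group["criteria1"].max() == 0:
        # loc accepts the whole Index of labels for this group.
        hdf.loc[group.index, "remove_row"] = 1

# Loop-free alternative: each row gets its group's max criteria1, then compare to 0.
flags = (hdf.groupby("group_id")["criteria1"].transform("max") == 0).astype(int)
print(hdf)
```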

answered Sep 22 '22 09:09

Evan Wright
