I'm iterating through a dataframe (called hdf) and applying changes row by row. hdf is sorted by group_id, and each row is assigned a rank from 1 through n within its group based on some criteria.
# groupby creates one sub-DataFrame per distinct group_id.
grouped = hdf.groupby('group_id')
# Iterate through each sub-DataFrame.
for name, group in grouped:
    # Grab the index of the top-ranked row in this sub-DataFrame.
    index1 = group[group['group_rank']==1].index
    # If criteria1 is 0 for the whole group, flag all of its rows for removal.
    if(max(group['criteria1']) == 0):
        for x in range(index1, index1 + max(group['group_rank'])):
            hdf.loc[x,'remove_row'] = 1
I'm getting the following error:
TypeError: int() argument must be a string or a number, not 'Int64Index'
I get the same error when I try to cast index1 explicitly:
index1 = int(group[group['group_rank']==1].index)
Can someone explain what is happening and provide an alternative?
Immutable sequence used for indexing and alignment. The basic object storing axis labels for all pandas objects. Int64Index is a special case of Index with purely integer labels.
The answer to your specific question is that index1 is an Int64Index (basically a list), even if it has only one element. To get that one element, you can use index1[0].
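To make that concrete, here is a minimal sketch on a made-up four-row hdf (the data values and the name first_label are assumptions for illustration, not from the question). It shows that the boolean selection returns an index object of length 1, and that indexing it with [0] yields a plain scalar label:

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
hdf = pd.DataFrame({
    "group_id": ["a", "a", "b", "b"],
    "group_rank": [1, 2, 1, 2],
    "criteria1": [0, 0, 3, 0],
})

# One sub-DataFrame, as the loop over groupby would produce.
group = hdf[hdf["group_id"] == "b"]

# Still an index object, even though exactly one row matches.
index1 = group[group["group_rank"] == 1].index

# Pull out the single label as a scalar.
first_label = index1[0]
```

With a scalar in hand, expressions such as range(first_label, first_label + n) behave as the original loop intended, provided the row labels are consecutive integers.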
But there are better ways of accomplishing your goal. If you want to remove all of the rows in the "bad" groups, you can use filter:
hdf = hdf.groupby('group_id').filter(lambda group: group['criteria1'].max() != 0)
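As a rough illustration of what filter does here, the sketch below runs the same call on a small made-up hdf (the data values and the name kept are assumptions). Group "a" has criteria1 equal to 0 everywhere, so all of its rows are dropped, while group "b" is kept whole:

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
hdf = pd.DataFrame({
    "group_id": ["a", "a", "b", "b"],
    "group_rank": [1, 2, 1, 2],
    "criteria1": [0, 0, 3, 0],
})

# Keep only the groups whose maximum criteria1 is non-zero.
kept = hdf.groupby("group_id").filter(lambda group: group["criteria1"].max() != 0)
```

Note that filter keeps or drops entire groups: the second row of group "b" survives even though its own criteria1 is 0.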
If you only want to remove certain rows within matching groups, you can write a function and then use apply:
def filter_group(group):
    if group['criteria1'].max() != 0:
        return group
    else:
        return group.loc[other criteria here]
hdf = hdf.groupby('group_id').apply(filter_group)
(If you really like your current way of doing things, you should know that loc will accept an index, not just an integer, so you could also do hdf.loc[group.index, 'remove_row'] = 1.)
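For completeness, here is a sketch of the flagging approach with loc and group.index, again on made-up data. The initial remove_row value of 0 and the name flagged_vectorized are assumptions. The last statement is a different technique, not part of the answer above: it uses groupby(...).transform("max") to compute the same flag without an explicit loop.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
hdf = pd.DataFrame({
    "group_id": ["a", "a", "b", "b"],
    "group_rank": [1, 2, 1, 2],
    "criteria1": [0, 0, 3, 0],
})

# Assumed default: no row is flagged to begin with.
hdf["remove_row"] = 0

# Loop version: pass the group's whole index to loc in one assignment.
for name, group in hdf.groupby("group_id"):
    if group["criteria1"].max() == 0:
        hdf.loc[group.index, "remove_row"] = 1

# Loop-free alternative (a swapped-in technique): broadcast each group's
# maximum back onto its rows, then compare against 0.
flagged_vectorized = (
    hdf.groupby("group_id")["criteria1"].transform("max") == 0
).astype(int)
```

Both versions flag every row of group "a" and leave group "b" untouched, and neither depends on the row labels being consecutive integers.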