I've filtered my data as suggested in "With Pandas in Python, select the highest value row for each group":
    author        cat  val
0  author1  category2   15
1  author2  category4    9
2  author3  category1    7
3  author3  category3    7
Now I want to keep only the authors that appear exactly once in this data frame. I wrote this, but it doesn't work:
def where_just_one_exists(group):
    return group.loc[group.count() == 1]

most_expensive_single_category = most_expensive_for_each_model.groupby('author', as_index=False).apply(where_just_one_exists).reset_index(drop=True)
print most_expensive_single_category
Error:
File "/home/mike/anaconda/lib/python2.7/site-packages/pandas/core/indexing.py", line 1659, in check_bool_indexer
raise IndexingError('Unalignable boolean Series key provided')
pandas.core.indexing.IndexingError: Unalignable boolean Series key provided
My desired output is:
    author        cat  val
0  author1  category2   15
1  author2  category4    9
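A likely reason for the error, shown as a minimal sketch: `group.count()` returns one count per column, so the boolean mask it produces is labelled by column names rather than by the row labels that `.loc` expects. The frame below is rebuilt from the sample data in the question; the variable names are only illustrative.

```python
import pandas as pd

# Sample frame rebuilt from the data shown in the question.
df = pd.DataFrame({
    "author": ["author1", "author2", "author3", "author3"],
    "cat": ["category2", "category4", "category1", "category3"],
    "val": [15, 9, 7, 7],
})

# One group, roughly as groupby(...).apply would pass it to the function.
group = df[df["author"] == "author3"]

counts = group.count()   # Series with one entry per column
mask = counts == 1       # boolean Series labelled author / cat / val

# The mask is labelled by column names, while the rows are labelled 2 and 3,
# so the mask cannot be aligned with the rows of the group.
mask_labels = list(mask.index)
row_labels = list(group.index)
print(mask_labels)
print(row_labels)
```

Testing the number of rows in the group (for example with `len(group)`) avoids the mismatch, because it yields a single True/False per group.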
An easier approach is to filter the groups by size:
df.groupby('author').filter(lambda x: len(x)==1)
     author        cat  val
id
0   author1  category2   15
1   author2  category4    9
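For anyone who wants to run this end to end, here is a self-contained sketch. It assumes the sample data from the question and a default integer index (the output above shows an index named `id`, which is not reproduced here).

```python
import pandas as pd

# Sample frame rebuilt from the data shown in the question.
df = pd.DataFrame({
    "author": ["author1", "author2", "author3", "author3"],
    "cat": ["category2", "category4", "category1", "category3"],
    "val": [15, 9, 7, 7],
})

# filter() calls the function once per group and keeps all rows of the
# groups for which it returns True; here, groups with exactly one row.
singles = df.groupby("author").filter(lambda g: len(g) == 1)
print(singles)
```

The lambda receives each author's sub-frame, so `len(g)` is the number of rows for that author.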
My solution is a bit more complex, but it still works:
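import pandas as pd

def groupbyOneOccurrence(df):
    grouped = df.groupby("author")
    retDf = pd.DataFrame()
    # Iterating over a GroupBy yields (key, sub-frame) pairs.
    for _, group in grouped:
        # Keep only the authors that appear in exactly one row.
        if len(group) == 1:
            retDf = pd.concat([retDf, group])
    return retDf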
    author        cat  val
0  author1  category2   15
1  author2  category4    9
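For comparison, the same selection can be sketched without an explicit loop. This is a different technique from the one above: it uses `DataFrame.duplicated` with `keep=False` to flag every row whose author occurs more than once, then inverts the flag. The frame is again rebuilt from the question's sample data.

```python
import pandas as pd

# Sample frame rebuilt from the data shown in the question.
df = pd.DataFrame({
    "author": ["author1", "author2", "author3", "author3"],
    "cat": ["category2", "category4", "category1", "category3"],
    "val": [15, 9, 7, 7],
})

# keep=False marks all rows of a repeated author, not just the later ones.
repeated = df.duplicated(subset="author", keep=False)

# Invert the mask to keep the authors that occur exactly once.
once_only = df[~repeated]
print(once_only)
```

Because the mask is computed in one pass over the `author` column, this avoids building the result with repeated `pd.concat` calls.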