Suppose I have a dataframe that looks like this:
group level
0 1 10
1 1 10
2 1 11
3 2 5
4 2 5
5 3 9
6 3 9
7 3 9
8 3 8
The desired output is this:
group level
0 1 10
5 3 9
Namely, this is the logic: look inside each group, if there is more than 1 distinct value present in the level column, return the first row in that group. For example, no row from group 2 is selected, because the only value present in the level column is 5.
In addition, how does the situation change if I want the last, instead of the first row of such groups?
What I have attempted was combining group_by statements, with creating sets from entries in the level column, but failed to produce anything even nearly sensible.
This can be done with groupby and using apply to run a simple function on each group:
def get_first_val(group):
has_multiple_vals = len(group['level'].unique()) >= 2
if has_multiple_vals:
return group['level'].loc[group['level'].first_valid_index()]
else:
return None
df.groupby('group').apply(get_first_val).dropna()
Out[8]:
group
1 10
3 9
dtype: float64
There's also a last_valid_index() method, so you wouldn't have to
make any huge changes to get the last row instead.
If you have other columns that you want to keep, you just need a slight tweak:
import numpy as np
df['col1'] = np.random.randint(10, 20, 9)
df['col2'] = np.random.randint(20, 30, 9)
df
Out[17]:
group level col1 col2
0 1 10 19 21
1 1 10 18 24
2 1 11 14 23
3 2 5 14 26
4 2 5 10 22
5 3 9 13 27
6 3 9 16 20
7 3 9 18 26
8 3 8 11 2
def get_first_val_keep_cols(group):
has_multiple_vals = len(group['level'].unique()) >= 2
if has_multiple_vals:
return group.loc[group['level'].first_valid_index(), :]
else:
return None
df.groupby('group').apply(get_first_val_keep_cols).dropna()
Out[20]:
group level col1 col2
group
1 1 10 19 21
3 3 9 13 27
This would be simpler:
In [121]:
print df.groupby('group').\
agg(lambda x: x.values[0] if (x.values!=x.values[0]).any() else np.nan).\
dropna()
level
group
1 10
3 9
For each group, if any of the values are not the same as the first value, aggregate that group into its first value; otherwise, aggregate it to nan.
Finally, dropna().
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With