I have a dataframe:
       A         C         D
0    one  0.410599 -0.205158
1    one  0.144044  0.313068
2    one  0.333674 -0.742165
3  three  0.761038 -2.552990
4  three  1.494079  2.269755
5    two  1.454274 -0.854096
6    two  0.121675  0.653619
7    two  0.443863  0.864436
Let's assume that A is the anchor column. I now want to display each group value only once, at the top:
        A         C         D
0    one  0.410599 -0.205158
1         0.144044  0.313068
2         0.333674 -0.742165
3  three  0.761038 -2.552990
4         1.494079  2.269755
5    two  1.454274 -0.854096
6         0.121675  0.653619
7         0.443863  0.864436
This is what I've come up with:
df['A'] = df.groupby('A', as_index=False)['A']\
        .apply(lambda x: x.str.replace('.*', '').set_value(0, x.values[0])).values
My strategy was to group by A and set every value except the first to an empty string. This doesn't work; I get:
ValueError: Length of values does not match length of index
so the assignment fails. Any ideas, suggestions, or improvements are welcome.
I should add that I'm trying to generalise this so that it can single out each group's value at the top, bottom, or middle of the group, so I'd prefer a solution that helps with that. The example above singles out values at the top of each group, but I'd also like to be able to place them at the bottom or in the middle.
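Your method didn't work because of an indexing problem. When you group by 'A', each group keeps the row labels it had in the original DataFrame, so label 0 exists only in the first group ('one'). In every other group, set_value(0, ...) cannot find label 0, so instead of overwriting the first element it returns a new object with an extra row labelled 0. Those groups come back one row longer than they went in, which is why the lengths don't match.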
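To see this concretely, the check below prints the row labels each group carries, then shows that assigning to a missing label grows a Series instead of overwriting its first element. It is a sketch assuming a recent pandas; it rebuilds the sample frame by hand and uses .loc enlargement as a stand-in for set_value, which no longer exists in current pandas.

```python
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame({
    "A": ["one", "one", "one", "three", "three", "two", "two", "two"],
    "C": [0.410599, 0.144044, 0.333674, 0.761038, 1.494079, 1.454274, 0.121675, 0.443863],
    "D": [-0.205158, 0.313068, -0.742165, -2.552990, 2.269755, -0.854096, 0.653619, 0.864436],
})

# Each group keeps the row labels it had in the original DataFrame.
group_labels = {name: group.index.tolist() for name, group in df.groupby("A")}
print(group_labels)  # {'one': [0, 1, 2], 'three': [3, 4], 'two': [5, 6, 7]}

# Label 0 is missing from the 'two' group, so a label-based assignment
# appends a new row labelled 0 instead of overwriting the first element.
s = df.loc[df["A"] == "two", "A"].copy()
s.loc[0] = "two"
print(s.index.tolist())  # [5, 6, 7, 0]
```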
Fix 1: reset_index(drop=True)
df['A'] = df.groupby('A')['A'].apply(lambda x: x.str.replace('.*', '')\
                      .reset_index(drop=True).set_value(0, x.values[0])).values
df
      A         C         D
0    one  0.410599 -0.205158
1         0.144044  0.313068
2         0.333674 -0.742165
3  three  0.761038 -2.552990
4         1.494079  2.269755
5    two  1.454274 -0.854096
6         0.121675  0.653619
7         0.443863  0.864436
Fix 2: set_value with takeable=True
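set_value has a third parameter, takeable, which controls how the first argument is interpreted. It defaults to False, meaning the argument is treated as an index label; with takeable=True it is treated as an integer position, so position 0 always refers to the first row of each group regardless of its label. Setting it to True worked in my case.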
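Note that set_value was deprecated in pandas 0.21 and removed in 1.0, so the fixes above only run on older versions. A minimal sketch of the same positional idea for pandas 1.x or later: blank each group, restore its first element with .iloc (position-based, like takeable=True), and use transform instead of apply so the result is already aligned to the original index and no .values is needed. The helper name keep_first is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["one", "one", "one", "three", "three", "two", "two", "two"],
    "C": [0.410599, 0.144044, 0.333674, 0.761038, 1.494079, 1.454274, 0.121675, 0.443863],
    "D": [-0.205158, 0.313068, -0.742165, -2.552990, 2.269755, -0.854096, 0.653619, 0.864436],
})

def keep_first(x: pd.Series) -> pd.Series:
    """Hypothetical helper: blank a group, then restore position 0."""
    out = pd.Series("", index=x.index, dtype=object)  # same labels as the group
    out.iloc[0] = x.iloc[0]  # positional, like set_value(..., takeable=True)
    return out

# transform returns a result aligned to df's original index.
df["A"] = df.groupby("A")["A"].transform(keep_first)
print(df["A"].tolist())  # ['one', '', '', 'three', '', 'two', '', '']
```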
In addition to Zero's solutions, here is how to single out values in the centre of each group instead:
df.A = df.groupby('A')['A'].apply(lambda x: x.str.replace('.*', '')\
                           .set_value(len(x) // 2, x.values[0], True)).values
df
       A         C         D
0         0.410599 -0.205158
1    one  0.144044  0.313068
2         0.333674 -0.742165
3         0.761038 -2.552990
4  three  1.494079  2.269755
5         1.454274 -0.854096
6    two  0.121675  0.653619
7         0.443863  0.864436
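For the generalised top/bottom/middle case from the question, a vectorised alternative avoids per-group apply altogether: cumcount gives each row's 0-based position inside its group, transform("size") gives the group length, and Series.where blanks every row that isn't at the chosen position. This is a sketch assuming pandas 1.x or later; the function name single_out and its "top"/"bottom"/"middle" options are hypothetical. "middle" uses size // 2, matching len(x) // 2 above.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["one", "one", "one", "three", "three", "two", "two", "two"],
    "C": [0.410599, 0.144044, 0.333674, 0.761038, 1.494079, 1.454274, 0.121675, 0.443863],
    "D": [-0.205158, 0.313068, -0.742165, -2.552990, 2.269755, -0.854096, 0.653619, 0.864436],
})

def single_out(frame: pd.DataFrame, col: str, where: str = "top") -> pd.Series:
    """Hypothetical helper: keep each group's value on one row, blank the rest."""
    grouped = frame.groupby(col)[col]
    pos = grouped.cumcount()            # 0-based position within each group
    size = grouped.transform("size")    # group length, broadcast to every row
    if where == "top":
        keep = pos == 0
    elif where == "bottom":
        keep = pos == size - 1
    elif where == "middle":
        keep = pos == size // 2         # same choice as len(x) // 2
    else:
        raise ValueError(f"unknown position: {where!r}")
    return frame[col].where(keep, "")   # blank rows where keep is False

top = single_out(df, "A", "top")
bottom = single_out(df, "A", "bottom")
middle = single_out(df, "A", "middle")
print(middle.tolist())  # ['', 'one', '', '', 'three', '', 'two', '']
```

Because the mask is built from positions rather than labels, the original index is never modified, which sidesteps the length mismatch entirely; assign the result back with df['A'] = single_out(df, 'A', 'bottom') or similar.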