 

"ValueError: Length of values does not match length of index" when trying to modify column values in a pandas groupby

I have a dataframe:

       A         C         D
0    one  0.410599 -0.205158
1    one  0.144044  0.313068
2    one  0.333674 -0.742165
3  three  0.761038 -2.552990
4  three  1.494079  2.269755
5    two  1.454274 -0.854096
6    two  0.121675  0.653619
7    two  0.443863  0.864436
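To reproduce the example, the frame can be rebuilt from the values printed above. They are copied as displayed, so they are rounded to six decimals:

```python
import pandas as pd

# Values copied from the printed frame (rounded as displayed); default 0..7 index
df = pd.DataFrame({
    'A': ['one', 'one', 'one', 'three', 'three', 'two', 'two', 'two'],
    'C': [0.410599, 0.144044, 0.333674, 0.761038, 1.494079, 1.454274, 0.121675, 0.443863],
    'D': [-0.205158, 0.313068, -0.742165, -2.552990, 2.269755, -0.854096, 0.653619, 0.864436],
})
print(df)
```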

Let's assume that A is the anchor column. I now want to display each group value only once, at the top:

        A         C         D
0    one  0.410599 -0.205158
1         0.144044  0.313068
2         0.333674 -0.742165
3  three  0.761038 -2.552990
4         1.494079  2.269755
5    two  1.454274 -0.854096
6         0.121675  0.653619
7         0.443863  0.864436

This is what I've come up with:

df['A'] = df.groupby('A', as_index=False)['A']\
        .apply(lambda x: x.str.replace('.*', '').set_value(0, x.values[0])).values

My strategy was to do a groupby and then set all values to an empty string other than the first. This doesn't seem to work, because I get:

ValueError: Length of values does not match length of index

In other words, the groupby result has a different length from the dataframe, so it can't be assigned back to the column. Any ideas, suggestions or improvements are welcome.

I should add that I'm trying to generalise the solution so that it can single out values at the top, bottom, or middle of each group, so I'd prefer a solution that helps with that. The example above singles out values only at the top of each group, but I want an approach that also lets me single them out at the bottom or in the middle.

asked Jan 03 '23 09:01 by cs95


1 Answer

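Your method doesn't work because of the index. When you group by 'A', each group keeps its original index labels: the 'three' group, for example, has labels 3 and 4. set_value(0, ...) looks up label 0, which only exists in the 'one' group. For the other groups it can't find that label, so it returns a new object with an extra row labelled 0 appended. The groups then add up to more values than df has rows, which is why you get the length mismatch.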
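To see where the extra rows come from, the sketch below prints each group's index labels and then mimics the label-based insert. Since set_value was removed in pandas 1.0, it substitutes .loc, which also appends a new row when the label is missing. Only column A of the example frame is rebuilt:

```python
import pandas as pd

# Only column A matters here; the index is the default 0..7
df = pd.DataFrame({'A': ['one', 'one', 'one', 'three', 'three', 'two', 'two', 'two']})

lengths = {}
for key, group in df.groupby('A')['A']:
    s = group.copy()
    print(key, list(s.index))   # each group keeps its original labels, e.g. three -> [3, 4]
    s.loc[0] = s.iloc[0]        # label 0 exists only in 'one'; elsewhere a new row is appended
    lengths[key] = len(s)

print(lengths)                                               # {'one': 3, 'three': 3, 'two': 4}
print(sum(lengths.values()), 'values vs', len(df), 'rows')   # 10 values vs 8 rows
```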

Fix 1: reset_index(drop=True)

# reset each group's index to 0..n-1 so that label 0 is always the group's first row
df['A'] = df.groupby('A')['A'].apply(lambda x: x.str.replace('.*', '', regex=True)
                      .reset_index(drop=True).set_value(0, x.values[0])).values
df

      A         C         D
0    one  0.410599 -0.205158
1         0.144044  0.313068
2         0.333674 -0.742165
3  three  0.761038 -2.552990
4         1.494079  2.269755
5    two  1.454274 -0.854096
6         0.121675  0.653619
7         0.443863  0.864436
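Series.set_value was deprecated in pandas 0.21 and removed in 1.0, so on current pandas the same reset-the-index idea has to be written differently. A minimal sketch, assuming .loc as the replacement; the helper first_only is hypothetical. Like the code above, it relies on .values, so it assumes df is already sorted by A (groupby returns groups in sorted key order):

```python
import pandas as pd

df = pd.DataFrame({'A': ['one', 'one', 'one', 'three', 'three', 'two', 'two', 'two']})

def first_only(x):
    # Hypothetical helper: blank the whole group, then restore its first value
    out = pd.Series('', index=range(len(x)))   # same 0..n-1 index reset_index(drop=True) gives
    out.loc[0] = x.iloc[0]                     # label 0 always exists now, so nothing is appended
    return out

# .values discards the index, so this only lines up because df is sorted by 'A'
df['A'] = df.groupby('A')['A'].apply(first_only).values
print(df['A'].tolist())   # ['one', '', '', 'three', '', 'two', '', '']
```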

Fix 2: set_value with takeable=True

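set_value has a third parameter, takeable, which controls how the index argument is interpreted. It defaults to False, which treats the index as a label; setting it to True treats it as an integer position, so set_value(0, value, True) always targets the first row of each group regardless of its labels. That worked for my case.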
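With takeable=True, set_value works positionally, which is what .iat does on current pandas. A minimal sketch of Fix 2 under that substitution; the helper keep_at is hypothetical. It uses transform instead of apply(...).values, which keeps each result aligned with the original rows, so it does not depend on df being sorted:

```python
import pandas as pd

df = pd.DataFrame({'A': ['one', 'one', 'one', 'three', 'three', 'two', 'two', 'two']})

def keep_at(x, pos):
    # Hypothetical helper: blank the group, then put the group value back at position pos
    out = pd.Series('', index=x.index)   # keep the group's original labels for alignment
    out.iat[pos] = x.iat[0]              # positional set, like set_value(..., takeable=True)
    return out

df['A'] = df.groupby('A')['A'].transform(lambda x: keep_at(x, 0))
print(df['A'].tolist())   # ['one', '', '', 'three', '', 'two', '', '']
```

Passing len(x) // 2 or len(x) - 1 as pos instead of 0 gives the middle or bottom variants.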

In addition to Zero's solutions, here is how to isolate values at the centre of their groups:

# takeable=True: len(x) // 2 is read as a position, not a label
df.A = df.groupby('A')['A'].apply(lambda x: x.str.replace('.*', '', regex=True)
                           .set_value(len(x) // 2, x.values[0], True)).values

df

       A         C         D
0         0.410599 -0.205158
1    one  0.144044  0.313068
2         0.333674 -0.742165
3         0.761038 -2.552990
4  three  1.494079  2.269755
5         1.454274 -0.854096
6    two  0.121675  0.653619
7         0.443863  0.864436
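To cover top, bottom and middle with one function and no apply at all, an alternative (not the approach used above) is to compute each row's position within its group with groupby().cumcount() and the group length with transform('size'), then blank every row that isn't at the wanted position. A minimal sketch; the function single_out and its where argument are hypothetical:

```python
import pandas as pd

def single_out(df, col, where='top'):
    """Hypothetical helper: keep df[col] at one position per group, blank it elsewhere."""
    grouped = df.groupby(col)[col]
    pos = grouped.cumcount()           # 0, 1, ..., n-1 within each group
    size = grouped.transform('size')   # group length, repeated on every row
    targets = {'top': 0, 'bottom': size - 1, 'middle': size // 2}
    if where not in targets:
        raise ValueError(f"where must be one of {sorted(targets)}")
    return df[col].where(pos == targets[where], '')

df = pd.DataFrame({'A': ['one', 'one', 'one', 'three', 'three', 'two', 'two', 'two']})
for where in ('top', 'middle', 'bottom'):
    print(where, single_out(df, 'A', where).tolist())
```

Because every step returns a Series on df's own index, the result can be assigned straight back with df['A'] = single_out(df, 'A', 'middle'), whatever the row order.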
answered Jan 06 '23 00:01 by Bharath