Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas grouby and transform('count') gives placement error - works fine on smaller dataset

Tags:

python

pandas

Really simple task in Pandas is throwing an error I don't understand. With a simple dataset like this:

test=pd.DataFrame([[1,3],[1,6],[2,4],[3,9],[3,2]],columns=['a','b'])

I can execute the following to count the number of times a value appears in the 'a' column of test.

test['count']=test.groupby('a').transform('count')

This yields:

>>> test
       a  b  count
    0  1  3      2
    1  1  6      2
    2  2  4      1
    3  3  9      2
    4  3  2      2

Perfect. But with my real data, this doesn't work. Here's a tiny snippet of my data that works for reproducing the problem:

newtest=pd.DataFrame([['010010201001000','001','0220','AL','0'],['010010201001001','001','0220','AL','0'],['010010201001002','001','0220','AL','0'],['010010201001003','001','0160','AL','0'],['010010201001004','001','0160','AL','0']],columns=['BlockID','CountyFP','District','state_x','HD'])
newtest['blocks']=newtest.groupby(['CountyFP','District','state_x']).transform('count')

Trying that gives me this error:

ValueError: Wrong number of items passed 2, placement implies 1

I don't really see what makes my 'real' example any different from the play set, and googling this error produces other examples of the error but it's still unclear to me exactly why it is being produced here.

More confusing still, if I just execute the righthand side of the above code, it works fine - produces newtest with counts in every column. So it's like the assignment is what is giving it problems.

like image 543
AustinC Avatar asked Oct 19 '25 08:10

AustinC


1 Answers

You didn't select any columns to perform the aggregation on so it did it on the remaining columns which are 2, if you select one of the columns then you get the desired result:

In [6]:
newtest['blocks'] = newtest.groupby(['CountyFP','District','state_x'])['BlockID'].transform('count')
newtest

Out[6]:
           BlockID CountyFP District state_x HD  blocks
0  010010201001000      001     0220      AL  0       3
1  010010201001001      001     0220      AL  0       3
2  010010201001002      001     0220      AL  0       3
3  010010201001003      001     0160      AL  0       2
4  010010201001004      001     0160      AL  0       2

output of your attempt:

In [9]:
newtest.groupby(['CountyFP','District','state_x']).transform('count')

Out[9]:
   BlockID  HD
0        3   3
1        3   3
2        3   3
3        2   2
4        2   2

You can see that it generates 2 columns as these are the remaining columns hence the error message you observed.

like image 169
EdChum Avatar answered Oct 21 '25 21:10

EdChum



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!