Really simple task in Pandas is throwing an error I don't understand. With a simple dataset like this:
test=pd.DataFrame([[1,3],[1,6],[2,4],[3,9],[3,2]],columns=['a','b'])
I can execute the following to count the number of times a value appears in the 'a' column of test.
test['count']=test.groupby('a').transform('count')
This yields:
>>> test
a b count
0 1 3 2
1 1 6 2
2 2 4 1
3 3 9 2
4 3 2 2
Perfect. But with my real data, this doesn't work. Here's a tiny snippet of my data that works for reproducing the problem:
newtest=pd.DataFrame([['010010201001000','001','0220','AL','0'],['010010201001001','001','0220','AL','0'],['010010201001002','001','0220','AL','0'],['010010201001003','001','0160','AL','0'],['010010201001004','001','0160','AL','0']],columns=['BlockID','CountyFP','District','state_x','HD'])
newtest['blocks']=newtest.groupby(['CountyFP','District','state_x']).transform('count')
Trying that gives me this error:
ValueError: Wrong number of items passed 2, placement implies 1
I don't really see what makes my 'real' example any different from the play set, and googling this error produces other examples of the error but it's still unclear to me exactly why it is being produced here.
More confusing still, if I just execute the righthand side of the above code, it works fine - produces newtest with counts in every column. So it's like the assignment is what is giving it problems.
You didn't select any columns to perform the aggregation on so it did it on the remaining columns which are 2, if you select one of the columns then you get the desired result:
In [6]:
newtest['blocks'] = newtest.groupby(['CountyFP','District','state_x'])['BlockID'].transform('count')
newtest
Out[6]:
BlockID CountyFP District state_x HD blocks
0 010010201001000 001 0220 AL 0 3
1 010010201001001 001 0220 AL 0 3
2 010010201001002 001 0220 AL 0 3
3 010010201001003 001 0160 AL 0 2
4 010010201001004 001 0160 AL 0 2
output of your attempt:
In [9]:
newtest.groupby(['CountyFP','District','state_x']).transform('count')
Out[9]:
BlockID HD
0 3 3
1 3 3
2 3 3
3 2 2
4 2 2
You can see that it generates 2 columns as these are the remaining columns hence the error message you observed.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With