 

Pandas value_counts(normalize=True) gives 'IntegerArray' object has no attribute 'sum'

Pandas value_counts(normalize=True) fails when an extension datatype is used. For example, an int8 Series containing pd.NA would typically use the Int8 extension datatype, but calling value_counts(normalize=True) on it raises AttributeError: 'IntegerArray' object has no attribute 'sum'. What's the workaround?

pd.Series([1,pd.NA],dtype='Int8').value_counts(normalize=True)
BSalita asked Apr 09 '26 21:04



2 Answers

This is believed to be a regression; see GH33317. The good news is that it is fixed in pandas 1.1.

import pandas as pd

pd.__version__
# '1.1.0.dev0+2004.g8d10bfb6f'

pd.Series([1, pd.NA], dtype='Int8').value_counts(normalize=True) 

1    1.0
dtype: float64

More Examples

s = pd.Series([1, 1, 1, 2, 2, 3, pd.NA], dtype='Int8') 
s.value_counts()
 
1    3
2    2
3    1
dtype: Int64

s.value_counts(normalize=True)

1    0.500000
2    0.333333
3    0.166667
dtype: float64

s.value_counts(normalize=True, dropna=False)

1      0.428571
2      0.285714
NaN    0.142857
3      0.142857
dtype: float64
cs95 answered Apr 11 '26 11:04



Each of the following can be used to work around the issue:

# 1) works if you're ok with dropping NA
pd.Series([1,pd.NA],dtype='Int8').dropna().astype(int).value_counts(normalize=True)

# 2) works if you're ok with switching to a non-extension datatype such as float
pd.Series([1,pd.NA],dtype='Int8').astype(float).value_counts(normalize=True)

# 3) The issue may be fixed in a future version of pandas. Try using pandas >= 1.1
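
A further option, offered only as a sketch, is to skip normalize=True and divide the raw counts by their total. It assumes that plain value_counts() and Series.sum() behave normally on the installed version, which has not been checked against the affected releases. normalized_counts is a hypothetical helper name, not a pandas function.

```python
import pandas as pd


def normalized_counts(s: pd.Series, dropna: bool = True) -> pd.Series:
    """Hypothetical helper: relative frequencies without normalize=True."""
    # Raw counts; for an Int8 Series these come back as nullable integers.
    counts = s.value_counts(dropna=dropna)
    # Counts never contain NA, so casting to plain float64 is safe and
    # keeps the division away from nullable-integer arithmetic.
    counts = counts.astype("float64")
    return counts / counts.sum()


s = pd.Series([1, 1, 1, 2, 2, 3, pd.NA], dtype="Int8")
freq = normalized_counts(s)
print(freq)
```

Unlike workarounds 1) and 2), the Series itself keeps its Int8 dtype; only the counts are converted to float. Passing dropna=False counts pd.NA as its own category.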
BSalita answered Apr 11 '26 11:04



