
What is the difference between DataFrame.groupby(column).apply(len) and DataFrame[column].value_counts()?

The following Python code raises an AssertionError:

p = DataFrame.groupby(column).apply(len).sort_values(ascending=False)
q = DataFrame[column].value_counts()
pd.testing.assert_series_equal(p, q)

I thought these two calls did the same thing, and the resulting Series do look similar in the first few rows, but according to the assertion error they are only 59% similar.


1 Answer

The two results are almost identical; they differ only in the index name and the Series name. Set both to the default None before comparing:

import pandas as pd

DataFrame = pd.DataFrame({'a': [1, 5, 4, 2, 1, 2, 1, 2, 1, 4, 2, 3, 2, 1]})
column = 'a'
p = DataFrame.groupby(column).apply(len).sort_values(ascending=False)
q = DataFrame[column].value_counts()

print(p.name)
None
print(q.name)
a

print(p.index.name)
a
print(q.index.name)
None

pd.testing.assert_series_equal(p.rename_axis(None), q.rename(None))
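
The names shown above depend on the pandas version: from pandas 2.0 on, value_counts() names its result count and names the index after the column, so the renames would need adjusting. Groups with tied counts may also come out in a different order from the two calls. Here is a minimal sketch of a comparison that sidesteps both issues, assuming it is acceptable to ignore names and to compare by index rather than by frequency order. The name df is a stand-in for the frame, and groupby(...).size() is swapped in for apply(len) because it returns the same per-group row counts:

```python
import pandas as pd

# Same sample data as in the answer; `df` is an illustrative name.
df = pd.DataFrame({'a': [1, 5, 4, 2, 1, 2, 1, 2, 1, 4, 2, 3, 2, 1]})
column = 'a'

# Row count per group (index sorted by group key).
p = df.groupby(column).size()
# Row count per distinct value (sorted by frequency).
q = df[column].value_counts()

# Sort both by index so the order of tied counts does not matter.
p_sorted = p.sort_index()
q_sorted = q.sort_index()

# check_names=False skips the Series name and index name comparison,
# so the check does not depend on how each call names its result.
pd.testing.assert_series_equal(p_sorted, q_sorted, check_names=False)
```

If the assertion still fails after this, the counts themselves differ. One likely cause is missing values: by default both groupby and value_counts drop NaN, but they have separate dropna arguments.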
jezrael answered May 15 '26 01:05