I want to count distinct values per column (with pd.value_counts, I guess), grouping the data by one level of a MultiIndex. The MultiIndex is taken care of with the level= parameter of groupby, but apply raises a ValueError.
Original dataframe:
>>> df = pd.DataFrame(np.random.choice(list('ABC'), size=(10,5)),
...                   columns=['c1','c2','c3','c4','c5'],
...                   index=pd.MultiIndex.from_product([['foo', 'bar'],
...                                                     ['w','y','x','y','z']]))
>>> df
      c1 c2 c3 c4 c5
foo w  C  C  B  A  A
    y  A  A  C  B  A
    x  A  B  C  C  C
    y  A  B  C  C  C
    z  A  C  B  C  B
bar w  B  C  C  A  C
    y  A  A  C  A  A
    x  A  B  B  B  A
    y  A  A  C  A  B
    z  A  B  B  C  B
What I want:
       c1  c2  c3  c4  c5
foo A   4   2   0   3   2
    B   1   2   2   1   2
    C   0   1   3   1   1
bar A   4   1   0   1   2
    B   0   2   2   1   1
    C   1   2   3   3   2
I tried:
>>> df.groupby(level=0).apply(pd.value_counts)
ValueError: could not broadcast input array from shape (5,5) into shape (5)
I can do it manually, but I think there must be a more obvious way.
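# sort=False keeps the groups in order of appearance, matching the keys built below
groups = [g.apply(pd.Series.value_counts).fillna(0) for n, g in df.groupby(level=0, sort=False)]
index = df.index.get_level_values(0).unique()
correct_result = pd.concat(groups, keys=index) # THIS WORKS AS EXPECTED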
I mean, this isn't that long to write, but I feel like I'm reinventing the wheel. Isn't this kind of operation exactly what groupby is for?
Is there a more straightforward way of doing this, other than doing the split-apply-combine myself?
Use stack to get a MultiIndex Series, then SeriesGroupBy.value_counts, and finally unstack to get back a DataFrame:
import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame(np.random.choice(list('ABC'), size=(10,5)),
                  columns=['c1','c2','c3','c4','c5'],
                  index=pd.MultiIndex.from_product([['foo', 'bar'],
                                                    ['w','y','x','y','z']]))
print (df)
      c1 c2 c3 c4 c5
foo w  C  B  C  C  A
    y  C  C  B  C  B
    x  C  B  A  B  C
    y  B  A  C  A  B
    z  C  B  A  A  A
bar w  A  B  C  A  C
    y  A  A  B  A  B
    x  A  A  A  C  B
    y  B  C  C  C  B
    z  A  A  C  B  A
df1 = df.stack().groupby(level=[0,2]).value_counts().unstack(1, fill_value=0)
print (df1)
       c1  c2  c3  c4  c5
bar A   4   3   1   2   1
    B   1   1   1   1   3
    C   0   1   3   2   1
foo A   0   1   2   2   2
    B   1   3   1   1   2
    C   4   1   2   2   1
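To make the chained call easier to follow, here is a minimal sketch that splits it into its three steps. It uses a small hand-written frame rather than the random one above, so the data and the names idx, s, counts and result are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Hypothetical data: two outer groups, three rows each, two columns.
idx = pd.MultiIndex.from_product([['foo', 'bar'], ['w', 'x', 'y']])
df = pd.DataFrame({'c1': list('AABABB'), 'c2': list('CCCABC')}, index=idx)

# Step 1: stack moves the column labels into a third index level,
# giving a Series indexed by (group, row label, column name).
s = df.stack()

# Step 2: group by the outer level (0) and the column name (2),
# then count each distinct value inside every (group, column) pair.
counts = s.groupby(level=[0, 2]).value_counts()

# Step 3: move the column-name level (position 1) back into the columns.
# Combinations that never occur are filled with 0 instead of NaN.
result = counts.unstack(1, fill_value=0)
print(result)
```

The result is indexed by (group, value) with one column per original column. Note that a value only gets a row within a group if it occurs in at least one column of that group.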