I can print the frequency of every column as a nice dataframe with a total column:
for column in df:
    print(df.groupby(column).size().reset_index(name="total"))
   Count  total
0      1    423
1      2    488
2      3    454
3      4    408
4      5    343

  Precipitation  total
0          Fine   7490
1           Fog     23
2         Other     51
3       Raining    808

   Month  total
0      1    717
1      2    648
2      3    710
3      4    701
I put the loop in a function, but this returns the first column "Count" only.
def count_all_columns_freq(dataframe_x):
    for column in dataframe_x:
        return dataframe_x.groupby(column).size().reset_index(name="total")
count_all_columns_freq(df)
   Count  total
0      1    423
1      2    488
2      3    454
3      4    408
4      5    343
Is there a way to do this for every column, using slicing or some other method, e.g. for column in dataframe_x[1:]:?
Based on your comment, you just want to return a list of dataframes:
def count_all_columns_freq(df):
    return [df.groupby(column).size().reset_index(name="total")
            for column in df]
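For instance, you could use the returned list like this (the variable names are just illustrative; the list follows the column order of the dataframe):
results = count_all_columns_freq(df)
results[0]            # frequency table of the first column, e.g. "Count"
for table in results:
    print(table)      # print every frequency table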
You can select columns in many ways in pandas, e.g. by slicing or by passing a list of columns like in df[['colA', 'colB']]. You don't need to change the function for that.
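For example, using the column names from your output, passing a subset could look like this (note that df[1:] slices rows, not columns):
count_all_columns_freq(df[["Precipitation", "Month"]])  # by column names
count_all_columns_freq(df.iloc[:, 1:])                  # by position; skips the first column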
Personally, I would return a dictionary instead:
def frequency_dict(df):
    return {column: df.groupby(column).size()
            for column in df}

# so that I could use it like this:
freq = frequency_dict(df)
freq['someColumn'].loc[value]
EDIT: "What if I want to count the number of NaN?"
In that case, you can pass dropna=False to groupby (this works for pandas >= 1.1.0):
def count_all_columns_freq(df):
    return [df.groupby(column, dropna=False).size().reset_index(name="total")
            for column in df]
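A minimal check with made-up data shows NaN appearing as its own group:
import pandas as pd

df_nan = pd.DataFrame({"Precipitation": ["Fine", None, "Raining", None]})
print(df_nan.groupby("Precipitation", dropna=False).size().reset_index(name="total"))
#   Precipitation  total
# 0          Fine      1
# 1       Raining      1
# 2           NaN      2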
You can create a dataframe from the grouped sizes with concat and a bit of renaming.
First, get the columns you want, for example:
cols = df.columns
Then use concat to patch them together, passing keys as the columns (they become the new outer index) and names as "indices" and "groups", which are the displayed names of the index levels.
res = pd.concat((df.groupby(col, dropna=False).size() for col in cols),
                keys=cols, names=["indices", "groups"])
Now we want this as a dataframe, not a series:
res = pd.DataFrame(res)
Finally, we rename the totals column:
res = res.rename(columns={0: "totals"})
Example:
import pandas as pd
import numpy as np

rng = np.random.default_rng()  # random number generation
A = rng.choice(["a", "b", "c"], 50)
B = rng.choice(["e", "f", "d"], 50)
C = rng.choice(['1', '2', '3', '5', '11'], 50)
df = pd.DataFrame({"A": A, "B": B, "C": C})

cols = df.columns
res = pd.DataFrame(pd.concat((df.groupby(c, dropna=False).size() for c in cols),
                             keys=cols, names=["indices", "groups"]))
res = res.rename(columns={0: "totals"})
Output:
                totals
indices groups
A       a           16
        b           17
        c           17
B       d            9
        e           22
        f           19
C       1           10
        11          16
        2            8
        3           10
        5            6
Wrapping this up in a function can be done as follows:
def concat_groups(df, cols=None):
    if cols is None:
        cols = df.columns
    res = pd.DataFrame(pd.concat((df.groupby(c, dropna=False).size() for c in cols),
                                 keys=cols, names=["indices", "groups"]))
    res = res.rename(columns={0: "totals"})
    return res
So you can either pass a dataframe together with a list of selected columns, or pass a dataframe containing only the relevant columns.
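For instance, with the example dataframe from above:
concat_groups(df, cols=["A", "B"])  # totals for columns A and B only
concat_groups(df)                   # totals for all columns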
Cheers