
A function to return the frequency counts of all or specific columns

I can get the frequency counts of every column, each as a neat dataframe with a "total" column:

for column in df:
    print(df.groupby(column).size().reset_index(name="total"))

Count   total
0   1   423
1   2   488
2   3   454
3   4   408
4   5   343

Precipitation   total
0   Fine        7490
1   Fog         23
2   Other       51
3   Raining     808

Month   total
0   1   717
1   2   648
2   3   710
3   4   701

I put the loop in a function, but this returns the first column "Count" only.

def count_all_columns_freq(dataframe_x):
    for column in dataframe_x:
        return dataframe_x.groupby(column).size().reset_index(name="total")

count_all_columns_freq(df)

Count   total
0   1   423
1   2   488
2   3   454
3   4   408
4   5   343

Is there a way to make the function return the counts for all columns, or only for specific ones, using slicing or some other method, e.g. for column in dataframe_x[1:]: ?

asked Dec 18 '20 13:12 by Edison



2 Answers

Based on your comment, you just want to return a list of dataframes, one per column:

def count_all_columns_freq(df):
    return [df.groupby(column).size().reset_index(name="total")
            for column in df]
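
For reference, the function in the question stopped at the first column because return ends the function during the first pass of the loop. A minimal sketch of the difference, using a small made-up dataframe (the names first_only and all_columns are illustrative, not from the question):

```python
import pandas as pd

# Hypothetical sample data, just for illustration
df = pd.DataFrame({"Count": [1, 1, 2], "Month": [3, 4, 4]})

def first_only(dataframe_x):
    for column in dataframe_x:
        # return exits the function on the first iteration,
        # so only the first column is ever processed
        return dataframe_x.groupby(column).size().reset_index(name="total")

def all_columns(dataframe_x):
    tables = []
    for column in dataframe_x:
        # collect one frequency table per column, return after the loop
        tables.append(dataframe_x.groupby(column).size().reset_index(name="total"))
    return tables

single = first_only(df)    # one dataframe, for "Count" only
tables = all_columns(df)   # a list with one dataframe per column
```

The list comprehension above is just a more compact way of writing all_columns.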

You can select columns in many ways in pandas, e.g. by position with df.iloc[:, 1:] or by passing a list of columns as in df[['colA', 'colB']]. Select the columns before calling the function; the function itself does not need to change.
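
As a rough sketch of what that looks like in practice (the sample data below is made up): note that plain df[1:] slices rows, not columns, so positional column selection goes through iloc.

```python
import pandas as pd

def count_all_columns_freq(df):
    return [df.groupby(column).size().reset_index(name="total")
            for column in df]

# Hypothetical sample data with the column names from the question
df = pd.DataFrame({
    "Count": [1, 2, 2, 3],
    "Precipitation": ["Fine", "Fog", "Fine", "Fine"],
    "Month": [1, 1, 2, 2],
})

all_tables = count_all_columns_freq(df)                              # every column
by_name = count_all_columns_freq(df[["Precipitation", "Month"]])     # chosen by name
by_position = count_all_columns_freq(df.iloc[:, 1:])                 # all but the first column
```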

Personally, I would return a dictionary instead:

def frequency_dict(df):
    return {column: df.groupby(column).size()
            for column in df}

# so that I could use it like this:
freq = frequency_dict(df)
freq['someColumn'].loc[value]
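
A small worked example of that lookup, with made-up data. Using Series.get for a value that may be absent is my own addition here, not part of the suggestion above:

```python
import pandas as pd

def frequency_dict(df):
    return {column: df.groupby(column).size()
            for column in df}

# Hypothetical sample data
df = pd.DataFrame({
    "Precipitation": ["Fine", "Fog", "Fine", "Raining"],
    "Month": [1, 1, 2, 2],
})

freq = frequency_dict(df)
fine_count = freq["Precipitation"].loc["Fine"]    # label lookup on the group index
month_2_count = freq["Month"].loc[2]              # 2 is a label here, not a position
snow_count = freq["Precipitation"].get("Snow", 0) # default for a value that never occurs
```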

EDIT: "What if I want to count the number of NaN?"

In that case, you can pass dropna=False to groupby (this works for pandas >= 1.1.0):

def count_all_columns_freq(df):
    return [df.groupby(column, dropna=False).size().reset_index(name="total")
            for column in df]
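
To see the effect of dropna=False, here is a minimal comparison on a made-up column containing two NaN values (requires pandas >= 1.1.0, as noted above):

```python
import numpy as np
import pandas as pd

# Hypothetical column with two missing values
df = pd.DataFrame({"Precipitation": ["Fine", np.nan, "Fine", "Fog", np.nan]})

# Default behaviour: rows whose key is NaN are left out of the groups
default = df.groupby("Precipitation").size()

# dropna=False keeps NaN as its own group
with_nan = df.groupby("Precipitation", dropna=False).size()

# Pick out the NaN group via a boolean mask on the index
nan_count = int(with_nan[with_nan.index.isna()].iloc[0])
```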

answered Oct 27 '22 23:10 by janluke


You can build a single dataframe from the per-column group sizes with concat and a bit of renaming.

First get the columns you want, for example:

cols = df.columns 

Then use concat to stack the per-column sizes. Pass the columns as keys (they become the outer index level) and use names to label the two index levels, here "indices" and "groups"; those are the names shown in the output.

res = pd.concat((df.groupby(col, dropna=False).size() for col in cols), keys=cols, names=["indices", "groups"])

concat returns a Series here, and we want a dataframe:

res = pd.DataFrame(res)

Finally, rename the resulting column (which is labelled 0 by default) to "totals":

res = res.rename(columns={0: "totals"})

Example:

import pandas as pd
import numpy as np
rng = np.random.default_rng() # random number generation

A = rng.choice(["a", "b", "c"], 50)
B = rng.choice(["e", "f", "d"], 50)
C = rng.choice(['1', '2', '3', '5', '11'], 50)

df = pd.DataFrame({"A":A, "B":B, "C":C})

cols = df.columns
res = pd.DataFrame(pd.concat((df.groupby(c, dropna=False).size() for c in cols),  
                             keys=cols, names=["indices", "groups"]))

res = res.rename(columns = {0 : "totals"})

Output (the counts vary between runs because the generator is not seeded):

                totals
indices groups        
A       a           16
        b           17
        c           17
B       d            9
        e           22
        f           19
C       1           10
        11          16
        2            8
        3           10
        5            6

The steps above can be wrapped in a function:

def concat_groups(df, cols=None):
    if cols is None:
        cols = df.columns

    res = pd.DataFrame(pd.concat((df.groupby(c, dropna=False).size() for c in cols),  
                                keys=cols, names=["indices","groups"]))
    
    res = res.rename(columns = {0 : "totals"})

    return res

So you can either pass the full dataframe together with a list of the columns you want, or pass a dataframe that already contains only the relevant columns.
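
A short sketch of both call styles on a small made-up dataframe. The .loc lookup at the end is my own illustration of how the MultiIndex result might be queried:

```python
import pandas as pd

def concat_groups(df, cols=None):
    if cols is None:
        cols = df.columns

    res = pd.DataFrame(pd.concat((df.groupby(c, dropna=False).size() for c in cols),
                                 keys=cols, names=["indices", "groups"]))
    return res.rename(columns={0: "totals"})

# Hypothetical sample data
df = pd.DataFrame({
    "A": ["a", "b", "a", "c"],
    "B": ["e", "e", "f", "e"],
    "C": ["1", "2", "1", "1"],
})

subset = concat_groups(df, cols=["A", "C"])    # full dataframe plus a column list
same = concat_groups(df[["A", "C"]])           # pre-selected columns, no list

# The outer index level is the column name, so one column's counts
# can be pulled out with .loc
a_counts = subset.loc["A"]
```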

Cheers

answered Oct 27 '22 23:10 by Nathan Furnal