Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to do data analysis (like counts, ucounts, frequency) with pandas?

I have DataFrame like below:

df = pd.DataFrame([
    ("i", 1, 'GlIrbixGsmCL'),
    ("i", 1, 'GlIrbixGsmCL'),
    ("i", 1, '3IMR1UteQA'),
    ("c", 1, 'GlIrbixGsmCL'),
    ("i", 2, 'GlIrbixGsmCL'),
], columns=['type', 'cid', 'userid'])

Expected output like: expect output

For more details:

i_counts, c_counts      => df.groupby(["cid","type"]).size()
i_ucounts, c_ucounts    => df.groupby(["cid","type"])["userid"].nunique()
i_frequency,u_frequency => df.groupby(["cid","type"])["userid"].value_counts()

Looks it's a little complex for me, how to do with pandas to get the expected result?

The related screenshots: screenshots

like image 330
Silence He Avatar asked May 30 '21 04:05

Silence He


People also ask

How do you get the frequency count in pandas?

In pandas you can get the count of the frequency of a value that occurs in a DataFrame column by using Series. value_counts() method, alternatively, If you have a SQL background you can also get using groupby() and count() method.

Can pandas be used for data analysis?

Pandas is an open-source Python library designed to deal with data analysis and data manipulation. Citing the official website, “pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.”

Which pandas method can be used to display frequency count of each data type?

After grouping a DataFrame object on one column, we can apply count() method on the resulting groupby object to get a DataFrame object containing frequency count. This method can be used to count frequencies of objects over single or multiple columns.

How do I count data points in pandas?

Pandas DataFrame count() Method The count() method counts the number of not empty values for each row, or column if you specify the axis parameter as axis='columns' , and returns a Series object with the result for each row (or column).


Video Answer


1 Answers

This is how I would approach this:

aggfuncs= {
    'counts': ('userid', 'count'), 
    'ucounts': ('userid', 'nunique'),
    'frequency': ('userid', lambda S: S.value_counts().to_dict()),
}

output = df.groupby(['cid', 'type']).agg(**aggfuncs).unstack()
output.columns = output.columns.map(lambda tup: '_'.join(tup[::-1]))

output:

     c_counts  i_counts  c_ucounts  i_ucounts          c_frequency                           i_frequency
cid
1         1.0       3.0        1.0        2.0  {'GlIrbixGsmCL': 1}  {'GlIrbixGsmCL': 2, '3IMR1UteQA': 1}
2         NaN       1.0        NaN        1.0                  NaN                   {'GlIrbixGsmCL': 1}

I think that is the core of what you want. You will need some cosmetic amendments to get the output exactly as in your example (e.g. fillna etc.).

like image 66
jabellcu Avatar answered Oct 12 '22 22:10

jabellcu