How to do data analysis (like counts, ucounts, frequency) with pandas?

Tags: python, pandas, pandas-groupby, data-analysis

I have a DataFrame like the one below:

import pandas as pd

df = pd.DataFrame([
    ("i", 1, 'GlIrbixGsmCL'),
    ("i", 1, 'GlIrbixGsmCL'),
    ("i", 1, '3IMR1UteQA'),
    ("c", 1, 'GlIrbixGsmCL'),
    ("i", 2, 'GlIrbixGsmCL'),
], columns=['type', 'cid', 'userid'])

Expected output: [image: expected output]

For more details:

i_counts, c_counts      => df.groupby(["cid","type"]).size()
i_ucounts, c_ucounts    => df.groupby(["cid","type"])["userid"].nunique()
i_frequency, c_frequency => df.groupby(["cid","type"])["userid"].value_counts()

This looks a little complex to me. How can I get the expected result with pandas?

The related screenshots: [image: screenshots]

asked May 30 '21 04:05

Silence He

1 Answer

This is how I would approach this:

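# Named aggregations: output column -> (source column, aggregation)
aggfuncs = {
    'counts': ('userid', 'count'),
    'ucounts': ('userid', 'nunique'),
    'frequency': ('userid', lambda s: s.value_counts().to_dict()),
}

# Aggregate per (cid, type), then move 'type' from the index into the columns
output = df.groupby(['cid', 'type']).agg(**aggfuncs).unstack()
# Flatten each column tuple, e.g. ('counts', 'i') becomes 'i_counts'
output.columns = output.columns.map(lambda tup: '_'.join(tup[::-1]))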

Output:

     c_counts  i_counts  c_ucounts  i_ucounts          c_frequency                           i_frequency
cid
1         1.0       3.0        1.0        2.0  {'GlIrbixGsmCL': 1}  {'GlIrbixGsmCL': 2, '3IMR1UteQA': 1}
2         NaN       1.0        NaN        1.0                  NaN                   {'GlIrbixGsmCL': 1}

I think that is the core of what you want. You will need some cosmetic adjustments (e.g. fillna) to match the output in your example exactly.
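One possible version of that cosmetic cleanup is sketched below. The expected-output image is not available here, so the target format is an assumption: missing (cid, type) combinations are shown as 0 for the count columns and as an empty dict for the frequency columns, and cid is turned back into a regular column. The names count_cols and freq_cols are only illustrative.

```python
import pandas as pd

df = pd.DataFrame([
    ("i", 1, 'GlIrbixGsmCL'),
    ("i", 1, 'GlIrbixGsmCL'),
    ("i", 1, '3IMR1UteQA'),
    ("c", 1, 'GlIrbixGsmCL'),
    ("i", 2, 'GlIrbixGsmCL'),
], columns=['type', 'cid', 'userid'])

# Same aggregation as in the answer above
output = (
    df.groupby(['cid', 'type'])
      .agg(
          counts=('userid', 'count'),
          ucounts=('userid', 'nunique'),
          frequency=('userid', lambda s: s.value_counts().to_dict()),
      )
      .unstack()
)
output.columns = ['_'.join(tup[::-1]) for tup in output.columns]

# Assumed cleanup: the target format below is a guess, not taken from the question
count_cols = [c for c in output.columns if c.endswith('counts')]
freq_cols = [c for c in output.columns if c.endswith('_frequency')]

# Missing combinations become 0, and the float columns go back to integers
output = (
    output.fillna({c: 0 for c in count_cols})
          .astype({c: int for c in count_cols})
)

# Missing frequency entries (NaN after unstack) become empty dicts
for col in freq_cols:
    output[col] = [v if isinstance(v, dict) else {} for v in output[col]]

# Turn the 'cid' index into an ordinary column
output = output.reset_index()
print(output)
```

If NaN is a better marker for "no rows of this type" than 0 or an empty dict, the fillna and frequency steps can simply be dropped.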

answered Oct 12 '22 22:10

jabellcu
