Keeping rows in Pandas where the same ID appears more than n times and convert to list per ID

Tags:

pandas

I have a DataFrame like below:

     VID  value
1     1    xx
2     2    xx1
3     2    xx2
4     2    xx3
5     2    xx4
6     3    xx
7     3    xx
8     3    xx
9     4    zz1
10    4    zz2
11    4    zz3
12    4    zz4
13    4    zz5

I want to keep only the rows whose VID appears more than n=3 times. In the above example, I'd keep only rows 2-5 and 9-13 (since only VIDs 2 and 4 appear more than 3 times). The 'value' field is irrelevant for this filtering.
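To make the criterion concrete, here is a minimal sketch that rebuilds the sample frame (assuming the index runs 1-13 as displayed and that `n` is a plain integer) and uses `Series.value_counts` to see which VIDs qualify:

```python
import pandas as pd

# Rebuild the sample frame from the question (index 1..13 as displayed)
df = pd.DataFrame(
    {
        "VID": [1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4],
        "value": ["xx", "xx1", "xx2", "xx3", "xx4", "xx", "xx", "xx",
                  "zz1", "zz2", "zz3", "zz4", "zz5"],
    },
    index=range(1, 14),
)

n = 3
# Number of rows per VID
counts = df["VID"].value_counts()
# VIDs that appear strictly more than n times, as plain Python ints
qualifying = sorted(counts[counts > n].index.tolist())
print(qualifying)  # [2, 4]
```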

After keeping the needed rows, I'd like to convert my data into a list of lists, appending an 'end' value to the end of each one:


[['xx1', 'xx2', 'xx3', 'xx4', 'end'], ['zz1', 'zz2', 'zz3', 'zz4', 'zz5', 'end']]

Is it possible to get the above without a for loop?


asked Oct 20 '15 15:10

Stergios

1 Answer

You can group by the VID column and take the count of each group, then use those counts to index your original df so that only rows whose VID has a count greater than 3 are kept. Example -


countdf = df.groupby('VID').count()
result = df.loc[df['VID'].isin(countdf[countdf['value'] > 3].index)]
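A variant that skips the intermediate count frame (a sketch, not the method used above) uses `groupby(...).transform('size')`, which broadcasts each group's row count back onto every row so it can be used directly as a boolean mask. Unlike `count()`, which counts non-null entries in `value`, `size` counts all rows, so it is unaffected by NaN values:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "VID": [1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4],
        "value": ["xx", "xx1", "xx2", "xx3", "xx4", "xx", "xx", "xx",
                  "zz1", "zz2", "zz3", "zz4", "zz5"],
    },
    index=range(1, 14),
)

n = 3
# Row count of each row's VID group, aligned with df's index
group_sizes = df.groupby("VID")["VID"].transform("size")
# Keep only rows whose VID occurs more than n times
result = df[group_sizes > n]
print(result.index.tolist())  # [2, 3, 4, 5, 9, 10, 11, 12, 13]
```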

Demo -


In [49]: df
Out[49]:
    VID value
1     1    xx
2     2   xx1
3     2   xx2
4     2   xx3
5     2   xx4
6     3    xx
7     3    xx
8     3    xx
9     4   zz1
10    4   zz2
11    4   zz3
12    4   zz4
13    4   zz5

In [51]: df.groupby('VID').count()
Out[51]:
     value
VID
1        1
2        4
3        3
4        5

In [52]: countdf = df.groupby('VID').count()

In [53]: df.loc[df['VID'].isin(countdf[countdf['value'] > 3].index)]
Out[53]:
    VID value
2     2   xx1
3     2   xx2
4     2   xx3
5     2   xx4
9     4   zz1
10    4   zz2
11    4   zz3
12    4   zz4
13    4   zz5

After that, you can group the filtered result by VID again, collect each group's values into a list, and convert the resulting Series into a plain list of lists. Example -


resultlist = result.groupby('VID')['value'].apply(list).tolist()
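Note that `groupby` sorts by the group keys by default, so the inner lists come out in ascending VID order. If you would rather keep the order in which VIDs first appear, a sketch passing `sort=False` is below; the frame here is a hypothetical, already-filtered one where VID 4 comes before VID 2:

```python
import pandas as pd

# Hypothetical, already-filtered frame where VID 4 appears before VID 2
result = pd.DataFrame(
    {
        "VID": [4, 4, 4, 4, 2, 2, 2, 2],
        "value": ["zz1", "zz2", "zz3", "zz4", "xx1", "xx2", "xx3", "xx4"],
    }
)

# Default: groups ordered by ascending VID
by_key = result.groupby("VID")["value"].apply(list).tolist()
# sort=False: groups ordered by first appearance in the frame
by_appearance = result.groupby("VID", sort=False)["value"].apply(list).tolist()

print(by_key)         # [['xx1', 'xx2', 'xx3', 'xx4'], ['zz1', 'zz2', 'zz3', 'zz4']]
print(by_appearance)  # [['zz1', 'zz2', 'zz3', 'zz4'], ['xx1', 'xx2', 'xx3', 'xx4']]
```

In both cases the values inside each list keep their original row order.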

Demo -


In [54]: result = df.loc[df['VID'].isin(countdf[countdf['value'] > 3].index)]

In [55]: result.groupby('VID')['value'].apply(list).tolist()
Out[55]: [['xx1', 'xx2', 'xx3', 'xx4'], ['zz1', 'zz2', 'zz3', 'zz4', 'zz5']]

Please note that the above does not add the 'end' value to each list. I am guessing that is not necessary, but if you really want it, you can append it after getting the list. Example -


resultlist = [elem + ['end'] for elem in resultlist]
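Putting both steps together, here is a minimal sketch of the whole pipeline as one function. The name `lists_per_vid` and its `sentinel` parameter are hypothetical; it swaps in `GroupBy.filter` for the count-based indexing and appends the sentinel inside the per-group aggregation instead of in a separate list comprehension. The lambdas still run once per group in Python, so this avoids an explicit `for` loop in user code rather than looping altogether:

```python
import pandas as pd


def lists_per_vid(df, n, sentinel="end"):
    """Hypothetical helper: keep VIDs with more than n rows, then return
    one list of 'value' entries per VID with `sentinel` appended."""
    # filter() keeps every row of the groups whose predicate returns True
    kept = df.groupby("VID").filter(lambda g: len(g) > n)
    # Build each group's list and append the sentinel in the same step
    grouped = kept.groupby("VID")["value"].apply(lambda s: s.tolist() + [sentinel])
    return grouped.tolist()


df = pd.DataFrame(
    {
        "VID": [1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4],
        "value": ["xx", "xx1", "xx2", "xx3", "xx4", "xx", "xx", "xx",
                  "zz1", "zz2", "zz3", "zz4", "zz5"],
    },
    index=range(1, 14),
)

print(lists_per_vid(df, 3))
# [['xx1', 'xx2', 'xx3', 'xx4', 'end'], ['zz1', 'zz2', 'zz3', 'zz4', 'zz5', 'end']]
```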


answered Oct 10 '22 21:10

Anand S Kumar
