My dataframe has a string column that can contain long strings. I want to get a list of the unique strings, and also, for each unique string, a list of the row indices where it appears.
I can think of two ways of doing this:

1. Use .unique() to get the unique values, then iterate over the dataframe to build up lists of indices where each unique value shows up.
2. Use .groupby() to create groups and get the lists of row indices in each group.

But I am not quite sure which one is more efficient (or if there are other ways to do this more efficiently). The reason I am thinking about efficiency is that the field I want to uniquify and group by is a string field that may contain long strings!
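For concreteness, approach 1 would look roughly like this (the column name 'col' and the sample data are placeholders):

```python
from collections import defaultdict

import pandas as pd

df = pd.DataFrame({'col': ['aaaa', 'bbbb', 'aaaa', 'aaaa', 'bbbb', 'cccc']})

# Map each unique string to the list of row indices where it appears,
# in a single pass over the column.
indices = defaultdict(list)
for idx, value in df['col'].items():
    indices[value].append(idx)

unique_values = list(indices)  # unique strings, in order of first appearance
```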
Thanks!
You can get the unique values from a single column with Series.unique(), and count them with Series.nunique(). To get the unique values across multiple columns, concatenate those columns into a single Series and call unique() on that.
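As a quick sketch of the multi-column case (the column names 'a' and 'b' and the data are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({'a': ['x', 'y', 'x'], 'b': ['y', 'z', 'z']})

# Merge both columns into one Series, then take the unique values.
merged = pd.concat([df['a'], df['b']])
unique_vals = merged.unique()   # unique values across both columns
n_unique = merged.nunique()     # how many unique values there are
```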
Demo:
In [16]: df
Out[16]:
    col
0  aaaa
1  bbbb
2  aaaa
3  aaaa
4  bbbb
5  cccc
In [17]: df.groupby('col').groups
Out[17]:
{'aaaa': Int64Index([0, 2, 3], dtype='int64'),
'bbbb': Int64Index([1, 4], dtype='int64'),
'cccc': Int64Index([5], dtype='int64')}
or as a DataFrame:
In [31]: pd.DataFrame([[k,v.values]
for k,v in df.groupby('col').groups.items()],
columns=['col','indices'])
Out[31]:
    col    indices
0  aaaa  [0, 2, 3]
1  bbbb     [1, 4]
2  cccc        [5]
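If positional (integer) row locations are enough, the groupby object's .indices attribute gives essentially the same mapping as plain NumPy arrays instead of Index objects:

```python
import pandas as pd

df = pd.DataFrame({'col': ['aaaa', 'bbbb', 'aaaa', 'aaaa', 'bbbb', 'cccc']})

# dict mapping each unique value to a NumPy array of positional row indices
groups = df.groupby('col').indices

unique_values = list(groups)  # the unique strings
```

Regarding efficiency: groupby builds this mapping by hashing each string once in a single pass over the column, so it should generally be preferable to calling unique() and then re-scanning the dataframe for each value.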