Select rows of pandas dataframe from list, in order of list

Tags: python, pandas, dataframe

The question was originally asked here as a comment but could not get a proper answer as the question was marked as a duplicate.

For a given pandas.DataFrame, let us say

import pandas as pd

df = pd.DataFrame({'A' : [5,6,3,4], 'B' : [1,2,3,5]})
df

     A   B
0    5   1
1    6   2
2    3   3
3    4   5

How can we select the rows whose value in a given column (say 'A') appears in a list, returned in the order given by that list?

For example:

# from
list_of_values = [3,4,6]

# we would like, as a result
#      A   B
# 2    3   3
# 3    4   5
# 1    6   2

Using isin, as mentioned here, is not satisfactory because it returns the matching rows in the DataFrame's own order rather than in the order of the input list of 'A' values.
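To make the ordering problem concrete, here is a minimal sketch using the same df and list_of_values as above. It only illustrates the behavior being described and is not part of any proposed solution.

```python
import pandas as pd

df = pd.DataFrame({'A': [5, 6, 3, 4], 'B': [1, 2, 3, 5]})
list_of_values = [3, 4, 6]

# isin builds a boolean mask over the rows, so the selected rows
# keep the DataFrame's order, not the order of list_of_values
filtered = df[df['A'].isin(list_of_values)]
print(filtered['A'].tolist())  # [6, 3, 4], not [3, 4, 6]
```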

How can the abovementioned goal be achieved?

asked Aug 21 '18 07:08

syltruong

2 Answers

One way to overcome this is to make the 'A' column the index and use loc on the resulting pandas.DataFrame. Finally, the index of the selected rows can be reset.

Here is how:

ret = df.set_index('A').loc[list_of_values].reset_index(inplace=False)

# ret is
#      A   B
# 0    3   3
# 1    4   5
# 2    6   2

Note that a drawback of this method is that the original index is lost in the process.
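If the original index matters, one possible variation is to move it into a column before re-indexing on 'A' and restore it afterwards. This is only a sketch: it assumes the DataFrame has a default, unnamed index (so reset_index names the new column 'index') and that every value in the list occurs in 'A', since recent pandas versions raise a KeyError when loc is given a missing label.

```python
import pandas as pd

df = pd.DataFrame({'A': [5, 6, 3, 4], 'B': [1, 2, 3, 5]})
list_of_values = [3, 4, 6]

ret = (
    df.reset_index()        # original index becomes a column named 'index'
      .set_index('A')
      .loc[list_of_values]  # select rows in list order; KeyError if a value is absent
      .reset_index()        # 'A' becomes a regular column again
      .set_index('index')   # restore the original row labels
      .rename_axis(None)    # drop the leftover index name
)
print(ret)
#    A  B
# 2  3  3
# 3  4  5
# 1  6  2
```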

More on pandas indexing: What is the point of indexing in pandas?

answered Oct 11 '22 20:10

syltruong

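Use merge with a helper DataFrame built from the list, whose single column has the same name as the column to match:

import pandas as pd

df = pd.DataFrame({'A' : [5,6,3,4], 'B' : [1,2,3,5]})

list_of_values = [3,6,4]
# the default inner merge keeps the order of the left (helper) keys
df1 = pd.DataFrame({'A':list_of_values}).merge(df)
print(df1)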
   A  B
0  3  3
1  6  2
2  4  5
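One thing to keep in mind is that the default inner merge silently drops list values that do not occur in the column. The sketch below, which uses a made-up list containing the absent value 7, shows one way to spot such values with how='left' and indicator=True; it is an illustration rather than part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'A': [5, 6, 3, 4], 'B': [1, 2, 3, 5]})
list_of_values = [3, 7, 6]  # 7 does not occur in df['A'] (illustrative input)

helper = pd.DataFrame({'A': list_of_values})

# inner merge: the unmatched value 7 disappears from the result
inner = helper.merge(df, on='A')
print(inner['A'].tolist())      # [3, 6]

# left merge with an indicator column: unmatched values are kept and flagged
left = helper.merge(df, on='A', how='left', indicator=True)
print(left['_merge'].tolist())  # ['both', 'left_only', 'both']
```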

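For a more general solution that also handles duplicated values and values missing from the DataFrame, while keeping the original index: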

df = pd.DataFrame({'A' : [5,6,5,3,4,4,6,5], 'B':range(8)})
print (df)
   A  B
0  5  0
1  6  1
2  5  2
3  3  3
4  4  4
5  4  5
6  6  6
7  5  7

list_of_values = [6,4,3,7,7,4]

# create a helper DataFrame from the list
list_df = pd.DataFrame({'A':list_of_values})
print(list_df)
   A
0  6
1  4
2  3
3  7
4  7
5  4

# keep the original index values as a column named 'index'
df1 = df.reset_index()
# helper column that numbers repeated values within each group of 'A'
df1['g'] = df1.groupby('A').cumcount()
list_df['g'] = list_df.groupby('A').cumcount()

# merge on the shared columns 'A' and 'g', restore the original index and drop the helper column
df = list_df.merge(df1).set_index('index').rename_axis(None).drop('g', axis=1)
print(df)
   A  B
1  6  1
4  4  4
3  3  3
5  4  5
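The steps above could be wrapped in a small reusable function. The following is a sketch under stated assumptions: the name select_in_order and the helper column '_g' are hypothetical, and the DataFrame is assumed to have a default, unnamed index and no existing column called 'index' or '_g'.

```python
import pandas as pd


def select_in_order(df, column, values):
    """Hypothetical helper: rows of df whose `column` matches `values`, in list order."""
    helper = pd.DataFrame({column: list(values)})
    indexed = df.reset_index()  # assumes an unnamed index, stored as column 'index'
    # number repeated values so the n-th repeat in the list pairs with the n-th matching row
    helper['_g'] = helper.groupby(column).cumcount()
    indexed['_g'] = indexed.groupby(column).cumcount()
    merged = helper.merge(indexed, on=[column, '_g'], how='inner')
    # restore the original row labels and the original column order
    result = merged.set_index('index').rename_axis(None).drop(columns='_g')
    return result[df.columns.tolist()]


df = pd.DataFrame({'A': [5, 6, 5, 3, 4, 4, 6, 5], 'B': range(8)})
out = select_in_order(df, 'A', [6, 4, 3, 7, 7, 4])
print(out)
#    A  B
# 1  6  1
# 4  4  4
# 3  3  3
# 5  4  5
```

Values that never occur in the column (7 here) are dropped, and a value repeated in the list only returns as many rows as exist for it in the DataFrame.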

answered Oct 11 '22 21:10

jezrael
