Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Select rows of pandas dataframe from list, in order of list

The question was originally asked here as a comment but could not get a proper answer as the question was marked as a duplicate.

For a given pandas.DataFrame, let us say

df = DataFrame({'A' : [5,6,3,4], 'B' : [1,2,3, 5]})
df

     A   B
0    5   1
1    6   2
2    3   3
3    4   5

How can we select rows from a list, based on values in a column ('A' for instance)

For instance

# from
list_of_values = [3,4,6]

# we would like, as a result
#      A   B
# 2    3   3
# 3    4   5
# 1    6   2

Using isin as mentioned here is not satisfactory as it does not keep order from the input list of 'A' values.

How can the abovementioned goal be achieved?

like image 939
syltruong Avatar asked Aug 21 '18 07:08

syltruong


People also ask

How do I select rows of pandas DataFrame based on a list?

isin() to Select Rows From List of Values. DataFrame. isin() method is used to filter/select rows from a list of values. You can have the list of values in variable and use it on isin() or use it directly.

How do you select rows based on index list?

You can select rows from a list index using index. isin() Method which is used to check each element in the DataFrame is contained in values or not.

How do you filter a DataFrame based on a list?

Use pandas. DataFrame. isin() to filter a DataFrame using a list.

How do you select rows from a DataFrame based on an index?

You can select a single row from pandas DataFrame by integer index using df. iloc[n] . Replace n with a position you wanted to select.


2 Answers

One way to overcome this is to make the 'A' column an index and use loc on the newly generated pandas.DataFrame. Eventually, the subsampled dataframe's index can be reset.

Here is how:

ret = df.set_index('A').loc[list_of_values].reset_index(inplace=False)

# ret is
#      A   B
# 0    3   3
# 1    4   5
# 2    6   2 

Note that the drawback of this method is that the original indexing has been lost in the process.

More on pandas indexing: What is the point of indexing in pandas?

like image 75
syltruong Avatar answered Oct 11 '22 20:10

syltruong


Use merge with helper DataFrame created by list and with column name of matched column:

df = pd.DataFrame({'A' : [5,6,3,4], 'B' : [1,2,3,5]})

list_of_values = [3,6,4]
df1 = pd.DataFrame({'A':list_of_values}).merge(df)
print (df1)
   A  B
0  3  3
1  6  2
2  4  5

For more general solution:

df = pd.DataFrame({'A' : [5,6,5,3,4,4,6,5], 'B':range(8)})
print (df)
   A  B
0  5  0
1  6  1
2  5  2
3  3  3
4  4  4
5  4  5
6  6  6
7  5  7

list_of_values = [6,4,3,7,7,4]

#create df from list 
list_df = pd.DataFrame({'A':list_of_values})
print (list_df)
   A
0  6
1  4
2  3
3  7
4  7
5  4

#column for original index values
df1 = df.reset_index()
#helper column for count duplicates values
df1['g'] = df1.groupby('A').cumcount()
list_df['g'] = list_df.groupby('A').cumcount()

#merge together, create index from column and remove g column
df = list_df.merge(df1).set_index('index').rename_axis(None).drop('g', axis=1)
print (df)
   A  B
1  6  1
4  4  4
3  3  3
5  4  5
like image 21
jezrael Avatar answered Oct 11 '22 21:10

jezrael