The question was originally asked here as a comment but could not get a proper answer as the question was marked as a duplicate.
For a given pandas.DataFrame
, let us say
df = DataFrame({'A' : [5,6,3,4], 'B' : [1,2,3, 5]})
df
A B
0 5 1
1 6 2
2 3 3
3 4 5
How can we select rows from a list, based on values in a column ('A'
for instance)
For instance
# from
list_of_values = [3,4,6]
# we would like, as a result
# A B
# 2 3 3
# 3 4 5
# 1 6 2
Using isin
as mentioned here is not satisfactory as it does not keep order from the input list of 'A'
values.
How can the abovementioned goal be achieved?
isin() to Select Rows From List of Values. DataFrame. isin() method is used to filter/select rows from a list of values. You can have the list of values in variable and use it on isin() or use it directly.
You can select rows from a list index using index. isin() Method which is used to check each element in the DataFrame is contained in values or not.
Use pandas. DataFrame. isin() to filter a DataFrame using a list.
You can select a single row from pandas DataFrame by integer index using df. iloc[n] . Replace n with a position you wanted to select.
One way to overcome this is to make the 'A'
column an index
and use loc
on the newly generated pandas.DataFrame
. Eventually, the subsampled dataframe's index can be reset.
Here is how:
ret = df.set_index('A').loc[list_of_values].reset_index(inplace=False)
# ret is
# A B
# 0 3 3
# 1 4 5
# 2 6 2
Note that the drawback of this method is that the original indexing has been lost in the process.
More on pandas
indexing: What is the point of indexing in pandas?
Use merge
with helper DataFrame
created by list and with column name of matched column:
df = pd.DataFrame({'A' : [5,6,3,4], 'B' : [1,2,3,5]})
list_of_values = [3,6,4]
df1 = pd.DataFrame({'A':list_of_values}).merge(df)
print (df1)
A B
0 3 3
1 6 2
2 4 5
For more general solution:
df = pd.DataFrame({'A' : [5,6,5,3,4,4,6,5], 'B':range(8)})
print (df)
A B
0 5 0
1 6 1
2 5 2
3 3 3
4 4 4
5 4 5
6 6 6
7 5 7
list_of_values = [6,4,3,7,7,4]
#create df from list
list_df = pd.DataFrame({'A':list_of_values})
print (list_df)
A
0 6
1 4
2 3
3 7
4 7
5 4
#column for original index values
df1 = df.reset_index()
#helper column for count duplicates values
df1['g'] = df1.groupby('A').cumcount()
list_df['g'] = list_df.groupby('A').cumcount()
#merge together, create index from column and remove g column
df = list_df.merge(df1).set_index('index').rename_axis(None).drop('g', axis=1)
print (df)
A B
1 6 1
4 4 4
3 3 3
5 4 5
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With