I am trying to build a dataframe (or series) from another dataframe, picking a different column of the first frame for each row according to a second series. In the simplified example below, I want the frame1 values from 'a' for the first three rows and from 'b' for the final two (as given by the picked_values series).
import numpy as np
import pandas as pd
frame1=pd.DataFrame(np.random.randn(10).reshape(5,2),index=range(5),columns=['a','b'])
picked_values=pd.Series(['a','a','a','b','b'])
frame1
a b
0 0.283519 1.462209
1 -0.352342 1.254098
2 0.731701 0.236017
3 0.022217 -1.469342
4 0.386000 -0.706614
Trying to get to the series:
0 0.283519
1 -0.352342
2 0.731701
3 -1.469342
4 -0.706614
I was hoping frame1[picked_values] would work, but this ends up with five columns.
In the real-life example, picked_values is a lot larger and calculated.
Thank you for your time.
You can use the DataFrame.filter method to select columns whose names contain a given string. The like parameter of filter specifies that string: every column whose name contains it is selected, and the result is returned as a dataframe.
To select rows from a list of values, use isin(). The isin() method returns a boolean mask that marks the entries found in the list, and that mask can then be used to filter rows. You can store the list of values in a variable and pass it to isin(), or pass the list directly.
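As a rough illustration of the two methods just described, here is a minimal sketch on a made-up frame. The frame df, its column names, and the wanted list are hypothetical and not part of the question. Note that both methods apply one selection to the whole frame; neither picks a different column per row.

```python
import pandas as pd

# Hypothetical frame, used only to illustrate filter and isin.
df = pd.DataFrame({"a": [1, 2, 3], "ab": [4, 5, 6], "c": [7, 8, 9]})

# filter(like=...) keeps the columns whose name contains the substring "a".
cols_with_a = df.filter(like="a")

# isin returns a boolean mask; indexing with it keeps the matching rows.
wanted = [7, 9]
rows = df[df["c"].isin(wanted)]

print(cols_with_a)
print(rows)
```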
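Use df.lookup. Note that DataFrame.lookup was deprecated in pandas 1.2 and removed in pandas 2.0, so this only works on older versions: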
pd.Series(frame1.lookup(picked_values.index,picked_values))
0 0.283519
1 -0.352342
2 0.731701
3 -1.469342
4 -0.706614
dtype: float64
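On pandas versions without DataFrame.lookup, one possible substitute is to convert the labels to integer positions with Index.get_indexer and index the underlying NumPy array. This is a sketch, not the original answerer's method. The frame is filled with the fixed values printed in the question (instead of random numbers) so the result is reproducible, and row_pos, col_pos, and result are names made up for the illustration.

```python
import pandas as pd

# Fixed values copied from the question's printed frame.
frame1 = pd.DataFrame(
    {
        "a": [0.283519, -0.352342, 0.731701, 0.022217, 0.386000],
        "b": [1.462209, 1.254098, 0.236017, -1.469342, -0.706614],
    },
    index=range(5),
)
picked_values = pd.Series(["a", "a", "a", "b", "b"])

# Translate row labels and column labels into integer positions.
# get_indexer returns -1 for a label that is not found, so real code
# may want to check for that before indexing.
row_pos = frame1.index.get_indexer(picked_values.index)
col_pos = frame1.columns.get_indexer(picked_values)

# Paired integer arrays select one (row, column) cell per entry.
result = pd.Series(frame1.to_numpy()[row_pos, col_pos], index=picked_values.index)
print(result)
```

Unlike searchsorted, get_indexer does not require the column labels to be sorted, and it works when the row index is not the default 0..n-1.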
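Here's a NumPy-based approach using integer indexing and Index.searchsorted. It assumes the column labels are sorted (searchsorted requires this) and that the index is the default 0..n-1, because frame1.index is used directly as row positions: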
frame1.values[frame1.index, frame1.columns.searchsorted(picked_values.values)]
# array([0.22095278, 0.86200616, 1.88047197, 0.49816937, 0.10962954])