Select columns in a DataFrame conditional on row

Tags: python, pandas, dataframe

I am trying to build a series (or dataframe) from another dataframe, picking a different column of the first frame for each row, with the choice of column given by a second series. In the simplified example below, I want the frame1 values from 'a' for the first three rows and from 'b' for the final two (the picked_values series).

import numpy as np
import pandas as pd
frame1 = pd.DataFrame(np.random.randn(10).reshape(5, 2), index=range(5), columns=['a', 'b'])
picked_values = pd.Series(['a', 'a', 'a', 'b', 'b'])

frame1:

           a         b
0   0.283519  1.462209
1  -0.352342  1.254098
2   0.731701  0.236017
3   0.022217 -1.469342
4   0.386000 -0.706614

Trying to get to the series:

0   0.283519
1   -0.352342
2   0.731701
3   -1.469342
4   -0.706614

I was hoping frame1[picked_values] would work, but this ends up with five columns (one whole column per label in picked_values).
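For reference, a minimal sketch of why indexing the frame with the series of labels gives five columns. It reuses the names from the example above; `wide` is a name introduced here for illustration only.

```python
import numpy as np
import pandas as pd

frame1 = pd.DataFrame(np.random.randn(10).reshape(5, 2), index=range(5), columns=["a", "b"])
picked_values = pd.Series(["a", "a", "a", "b", "b"])

# A Series of labels is treated as a list of column names, so each label
# pulls in a whole column rather than a single cell.
wide = frame1[picked_values]
print(wide.shape)          # five rows, five columns
print(list(wide.columns))  # the labels repeated: a, a, a, b, b
```

So the selection is column-wise, not per row, which is why a per-row lookup needs a different approach.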

In the real-life example, picked_values is a lot larger and calculated.

Thank you for your time.

asked Jan 24 '20 14:01

TheSuperbard

2 Answers

Use df.lookup (available in pandas before 2.0; it was deprecated in 1.2 and has since been removed):

pd.Series(frame1.lookup(picked_values.index,picked_values))

0    0.283519
1   -0.352342
2    0.731701
3   -1.469342
4   -0.706614
dtype: float64
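On pandas versions where DataFrame.lookup is no longer available, one possible substitute is to factorize the labels and index the underlying array by position. This is a sketch, not part of the original answer: it hard-codes the frame1 values printed in the question, assumes picked_values lines up with the rows of frame1 by position, and the names `codes`, `labels`, `picked` and `result` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Values copied from the frame1 printed in the question.
frame1 = pd.DataFrame(
    {
        "a": [0.283519, -0.352342, 0.731701, 0.022217, 0.386000],
        "b": [1.462209, 1.254098, 0.236017, -1.469342, -0.706614],
    }
)
picked_values = pd.Series(["a", "a", "a", "b", "b"])

# Encode the wanted labels as integer codes plus the unique labels they refer to.
codes, labels = pd.factorize(picked_values)

# Reorder the columns to match `labels`, then pick one value per row by position.
picked = frame1.reindex(labels, axis=1).to_numpy()[np.arange(len(frame1)), codes]
result = pd.Series(picked, index=frame1.index)
print(result)
```

With the question's data this should give the same five values as the lookup call above.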

answered Sep 19 '22 01:09

anky

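Here's a NumPy-based approach using integer array indexing and Index.searchsorted. It assumes the column labels are sorted and that frame1 has the default 0..n-1 index, since frame1.index is used directly as row positions: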

frame1.values[frame1.index, frame1.columns.searchsorted(picked_values.values)]
# array([0.22095278, 0.86200616, 1.88047197, 0.49816937, 0.10962954])
# (output from a different random frame1 than the one printed in the question)
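If the columns are not sorted, or the index is not the default range, a variant of the same idea can look up column positions with Index.get_indexer and build row positions with np.arange. The sketch below is an assumption-laden illustration rather than the answer's own code: the frame `frame`, the series `picks`, and the other variable names are hypothetical, and it requires unique column labels.

```python
import numpy as np
import pandas as pd

# Columns deliberately unsorted and index non-default to exercise the general case.
frame = pd.DataFrame(
    {
        "b": [1.462209, 1.254098, 0.236017, -1.469342, -0.706614],
        "a": [0.283519, -0.352342, 0.731701, 0.022217, 0.386000],
    },
    index=list("vwxyz"),
)
picks = pd.Series(["a", "a", "a", "b", "b"], index=frame.index)

# Position of each requested label among the columns; -1 marks a missing label.
col_pos = frame.columns.get_indexer(picks.to_numpy())
if (col_pos == -1).any():
    raise KeyError("picks contains labels that are not columns of frame")

# One row position per row, independent of the index labels.
row_pos = np.arange(len(frame))
selected = pd.Series(frame.to_numpy()[row_pos, col_pos], index=frame.index)
print(selected)
```

Checking for -1 matters because NumPy would otherwise silently read the last column for any unknown label.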

answered Sep 18 '22 01:09

yatu
