How can I select a specific column from each row in a Pandas DataFrame?

Tags: python, pandas, numpy

I have a DataFrame in this format:

    a   b   c
0   1   2   3
1   4   5   6
2   7   8   9
3   10  11  12
4   13  14  15

and an array like this, with column names:

['a', 'a', 'b', 'c', 'b']

and I’m hoping to extract an array of data, one value from each row. The array of column names specifies which column I want from each row. Here, the result would be:

[1, 4, 8, 12, 14]

Is this possible as a single command with Pandas, or do I need to iterate? I tried using indexing

i = pd.Index(['a', 'a', 'b', 'c', 'b'])
i.choose(df)

but I got a segfault, which I couldn’t diagnose because the documentation is lacking.
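For reference, the explicit iteration the question hopes to avoid can be sketched as below. This is only a baseline under assumptions: the DataFrame is rebuilt from the values shown above, and the names `cols` and `baseline` are illustrative, not from the original post.

```python
import pandas as pd

# Rebuild the example DataFrame from the question.
df = pd.DataFrame(
    {"a": [1, 4, 7, 10, 13], "b": [2, 5, 8, 11, 14], "c": [3, 6, 9, 12, 15]}
)
cols = ["a", "a", "b", "c", "b"]

# One scalar access per row: pair each row label with its requested column.
baseline = [int(df.at[row, col]) for row, col in zip(df.index, cols)]
print(baseline)  # [1, 4, 8, 12, 14]
```

This gives the desired result, but it performs one Python-level lookup per row, which is what a vectorized answer would replace.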

735

asked Jul 18 '14 20:07

gggritso

2 Answers

You could use DataFrame.lookup (deprecated in pandas 1.2 and removed in 2.0), e.g.

>>> i = pd.Series(['a', 'a', 'b', 'c', 'b'])
>>> df.lookup(i.index, i.values)
array([ 1,  4,  8, 12, 14])

where i.index could be different from range(len(i)) if you wanted.
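On pandas versions that no longer provide lookup, the same selection can be sketched with pd.factorize plus NumPy integer indexing. This is a minimal sketch, not the original answer's method: the helper name `lookup_by_series` is hypothetical, and it assumes every value in `i` is an existing column name and every label in `i.index` is an existing row label.

```python
import numpy as np
import pandas as pd

def lookup_by_series(df, i):
    """Hypothetical helper: for each (row label, column name) pair held in
    the Series `i`, return the matching cell of `df` as a NumPy array."""
    # codes[k] is the position of i.iloc[k] within `uniques`.
    codes, uniques = pd.factorize(i)
    # Keep only the requested rows and the columns that are actually used.
    block = df.loc[i.index, uniques].to_numpy()
    # Pick one column per row by pairing row positions with column codes.
    return block[np.arange(len(i)), codes]

df = pd.DataFrame(
    {"a": [1, 4, 7, 10, 13], "b": [2, 5, 8, 11, 14], "c": [3, 6, 9, 12, 15]}
)

# Same input as in the answer: one column name per row.
i = pd.Series(["a", "a", "b", "c", "b"])
result = lookup_by_series(df, i)

# The Series index may also name only a subset of the rows.
partial = lookup_by_series(df, pd.Series(["a", "b", "b"], index=[0, 2, 4]))
```

Here `result` holds 1, 4, 8, 12, 14, and `partial` holds the values for rows 0, 2 and 4 only (1, 8, 14), which illustrates the remark about i.index differing from range(len(i)).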

129

answered Sep 30 '22 08:09

DSM

For large datasets, you can use indexing on the base numpy data, if you're prepared to transform your column names into a numerical index (simple in this case):

import numpy as np
df.values[np.arange(5), [0, 0, 1, 2, 1]]

out: array([ 1,  4,  8, 12, 14])

This will be much more efficient than list comprehensions or other explicit iteration.
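The step of turning column names into a numerical index can be sketched with Index.get_indexer, which maps each name to its column position. This is an illustrative sketch under assumptions: the column names are unique, and the variable names `names`, `col_pos` and `picked` are not from the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"a": [1, 4, 7, 10, 13], "b": [2, 5, 8, 11, 14], "c": [3, 6, 9, 12, 15]}
)
names = ["a", "a", "b", "c", "b"]

# Map each column name to its integer position; unknown names come back as -1.
col_pos = df.columns.get_indexer(names)
if (col_pos == -1).any():
    raise KeyError("some requested column names are not in the DataFrame")

# Pair row positions 0..n-1 with the per-row column positions.
picked = df.to_numpy()[np.arange(len(df)), col_pos]
print(picked.tolist())  # [1, 4, 8, 12, 14]
```

Note that to_numpy() on a DataFrame with mixed column dtypes produces an object array, so the speed advantage is clearest when all columns share one numeric dtype.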

answered Sep 30 '22 06:09

mdurant
