Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I select a specific column from each row in a Pandas DataFrame?

I have a DataFrame in this format:

    a   b   c
0   1   2   3
1   4   5   6
2   7   8   9
3   10  11  12
4   13  14  15

and an array like this, with column names:

['a', 'a', 'b', 'c', 'b']

and I’m hoping to extract an array of data, one value from each row. The array of column names specifies which column I want from each row. Here, the result would be:

[1, 4, 8, 12, 14]

Is this possible as a single command with Pandas, or do I need to iterate? I tried using indexing

i = pd.Index(['a', 'a', 'b', 'c', 'b'])
i.choose(df)

but I got a segfault, which I couldn’t diagnose because the documentation is lacking.

like image 735
gggritso Avatar asked Jul 18 '14 20:07

gggritso


People also ask

How do I select only certain columns in pandas?

To select a single column, use square brackets [] with the column name of the column of interest.

How do I pull a specific column from a DataFrame?

If you have a DataFrame and would like to access or select a specific few rows/columns from that DataFrame, you can use square brackets or other advanced methods such as loc and iloc .

How do I select a specific row and column in pandas?

Select Rows & Columns by Name or Index in Pandas DataFrame using [ ], loc & iloc. Indexing in Pandas means selecting rows and columns of data from a Dataframe.

How do you get a particular column value from a DataFrame in Python?

You can use the loc and iloc functions to access columns in a Pandas DataFrame. Let's see how. If we wanted to access a certain column in our DataFrame, for example the Grades column, we could simply use the loc function and specify the name of the column in order to retrieve it.


2 Answers

You could use lookup, e.g.

>>> i = pd.Series(['a', 'a', 'b', 'c', 'b'])
>>> df.lookup(i.index, i.values)
array([ 1,  4,  8, 12, 14])

where i.index could be different from range(len(i)) if you wanted.

like image 129
DSM Avatar answered Sep 30 '22 08:09

DSM


For large datasets, you can use indexing on the base numpy data, if you're prepared to transform your column names into a numerical index (simple in this case):

df.values[arange(5),[0,0,1,2,1]]

out: array([ 1,  4,  8, 12, 14])

This will be much more efficient that list comprehensions, or other explicit iterations.

like image 21
mdurant Avatar answered Sep 30 '22 06:09

mdurant