I have a DataFrame in this format:
a b c
0 1 2 3
1 4 5 6
2 7 8 9
3 10 11 12
4 13 14 15
and an array like this, with column names:
['a', 'a', 'b', 'c', 'b']
and I’m hoping to extract an array of data, one value from each row. The array of column names specifies which column I want from each row. Here, the result would be:
[1, 4, 8, 12, 14]
Is this possible as a single command with Pandas, or do I need to iterate? I tried using indexing
i = pd.Index(['a', 'a', 'b', 'c', 'b'])
i.choose(df)
but I got a segfault, which I couldn’t diagnose because the documentation is lacking.
To select a single column, use square brackets [] with the column name of the column of interest.
If you have a DataFrame and would like to access or select a specific few rows/columns from that DataFrame, you can use square brackets or other advanced methods such as loc and iloc .
Select Rows & Columns by Name or Index in Pandas DataFrame using [ ], loc & iloc. Indexing in Pandas means selecting rows and columns of data from a Dataframe.
You can use the loc and iloc functions to access columns in a Pandas DataFrame. Let's see how. If we wanted to access a certain column in our DataFrame, for example the Grades column, we could simply use the loc function and specify the name of the column in order to retrieve it.
You could use lookup
, e.g.
>>> i = pd.Series(['a', 'a', 'b', 'c', 'b'])
>>> df.lookup(i.index, i.values)
array([ 1, 4, 8, 12, 14])
where i.index
could be different from range(len(i))
if you wanted.
For large datasets, you can use indexing on the base numpy data, if you're prepared to transform your column names into a numerical index (simple in this case):
df.values[arange(5),[0,0,1,2,1]]
out: array([ 1, 4, 8, 12, 14])
This will be much more efficient that list comprehensions, or other explicit iterations.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With