
Select specific index, column pairs from pandas dataframe

Tags: python, pandas

I have a dataframe x:

x = pd.DataFrame(np.random.randn(3,3), index=[1,2,3], columns=['A', 'B', 'C'])
x


          A         B         C
1  0.256668 -0.338741  0.733561
2  0.200978  0.145738 -0.409657
3 -0.891879  0.039337  0.400449

and I would like to select several (index, column) pairs to populate a new Series. For example, selecting [(1, 'A'), (1, 'B'), (1, 'A'), (3, 'C')] should produce a list, array, or Series with 4 elements:

[0.256668, -0.338741, 0.256668, 0.400449]

Any idea how I should do that?

dylkot asked Mar 02 '15 21:03


2 Answers

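I think get_value() and lookup() are faster:

import numpy as np
import pandas as pd
x = pd.DataFrame(np.random.randn(3,3), index=[1,2,3], columns=['A', 'B', 'C'])

locations = [(1, "A"), (1, "B"), (1, "A"), (3, "C")]

# Single value by row and column label.
# Note: get_value() was removed in pandas 1.0; x.at[1, "A"] is the replacement.
print(x.get_value(1, "A"))

# Split the pairs into parallel sequences of row labels and column labels.
row_labels, col_labels = zip(*locations)
# Note: lookup() was deprecated in pandas 1.2 and removed in 2.0.
print(x.lookup(row_labels, col_labels))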
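
On recent pandas releases, where get_value() and lookup() are no longer available, the same label-based selection can be sketched with Index.get_indexer plus NumPy integer indexing. This is one possible substitute, not the method from the answer itself; the fixed values are copied from the question so the output is reproducible, and the names row_pos, col_pos, and result are illustrative.

```python
import numpy as np
import pandas as pd

# Values copied from the question so the result is deterministic.
x = pd.DataFrame(
    [[0.256668, -0.338741, 0.733561],
     [0.200978, 0.145738, -0.409657],
     [-0.891879, 0.039337, 0.400449]],
    index=[1, 2, 3],
    columns=["A", "B", "C"],
)

locations = [(1, "A"), (1, "B"), (1, "A"), (3, "C")]
row_labels, col_labels = zip(*locations)

# Translate labels into integer positions; -1 would mean "label not found".
# get_indexer requires the index being searched to have unique labels.
row_pos = x.index.get_indexer(list(row_labels))
col_pos = x.columns.get_indexer(list(col_labels))

# Paired integer indexing picks one element per (row, column) pair.
values = x.to_numpy()[row_pos, col_pos]

# Optionally wrap the values in a Series keyed by the requested pairs.
result = pd.Series(values, index=pd.MultiIndex.from_tuples(locations))

print(values)
print(x.at[1, "A"])  # scalar access by label, replacing get_value()
```

Checking row_pos and col_pos for -1 before indexing is worthwhile, since -1 would otherwise silently select the last row or column.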
HYRY answered Oct 21 '22 10:10


If your pairs are positions instead of index/column names,

row_position = [0,0,0,2]
col_position = [0,1,0,2]

x.values[row_position, col_position]

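Or get the positions from the labels with np.searchsorted (row_labels as defined in the answer above; every label must exist in x.index):

sorter = np.argsort(x.index)
row_position = sorter[np.searchsorted(x.index, row_labels, sorter=sorter)]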
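
A small sketch of why the searchsorted result has to be mapped back through the sorter when the index is not already sorted. The frame below is a made-up example with a deliberately unsorted index, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose index is NOT sorted.
x = pd.DataFrame(
    np.arange(9.0).reshape(3, 3),
    index=[3, 1, 2],
    columns=["A", "B", "C"],
)

row_labels = [1, 1, 1, 3]
col_position = [0, 1, 0, 2]

index_values = x.index.to_numpy()
sorter = np.argsort(index_values)

# searchsorted returns positions within the *sorted* view of the index...
pos_in_sorted = np.searchsorted(index_values, row_labels, sorter=sorter)
# ...so map them back to positions in the original row order.
row_position = sorter[pos_in_sorted]

picked = x.to_numpy()[row_position, col_position]
print(row_position)
print(picked)
```

Note that searchsorted does not report missing labels: it returns the insertion point instead, so this approach is only safe when every requested label is known to be present.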
user3226167 answered Oct 21 '22 12:10