 

Is it possible to select pandas dataframe with row indices and column names?

Tags:

python

pandas

For datasets with meaningless row indices, I find it more useful to select data by row number while still using column names. I know .iloc only takes row/column numbers (integers) and .loc only takes labels (names). Is there a workaround that combines a row number and a column name in one selection?

    A   B
1   1   a
5   2   a
6   3   a
4   4   b
9   5   b
3   6   b
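To make the .loc/.iloc distinction concrete, here is a minimal sketch that rebuilds the table above as a DataFrame. The construction code is an assumption; only the values come from the question. It shows that .loc works on labels, .iloc works on positions, and that handing .loc a row number fails when no row carries that label.

```python
import pandas as pd

# Rebuild the example table: the row labels are arbitrary, as in the question
df = pd.DataFrame(
    {"A": [1, 2, 3, 4, 5, 6], "B": ["a", "a", "a", "b", "b", "b"]},
    index=[1, 5, 6, 4, 9, 3],
)

# .loc takes labels: row label 5, column label "B"
by_label = df.loc[5, "B"]

# .iloc takes integer positions: second row (position 1), second column (position 1)
by_position = df.iloc[1, 1]

print(by_label, by_position)  # a a

# Mixing the two fails: .loc treats 2 as a row *label*, and no row is labelled 2
try:
    df.loc[2, "B"]
except KeyError:
    print("no row labelled 2")
```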

For example, I would like to select the entry at row 2 and column B. I do not necessarily know that row 2 has the row name 5, or that column B is the second column. What's the best way to reference that cell?

(The row names are usually a filtered result or a random sample of a bigger dataset)
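A few common pandas idioms cover this case; they are offered as a hedged sketch, not as the answer below. Note that the question counts rows from 1, so "row 2" (label 5) is position 1 in pandas' zero-based terms. The sketch assumes unique column labels: with duplicate names, Index.get_loc returns a slice or boolean mask instead of an integer, and with duplicate row labels the .loc variant can return more than one value.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [1, 2, 3, 4, 5, 6], "B": ["a", "a", "a", "b", "b", "b"]},
    index=[1, 5, 6, 4, 9, 3],
)

row_pos = 1  # "row 2" in the question's 1-based counting, i.e. label 5

# Translate the column name into a position, then index purely by position
v1 = df.iloc[row_pos, df.columns.get_loc("B")]

# Translate the row position into a label, then index purely by label
v2 = df.loc[df.index[row_pos], "B"]

# Select the column by name first, then the row by position within that Series
v3 = df["B"].iloc[row_pos]

print(v1, v2, v3)  # a a a
```

The first form keeps everything positional, which sidesteps duplicate row labels; the last form is the shortest and mirrors the approach in the answer.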

asked Mar 16 '16 at 16:03 by hurrikale


People also ask

How do you select a DataFrame based on an index?

iloc selects rows by integer position. So, to select the 5th row of a DataFrame, you would use df.iloc[4] (which returns the row as a Series) or df.iloc[[4]] (which returns a one-row DataFrame), since the first row is at position 0, the second row is at position 1, and so on.
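A small sketch on hypothetical data (the frame below is made up for illustration) showing how a scalar position and a one-element list of positions give different result types:

```python
import pandas as pd

# Hypothetical frame with non-integer labels, so positions and labels can't be confused
df = pd.DataFrame({"A": range(10, 16)}, index=list("uvwxyz"))

fifth_series = df.iloc[4]    # scalar position -> Series (the row itself)
fifth_frame = df.iloc[[4]]   # list of positions -> one-row DataFrame

print(type(fifth_series).__name__, type(fifth_frame).__name__)  # Series DataFrame
print(fifth_frame.index.tolist())  # ['y']
```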

Which method is used to access the rows or columns of a DataFrame at particular positions in the index?

Selecting values from particular rows and columns of a DataFrame is known as indexing. With .loc (by label) and .iloc (by integer position) you can select all rows and some columns, some rows and all columns, or some rows and some columns.
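A minimal sketch of the two basic shapes of selection, on a hypothetical three-column frame: ":" means "everything along this axis".

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": ["x", "y", "z"], "C": [0.1, 0.2, 0.3]})

# All rows, some columns (by name)
some_cols = df.loc[:, ["A", "C"]]

# Some rows (by position), all columns
some_rows = df.iloc[0:2, :]

print(some_cols.shape, some_rows.shape)  # (3, 2) (2, 3)
```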

How do I select specific rows in pandas based on index?

Use pandas.DataFrame.iloc[] to select rows by integer position. The iloc[] indexer is position-based: it selects DataFrame rows by their integer location, not by their index labels.
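iloc also accepts lists, slices, and negative positions. A short sketch on hypothetical data with shuffled integer labels, to show that the labels are ignored:

```python
import pandas as pd

# Hypothetical frame whose integer labels deliberately don't match positions
df = pd.DataFrame({"A": [10, 20, 30, 40, 50]}, index=[7, 3, 9, 1, 5])

picked = df.iloc[[0, 2]]   # rows at positions 0 and 2 -> labels 7 and 9
sliced = df.iloc[1:3]      # positions 1 and 2; the stop position is excluded
last = df.iloc[-1]         # negative positions count from the end

print(picked.index.tolist(), sliced.index.tolist(), last["A"])  # [7, 9] [3, 9] 50
```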

How do I extract specific rows and columns from a DataFrame in python?

If you know which columns you want before reading the data from a file, you can tell read_csv() to import only those columns by passing them as a list to the usecols parameter, either by index number (starting at 0) or by column name.
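A minimal sketch of both forms of usecols. The CSV text is hypothetical and is read from an in-memory buffer (io.StringIO) instead of a file so the example is self-contained.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk
csv_text = "A,B,C\n1,a,0.5\n2,b,0.7\n"

# usecols by position (0-based) ...
by_position = pd.read_csv(io.StringIO(csv_text), usecols=[0, 2])

# ... or by column name
by_name = pd.read_csv(io.StringIO(csv_text), usecols=["A", "C"])

print(by_position.columns.tolist(), by_name.columns.tolist())  # ['A', 'C'] ['A', 'C']
```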


1 Answer

You can select the column by name and then the row by position, using iat, which is faster than iloc:

print(df)
    A  B
1   1  a
5   2  a
6   3  c
8   4  b
9   5  b
10  6  b

print(df['B'].iat[2])
c

print(df['B'].iloc[2])
c
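A self-contained Python 3 version of the session above, assuming the frame is built with pd.DataFrame (the answer does not show how df was created). Note that this frame differs from the question's: its index and the third value of B are different.

```python
import pandas as pd

# The answer's example frame, reconstructed from its printed output
df = pd.DataFrame(
    {"A": [1, 2, 3, 4, 5, 6], "B": ["a", "a", "c", "b", "b", "b"]},
    index=[1, 5, 6, 8, 9, 10],
)

# Pick the column by name, then the row by zero-based position within that column
fast = df["B"].iat[2]      # .iat: positional access to a single scalar
general = df["B"].iloc[2]  # .iloc: also accepts lists, slices, masks

print(fast, general)  # c c
```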

Timing:

In [266]: %timeit df['B'].iat[2]
The slowest run took 31.55 times longer than the fastest. This could mean that an intermediate result is being cached 
100000 loops, best of 3: 7.28 µs per loop

In [267]: %timeit df['B'].iloc[2]
The slowest run took 24.47 times longer than the fastest. This could mean that an intermediate result is being cached 
100000 loops, best of 3: 11.5 µs per loop
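The timings above come from IPython's %timeit on the answerer's machine and pandas version, so absolute numbers (and the size of the gap) will vary. Outside IPython, a rough sketch using only the standard timeit module might look like this; the loop count is an arbitrary choice to keep the run short.

```python
import timeit

import pandas as pd

df = pd.DataFrame(
    {"A": [1, 2, 3, 4, 5, 6], "B": ["a", "a", "c", "b", "b", "b"]},
    index=[1, 5, 6, 8, 9, 10],
)

n = 10_000  # calls per run; kept small so the sketch finishes quickly

# timeit.repeat returns the total seconds for each of the `repeat` runs;
# take the best run and divide by n to get a per-call time, like "best of 3"
iat_per_call = min(timeit.repeat(lambda: df["B"].iat[2], number=n, repeat=3)) / n
iloc_per_call = min(timeit.repeat(lambda: df["B"].iloc[2], number=n, repeat=3)) / n

print(f"iat:  {iat_per_call * 1e6:.2f} µs per call")
print(f"iloc: {iloc_per_call * 1e6:.2f} µs per call")
```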
answered Oct 05 '22 at 20:10 by jezrael