
How to select range of columns in a dataframe based on their name and not their indexes?

In a pandas dataframe created like this:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randint(10, size=(6, 6)),
                  columns=['c' + str(i) for i in range(6)],
                  index=["r" + str(i) for i in range(6)])

which could look as follows:

    c0  c1  c2  c3  c4  c5
r0   2   7   3   3   2   8
r1   6   9   6   7   9   1
r2   4   0   9   8   4   2
r3   9   0   4   3   5   4
r4   7   6   8   8   0   8
r5   0   6   1   8   2   2

I can easily select certain rows and/or a range of columns using .loc:

print(df.loc[['r1', 'r5'], 'c1':'c4'])

That would return:

    c1  c2  c3  c4
r1   9   6   7   9
r5   6   1   8   2

So I can select particular rows/columns with a list, and a range of rows/columns with a colon.
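To make the behaviour I am after concrete, here is a minimal sketch. It assumes a deterministic stand-in for the random frame above (the values are purely illustrative) and shows that a label slice includes both endpoints and keeps its meaning after a column is dropped, while a positional slice does not:

```python
import pandas as pd

# Deterministic stand-in for the random frame above (illustrative values only).
df = pd.DataFrame(
    [[r * 6 + c for c in range(6)] for r in range(6)],
    columns=["c" + str(i) for i in range(6)],
    index=["r" + str(i) for i in range(6)],
)

# Label slice: both endpoints ('c1' and 'c4') are included.
by_label = df.loc[["r1", "r5"], "c1":"c4"]
print(list(by_label.columns))  # ['c1', 'c2', 'c3', 'c4']

# After dropping a column, the same label slice still runs from c1 to c4 ...
trimmed = df.drop(columns=["c2"])
print(list(trimmed.loc[:, "c1":"c4"].columns))  # ['c1', 'c3', 'c4']

# ... whereas the positional slice 1:5 now reaches one column further.
print(list(trimmed.iloc[:, 1:5].columns))  # ['c1', 'c3', 'c4', 'c5']
```

This is why I would like to slice by name rather than by position.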

How would one do this in R? In the related questions I found, the desired range of columns always has to be specified by index; I could not find a way to access the columns by name. To give an example:

df <- data.frame(c1=1:6, c2=2:7, c3=3:8, c4=4:9, c5=5:10, c6=6:11)
rownames(df) <- c('r1', 'r2', 'r3', 'r4', 'r5', 'r6')

The command

df[c('r1', 'r5'),'c1':'c4']

does not work and throws an error. The only thing that worked for me is

df[c('r1', 'r5'), 1:4]

which returns

   c1 c2 c3 c4
r1  1  2  3  4
r5  5  6  7  8

But how would I select the columns by their name and not by their index (which might be important when I drop certain columns throughout the analysis)? In this particular case I could of course use grep but how about columns that have arbitrary names?

So I don't want to use

df[c('r1', 'r5'),c('c1','c2', 'c3', 'c4')]

but an actual slice.

EDIT:

A follow-up question can be found here.

asked Jun 08 '16 22:06 by Cleb


2 Answers

It looks like you can accomplish this with subset(), whose select argument accepts a range of column names:

> df <- data.frame(c1=1:6, c2=2:7, c3=3:8, c4=4:9, c5=5:10, c6=6:11)
> rownames(df) <- c('r1', 'r2', 'r3', 'r4', 'r5', 'r6')
> subset(df, select=c1:c4)
   c1 c2 c3 c4
r1  1  2  3  4
r2  2  3  4  5
r3  3  4  5  6
r4  4  5  6  7
r5  5  6  7  8
r6  6  7  8  9
> subset(df, select=c1:c2)
   c1 c2
r1  1  2
r2  2  3
r3  3  4
r4  4  5
r5  5  6
r6  6  7

If you want to subset by a range of row names, this hack would do:

> # gRI ("get row index") returns the position of the row named rName
> gRI <- function(df, rName) {which(rownames(df) == rName)}
> df[gRI(df,"r2"):gRI(df,"r4"),]
   c1 c2 c3 c4 c5 c6
r2  2  3  4  5  6  7
r3  3  4  5  6  7  8
r4  4  5  6  7  8  9
answered Nov 03 '22 00:11 by evan.oman


An alternative to subset, if you don't mind working with data.table, would be:

data.table::setDT(df)
df[1:3, c2:c4, with=FALSE]
   c2 c3 c4
1:  2  3  4
2:  3  4  5
3:  4  5  6

This still does not solve the problem of subsetting a range of rows by name, though.

answered Nov 02 '22 23:11 by Psidom