This is a follow-up to a question I asked here. There I learned a) how to do this for columns (see below) and b) that row and column selection are handled quite differently in R, which means that I cannot use the same approach for rows.
So suppose I have a pandas dataframe like this:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(10, size=(6, 6)),
columns=['c' + str(i) for i in range(6)],
index=["r" + str(i) for i in range(6)])
c0 c1 c2 c3 c4 c5
r0 4 2 3 9 9 0
r1 9 0 8 1 7 5
r2 2 6 7 5 4 7
r3 6 9 9 1 3 4
r4 1 1 1 3 0 3
r5 0 8 5 8 2 9
then I can easily select rows and columns by their names like this:
print(df.loc['r3':'r5', 'c1':'c4'])
which returns
c1 c2 c3 c4
r3 9 9 1 3
r4 1 1 3 0
r5 8 5 8 2
How would I do this in R? Given a dataframe like this
df <- data.frame(c1=1:6, c2=2:7, c3=3:8, c4=4:9, c5=5:10, c6=6:11)
rownames(df) <- c('r1', 'r2', 'r3', 'r4', 'r5', 'r6')
c1 c2 c3 c4 c5 c6
r1 1 2 3 4 5 6
r2 2 3 4 5 6 7
r3 3 4 5 6 7 8
r4 4 5 6 7 8 9
r5 5 6 7 8 9 10
r6 6 7 8 9 10 11
Apparently, if I know the indexes of my desired rows/columns, I can simply do:
df[3:5, 1:4]
but I might delete rows/columns throughout my analysis so that I would rather select by name than by index. From the link above I learned that for columns the following would work:
subset(df, select=c1:c4)
which returns
c1 c2 c3 c4
r1 1 2 3 4
r2 2 3 4 5
r3 3 4 5 6
r4 4 5 6 7
r5 5 6 7 8
r6 6 7 8 9
but how could I also select a range of rows by name at the same time?
In this particular case I could of course use grep, but how about columns that have arbitrary names?
And I don't want to use
df[c('r3', 'r4', 'r5'), c('c1', 'c2', 'c3', 'c4')]
but an actual slice.
You can use which() with rownames:
subset(df[which(rownames(df)=='r3'):which(rownames(df)=='r5'),], select=c1:c4)
c1 c2 c3 c4
r3 3 4 5 6
r4 4 5 6 7
r5 5 6 7 8
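If you need this in several places, the same idea can be wrapped in a small helper. The following is only a minimal sketch: the function name slice_by_name is hypothetical (it is not part of base R), and it uses match() instead of which(... == ...) to look up the positions of the first and last label. Both ends of each range are included.

```r
# Hypothetical helper (not part of base R): slice a data frame by the
# first and last row name and the first and last column name, inclusive.
slice_by_name <- function(data, rows, cols) {
  r <- match(rows, rownames(data))  # positions of the two row labels
  k <- match(cols, colnames(data))  # positions of the two column labels
  if (anyNA(r) || anyNA(k)) stop("label not found")
  data[seq(r[1], r[2]), seq(k[1], k[2]), drop = FALSE]
}

df <- data.frame(c1=1:6, c2=2:7, c3=3:8, c4=4:9, c5=5:10, c6=6:11)
rownames(df) <- c('r1', 'r2', 'r3', 'r4', 'r5', 'r6')

res <- slice_by_name(df, c('r3', 'r5'), c('c1', 'c4'))
```

Because the positions are looked up at call time, this should keep working after rows or columns have been deleted, as long as the two endpoint labels still exist.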
You can write a function that will kinda give you the same behavior:
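# Infix slice operator: returns the names from range[1] through range[2], inclusive
'%:%' <- function(object, range) {
  # For a matrix use colnames, for a data frame use names; a plain vector is its own set of labels
  FUN <- if (!is.null(dim(object))) {
    if (is.matrix(object)) colnames else names
  } else identity
  # Positions of the two endpoints, given either as numbers or as names
  wh <- if (is.numeric(range)) range else which(FUN(object) %in% range)
  FUN(object)[seq(wh[1], wh[2])]
}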
df <- data.frame(c1=1:6, c2=2:7, c3=3:8, c4=4:9, c5=5:10, c6=6:11)
rownames(df) <- c('r1', 'r2', 'r3', 'r4', 'r5', 'r6')
Use it like
df %:% c('c2', 'c4')
# [1] "c2" "c3" "c4"
rownames(df) %:% c('r2', 'r4')
# [1] "r2" "r3" "r4"
For your question
df[rownames(df) %:% c('r3', 'r5'), df %:% c('c1', 'c5')]
# c1 c2 c3 c4 c5
# r3 3 4 5 6 7
# r4 4 5 6 7 8
# r5 5 6 7 8 9