Select row by level of a factor

Question

I have a data frame, df2, containing observations grouped by a ID factor that I would like to subset. I have used another function to identify which rows within each factor group that I want to select. This is shown below in df:

df <- data.frame(ID = c("A","B","C"),
                 pos = c(1,3,2))
df2 <- data.frame(ID = c(rep("A",5), rep("B",5), rep("C",5)),
                  obs = c(1:15))

In df, pos corresponds to the index of the row that I want to select within the factor level mentioned in ID, not in the whole dataframe df2.I'm looking for a way to select the rows for each ID according to the right index (so their row number within the level of each factor of df2).

So, in this example, I want to select the first value in df2 with ID == 'A', the third value in df2 with ID == 'B' and the second value in df2 with ID == 'C'.

This would then give me:

df3 <- data.frame(ID = c("A", "B", "C"),
                  obs = c(1, 8, 12))

Frank · Accepted Answer

dplyr

library(dplyr)

merge(df,df2) %>% 
  group_by(ID) %>% 
  filter(row_number() == pos) %>%
  select(-pos)

#   ID obs
# 1  A   1
# 2  B   8
# 3  C  12

base R

df2m <- merge(df,df2)
do.call(rbind, 
  by(df2m, df2m$ID, function(SD) SD[SD$pos[1], setdiff(names(SD),"pos")]) 
)

by splits the merged data frame df2m by df2m$ID and operates on each part; it returns results in a list, so they must be rbinded together at the end. Each subset of the data (associated with each value of ID) is filtered by pos and deselects the "pos" column using normal data.frame syntax.

data.table suggested by @DavidArenburg in a comment

library(data.table)

setkey(setDT(df2),"ID")[df][, 
  .SD[pos[1L], !"pos", with=FALSE]
, by = ID]

The first part -- setkey(setDT(df2),"ID")[df] -- is the merge. After that, the resulting table is split by = ID, and each Subset of Data, .SD is operated on. pos[1L] is subsetting in the normal way, while !"pos", with=FALSE corresponds to dropping the pos column.

See @eddi's answer for a better data.table approach.

user295691 · Answer

Here's the base R solution:

df2$pos <- ave(df2$obs, df2$ID, FUN=seq_along)
merge(df, df2)
  ID pos obs
1  A   1   1
2  B   3   8
3  C   2  12

If df2 is sorted by ID, you can just do df2$pos <- sequence(table(df2$ID)) for the first line.

eddi · Answer

Using data.table version 1.9.5+:

setDT(df2)[df, .SD[pos], by = .EACHI, on = 'ID']

which merges on ID column, then selects the pos row for each of the rows of df.

Select row by level of a factor

Tags:

r

r-factor

subset

Chris. Z

3 Answers

Frank

user295691

eddi

Recent Activity

Donate For Us

Select row by level of a factor

Tags:

r

r-factor

subset

Chris. Z

3 Answers

Frank

user295691

eddi

Related questions

Recent Activity

Donate For Us