I have a data set that looks like this:
structure(list(A = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
"14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24",
"25"), class = "factor"), T = c(0.04, 0.08, 0.12, 0.16, 0.2,
0.24), X = c(464.4, 464.4, 464.4, 464.4, 464.4, 464.4), Y = c(418.5,
418.5, 418.5, 418.5, 418.5, 418.5), V = c(0, 0, 0, 0, 0, 0),
GD = c(0, 0, 0, 0, 0, 0), ND = c(NA, 0, 0, 0, 0, 0), ND2 = c(NA,
0, 0, 0, 0, 0), TID = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("t1",
"t10", "t11", "t12", "t13", "t14", "t15", "t16", "t17", "t18",
"t19", "t2", "t20", "t21", "t22", "t23", "t24", "t25", "t3",
"t4", "t5", "t6", "t7", "t8", "t9"), class = "factor")), .Names = c("A",
"T", "X", "Y", "V", "GD", "ND", "ND2", "TID"), row.names = c(NA,
6L), class = "data.frame")
I want to select the first 80 observations of all variables for each TID. So far, I can do this with the first TID only using the code:
sub.data1<-NM[1:80, ]
How can I do it for all my other TIDs?
Thanks!
I would do (with dat standing for your data frame, called NM in the question):
lapply(split(dat, dat$TID), head, 80)
It returns a list of data.frames, one per TID, each with 80 rows (or fewer, if that TID has fewer than 80 observations). If instead you want everything in one data.frame:
do.call(rbind, lapply(split(dat, dat$TID), head, 80))
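To see what the two calls above do, here is a minimal sketch on a made-up data frame. The toy data, the value n = 2, and the names pieces and first_n are assumptions for illustration only; with the real data you would use n = 80.

```r
# Toy data: a hypothetical stand-in for the real data set,
# with 5 rows for t1, 3 rows for t2 and 1 row for t3
dat <- data.frame(
  TID = rep(c("t1", "t2", "t3"), times = c(5, 3, 1)),
  T   = seq_len(9) * 0.04
)

# Split by TID and keep at most the first n rows of each piece
n <- 2
pieces <- lapply(split(dat, dat$TID), head, n)

# Stack the pieces back into a single data.frame
first_n <- do.call(rbind, pieces)

# rbind() on a named list builds row names such as "t1.1";
# reset them if they are not wanted
rownames(first_n) <- NULL
first_n
```

Groups with fewer than n rows (t3 here) are simply returned whole, so no special handling is needed for short TIDs.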
Using the function ddply() from plyr, you can split the data by TID, select the first 80 rows of each piece with head(), and then put everything back together in one data frame:
library(plyr)
ddply(NM, .(TID), head, n = 80)
Using data.table, I made a shorter example with just TIDs t1 and t2 that returns the first 2 rows of each. It can be adjusted for your data.
library(data.table)
data<-structure(list(A = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
"14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24",
"25"), class = "factor"), T = c(0.04, 0.08, 0.12, 0.16, 0.2,
0.24), X = c(464.4, 464.4, 464.4, 464.4, 464.4, 464.4), Y = c(418.5,
418.5, 418.5, 418.5, 418.5, 418.5), V = c(0, 0, 0, 0, 0, 0),
GD = c(0, 0, 0, 0, 0, 0), ND = c(NA, 0, 0, 0, 0, 0), ND2 = c(NA,
0, 0, 0, 0, 0), TID = c("t1","t1","t1","t2","t2","t2")), .Names = c("A",
"T", "X", "Y", "V", "GD", "ND", "ND2", "TID"), row.names = c(NA,
6L), class = "data.frame")
dt<-data.table(data)
dt[,head(.SD,2),by=TID]
This results in:
   TID A    T     X     Y V GD ND ND2
1:  t1 1 0.04 464.4 418.5 0  0 NA  NA
2:  t1 1 0.08 464.4 418.5 0  0  0   0
3:  t2 1 0.16 464.4 418.5 0  0  0   0
4:  t2 1 0.20 464.4 418.5 0  0  0   0
and can be changed back to a data frame if desired by changing the last line to
as.data.frame(dt[,head(.SD,2),by=TID])
Here is another solution in base:
do.call(rbind, by(NM, NM$TID, head, 80))