I have a data frame as follows:
Category Name Value
How would I select, say, 5 random names per category? Calling sample on the whole data frame draws random rows with every row as a possible candidate. However, I want to specify the number of random rows per category. Any suggestions?
Update: I am open to using ddply
Best guess in absence of test cases:
do.call( rbind, lapply( split(dfrm, dfrm$Category) ,
function(df) df[sample(nrow(df), 5) , ] )
)
Tested with Jonathan's data:
> do.call( rbind, lapply( split(df, df$Category) ,
+ function(df) df[sample(nrow(df), 5) , ] )
+ )
Category Name Value
1.8 1 8 -0.2496109 # useful side-effect of labeling source group
1.15 1 15 -0.4037368
1.17 1 17 -0.4223724
1.12 1 12 -0.9359026
1.18 1 18 0.3741184
2.37 2 37 0.3033610
2.34 2 34 -0.4517738
2.36 2 36 -0.7695923
snipped remainder
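One caveat with the split/lapply approach above: sample(nrow(df), 5) throws an error if any category has fewer than 5 rows. A minimal sketch of one way to guard against that, assuming you are happy to take every row from a short category. The helper name sample_per_group and its arguments are made up for illustration, and the data are generated the same way as Jonathan's:

```r
# Hypothetical helper (not part of the answer above): sample up to n rows per group.
sample_per_group <- function(data, group, n) {
  pieces <- lapply(split(data, data[[group]]), function(d) {
    k <- min(n, nrow(d))                        # never ask for more rows than the group has
    d[sample.int(nrow(d), k), , drop = FALSE]   # pick k distinct row positions in this group
  })
  do.call(rbind, pieces)                        # row names keep the "group.row" labels
}

set.seed(1)  # only so the sketch is reproducible
df <- data.frame(Category = rep(1:5, each = 20), Name = 1:100, Value = rnorm(100))
res <- sample_per_group(df, "Category", 5)
table(res$Category)
```

If you would rather get an error when a category is too small, drop the min() line and pass n straight to sample.int.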
If you want the same number of items from each category, this is easy:
df[unlist(tapply(1:nrow(df),df$Category,function(x) sample(x,3))),]
e.g., I generated df as follows:
df <- data.frame(Category=rep(1:5,each=20),Name=1:100,Value=rnorm(100))
then I get the following from my code:
> df[unlist(tapply(1:nrow(df),df$Category,function(x) sample(x,3))),]
Category Name Value
5 1 5 0.25151044
20 1 20 1.52486482
18 1 18 0.69313462
30 2 30 0.73444185
27 2 27 0.24000427
39 2 39 -0.10108203
46 3 46 -0.37200574
49 3 49 -1.84920469
43 3 43 0.35976388
68 4 68 0.57879516
76 4 76 -0.11049302
64 4 64 -0.13471303
100 5 100 0.95979408
95 5 95 -0.01928741
99 5 99 0.85725242
If you want different numbers of rows from each category it will be more complicated.
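For what it's worth, here is a rough sketch of one way the different-numbers case could be handled, assuming the desired counts are stored in a named vector keyed by category value. The sizes vector and its values are made up for illustration. It indexes into each group's row numbers with sample.int rather than calling sample(x, k) directly, because sample(x, k) treats a length-one numeric x as 1:x, which would go wrong for a category with a single row:

```r
set.seed(42)  # only so the sketch is reproducible
df <- data.frame(Category = rep(1:5, each = 20), Name = 1:100, Value = rnorm(100))

# Hypothetical per-category sample sizes, named by category value (an assumption).
sizes <- c("1" = 2, "2" = 3, "3" = 1, "4" = 5, "5" = 4)

# Row numbers of df, grouped by category.
idx_by_cat <- split(seq_len(nrow(df)), df$Category)

picked <- unlist(lapply(names(idx_by_cat), function(g) {
  idx <- idx_by_cat[[g]]
  k <- min(sizes[[g]], length(idx))   # cap at the number of rows available
  idx[sample.int(length(idx), k)]     # sample positions within idx, then map back to rows
}))

res <- df[picked, ]
table(res$Category)
```

Every category present in the data needs an entry in sizes, otherwise sizes[[g]] stops with a subscript error.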