Let us say I have a data frame "dat" like:
col1 col2
12 a
43 a
54 a
11 a
33 b
43 b
34 c
34 c
342 c
343 c
I also have a vector:
vec <- c('a','a','a','b','c','c')
I want to remove the extra rows from data frame "dat" according to vector "vec": the number of times a label appears in "vec" is the number of rows to keep for that label. So in the data frame I want to keep only the first 3 rows for "a", only the first row for "b", and only the first 2 rows for "c".
The expected output is:
col1 col2
12 a
43 a
54 a
33 b
34 c
34 c
What is the fastest way to do this without using a for loop?
Here is one way, using split and Map:
Data
dat <- read.table(header=TRUE, text=' col1 col2
12 a
43 a
54 a
11 a
33 b
43 b
34 c
34 c
342 c
343 c', stringsAsFactors=FALSE)
vec <- c('a','a','a','b','c','c')
Solution
# count how many rows to keep for each label
tabvec <- table(vec)
# split dat into one data frame per col2 label, keep the first n rows of
# each piece with head() (n taken from tabvec), then stack the pieces back
# into a single data frame with rbind
do.call(rbind,
        Map(function(x, y) head(x, y), split(dat, dat$col2), tabvec))
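Map() pairs the pieces from split() with the counts in tabvec by position. That works here because both are ordered by the sorted labels, but if dat contained a label missing from vec (or vec a label missing from dat), the pairing would no longer line up. A minimal sketch of a variant that looks each group's quota up by name instead and treats labels missing from vec as a quota of 0 (the helper name keep_first_n is hypothetical, not part of the original answer); for the data above it returns the same rows:

```r
dat <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
col1 col2
12 a
43 a
54 a
11 a
33 b
43 b
34 c
34 c
342 c
343 c")
vec <- c('a', 'a', 'a', 'b', 'c', 'c')

# hypothetical helper: keep the first n rows of each col2 group, where n is
# the number of times that label occurs in vec
keep_first_n <- function(dat, vec) {
  quota  <- table(vec)              # rows to keep per label
  pieces <- split(dat, dat$col2)    # one data frame per col2 label
  # look quotas up by name; labels missing from vec give NA, treated as 0
  n <- as.integer(quota[names(pieces)])
  n[is.na(n)] <- 0L
  # head(piece, n) for each pair, then stack the results
  do.call(rbind, Map(head, pieces, n))
}

keep_first_n(dat, vec)
```

Labels that appear only in vec are simply ignored, because the lookup is driven by the groups actually present in dat.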
Output:
col1 col2
a.1 12 a
a.2 43 a
a.3 54 a
b 33 b
c.7 34 c
c.8 34 c
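A loop-free alternative that avoids splitting altogether (a swapped-in technique, not the original answer's method): number each row within its col2 group with ave(), look up each row's quota from table(vec), and keep the rows whose position is within the quota. Unlike the split/Map version, this keeps rows in their original order with their original row names. A minimal sketch; dat and vec are rebuilt so the block runs on its own:

```r
dat <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
col1 col2
12 a
43 a
54 a
11 a
33 b
43 b
34 c
34 c
342 c
343 c")
vec <- c('a', 'a', 'a', 'b', 'c', 'c')

quota <- table(vec)
# position of each row within its col2 group: 1, 2, 3, ...
pos <- ave(seq_len(nrow(dat)), dat$col2, FUN = seq_along)
# each row's quota, looked up by its label; labels absent from vec get 0
lim <- as.integer(quota[dat$col2])
lim[is.na(lim)] <- 0L
# keep a row only if it falls within its group's quota
res <- dat[pos <= lim, ]
res
```

For this data it keeps the same six rows, with row names 1, 2, 3, 5, 7 and 8.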