
Reduce a data frame to fewer rows

Let us say I have a data frame "dat" like:

 col1     col2
  12       a
  43       a
  54       a
  11       a
  33       b
  43       b
  34       c
  34       c
  342      c
  343      c

Now I have a vector:

vec <- c("a", "a", "a", "b", "c", "c")

I want to trim the data frame "dat" so that each value of col2 appears only as many times as it does in "vec": keep only the first 3 rows for "a", only the first row for "b", and only the first 2 rows for "c".

I should get the output as

 col1     col2
  12       a
  43       a
  54       a
  33       b
  34       c
  34       c

What is the fastest way to do this without using a for loop?

asked Oct 22 '25 13:10 by user3664020


1 Answer

Here is one way to do it, using split and Map:

Data

dat <- read.table(header=TRUE, text=' col1     col2
  12       a
  43       a
  54       a
  11       a
  33       b
  43       b
  34       c
  34       c
  342      c
  343      c', stringsAsFactors=FALSE)

vec <-  c('a','a','a','b','c','c')

Solution

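# count how many rows to keep for each value of col2
tabvec <- table(vec)

# split dat by col2, keep the first n rows of each piece (n taken from tabvec),
# then bind the pieces back together and convert the result into a data.frame
# note: Map pairs pieces and counts by position, so dat$col2 and vec must
# contain the same set of values (both are sorted the same way here)
data.frame(do.call(rbind,
   Map(function(x, y) head(x, y), split(dat, as.factor(dat$col2)), tabvec)
))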

Output:

    col1 col2
a.1   12    a
a.2   43    a
a.3   54    a
b     33    b
c.7   34    c
c.8   34    c
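
As an alternative, the same filtering can be sketched without split and rbind by numbering the rows inside each col2 group and comparing that number with the count from vec. This is only a sketch, not part of the solution above: the names pos, allowed and result are made up for illustration, and it assumes that a col2 value missing from vec should lose all of its rows.

```r
# Same example data as above, built directly so the block is self-contained
dat <- data.frame(
  col1 = c(12, 43, 54, 11, 33, 43, 34, 34, 342, 343),
  col2 = c("a", "a", "a", "a", "b", "b", "c", "c", "c", "c"),
  stringsAsFactors = FALSE
)
vec <- c("a", "a", "a", "b", "c", "c")

# number of rows to keep for each value of col2
tabvec <- table(vec)

# position of each row within its col2 group (1, 2, 3, ... per group)
pos <- ave(seq_len(nrow(dat)), dat$col2, FUN = seq_along)

# rows allowed for each row's group, looked up by name rather than by position;
# assumption: a group that never appears in vec is allowed 0 rows
allowed <- as.integer(tabvec)[match(dat$col2, names(tabvec))]
allowed[is.na(allowed)] <- 0L

# keep a row only if its within-group position does not exceed the allowance
result <- dat[pos <= allowed, ]
print(result)
```

Because the counts are matched by name, this version does not depend on dat$col2 and vec containing the same set of values, and the result keeps the original row order and row names of dat.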
answered Oct 25 '25 02:10 by LyzandeR


