
Adding rows in `dplyr` output

In traditional plyr, returned rows are added automagically to the output even if they exceed the number of input rows for that grouping:

library(plyr)
set.seed(1)
dat <- data.frame(x=runif(10),g=rep(letters[1:5],each=2))
> ddply( dat, .(g), function(df) df[c(1,1,1,2),] )
            x g
1  0.26550866 a
2  0.26550866 a
3  0.26550866 a
4  0.37212390 a
5  0.57285336 b
6  0.57285336 b
7  0.57285336 b
8  0.90820779 b
9  0.20168193 c
10 0.20168193 c
11 0.20168193 c
12 0.89838968 c
13 0.94467527 d
14 0.94467527 d
15 0.94467527 d
16 0.66079779 d
17 0.62911404 e
18 0.62911404 e
19 0.62911404 e
20 0.06178627 e

I cannot figure out how to do the same in dplyr. Some attempts:

dat %>% group_by(g) %>% summarise( xbar = mean(x) )

> dat %>% group_by(g) %>% summarise( xbar = runif(3) )
Error: expecting a single value

# Getting creative...

> dat %>% group_by(g) %>% function(x) x[c(1,1,1,2),]

# Nope.

How do I do this?

The specific use case I'm butting up against is splitting a \n-delimited text field and making it "long," but I use this feature of ddply all the time for many purposes.

Ari B. Friedman asked May 13 '14 01:05



1 Answer

Try this:

 dat %>% 
     group_by( g ) %>% 
     do( .[c(1,1,1,2), ] ) %>% 
     ungroup()
G. Grothendieck answered Oct 10 '22 16:10
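
For anyone wanting to try the `do()` approach end to end, here is a minimal sketch that rebuilds `dat` from the question and then applies the same idea to the newline-splitting use case. The `notes` data frame and its `id` and `txt` columns are made up for illustration; they are not from the question. Recent dplyr releases document `do()` as superseded, so treat this as a sketch of the technique under that caveat, not as the recommended modern idiom.

```r
library(dplyr)

# Same data as in the question
set.seed(1)
dat <- data.frame(x = runif(10), g = rep(letters[1:5], each = 2))

# do() evaluates the expression once per group; `.` is that group's rows.
# The data frame it returns may have more rows than the group had.
out <- dat %>%
  group_by(g) %>%
  do(.[c(1, 1, 1, 2), ]) %>%
  ungroup()

# Hypothetical data for the "\n-delimited field made long" use case
notes <- data.frame(id  = 1:2,
                    txt = c("a\nb\nc", "d\ne"),
                    stringsAsFactors = FALSE)

# One output row per line of text; do() adds the grouping column (id) back
long <- notes %>%
  group_by(id) %>%
  do(data.frame(line = strsplit(.$txt, "\n", fixed = TRUE)[[1]],
                stringsAsFactors = FALSE)) %>%
  ungroup()
```

Here `out` should have four rows per group, like the `ddply` output in the question, and `long` should have one row per line of each `txt` value.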