 

filtering data.frame based on row_number()

Tags:

r

dplyr

UPDATE: dplyr has been updated since this question was asked, and the original code now works as the OP wanted.

I'm trying to get rows 2 through 7 of a data.frame using dplyr.

I'm doing this:

require(dplyr)
df <- data.frame(id = 1:10, var = runif(10))
df <- df %>% filter(row_number() <= 7, row_number() >= 2)

But this throws an error.

Error in rank(x, ties.method = "first") :
  argument "x" is missing, with no default

I know I could easily do this instead:

df <- df %>% mutate(rn = row_number()) %>% filter(rn <= 7, rn >= 2) 

But I would like to understand why my first try is not working.
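A minimal sketch of the two approaches side by side, assuming a recent dplyr (1.x) where, per the update note above, a bare row_number() inside filter() counts rows by position. The set.seed() call and the object names direct and via_rn are illustrative additions, not part of the question.

```r
library(dplyr)

set.seed(1)  # illustrative: makes the random column reproducible
df <- data.frame(id = 1:10, var = runif(10))

# The original attempt; recent dplyr evaluates a bare row_number()
# as each row's position in the data frame.
direct <- df %>% filter(row_number() <= 7, row_number() >= 2)

# The helper-column workaround, dropping the helper column afterwards
# so both results have the same columns.
via_rn <- df %>%
  mutate(rn = row_number()) %>%
  filter(rn <= 7, rn >= 2) %>%
  select(-rn)

print(direct)
```

On an old dplyr (as in the question) the first call fails with the error shown, while the helper-column version works either way.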

Asked Sep 23 '14 at 11:09 by Daniel Falbel





2 Answers

Actually, dplyr's slice() function is made for exactly this kind of positional subsetting:

df %>% slice(2:7) 

(I'm a little late to the party, but thought I'd add this for future readers.)
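A short sketch of slice() on the question's data, assuming dplyr 1.x. The negative-index and grouped cases go slightly beyond the answer and are included only to show that slice() works purely on positions; the grouped example data is hypothetical.

```r
library(dplyr)

df <- data.frame(id = 1:10, var = runif(10))

# slice() keeps rows by integer position; no ranking is involved.
rows_2_to_7 <- df %>% slice(2:7)

# Negative positions drop rows instead of keeping them.
without_first <- df %>% slice(-1)

# On grouped data, positions are counted within each group.
grouped <- data.frame(g = rep(c("a", "b"), each = 5), x = 1:10)
first_two_per_group <- grouped %>%
  group_by(g) %>%
  slice(1:2) %>%
  ungroup()

print(rows_2_to_7)
```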

Answered Oct 22 '22 at 06:10 by talat



The row_number() function does not simply return the position of each row, so it can't be used the way you want. Its documentation says:

• ‘row_number’: equivalent to ‘rank(ties.method = "first")’
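To see what that equivalence means in practice, here is a small sketch (assuming dplyr 1.x) comparing row_number(x) with rank(x, ties.method = "first") on a vector containing a tie. It also shows that rank() itself has no default for x, which produces the same message as the error in the question. The vector x is a made-up example.

```r
library(dplyr)

x <- c(30, 10, 20, 10)  # made-up values with a tie at 10

# row_number(x) ranks the values of x; ties are broken by order of appearance.
by_row_number <- row_number(x)
by_rank <- rank(x, ties.method = "first")
print(by_row_number)  # 4 1 3 2

# Calling rank() with no x fails with the message seen in the question.
msg <- tryCatch(rank(), error = conditionMessage)
print(msg)
```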

You're not actually saying what you want the row_number of. In your case:

df %>% filter(row_number(id) <= 7, row_number(id) >= 2) 

works because id is sorted, so row_number(id) is just 1:10. I don't know what row_number() evaluates to in this context, but when it is called a second time dplyr has run out of things to feed it, and you get the equivalent of:

> row_number()
Error in rank(x, ties.method = "first") :
  argument "x" is missing, with no default

That's your error right there.
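The row_number(id) version only lines up with row positions because id happens to be 1:10 in order. A small sketch (assuming dplyr 1.x) with the id column deliberately shuffled makes the difference visible; the shuffled values are made up for illustration.

```r
library(dplyr)

# Same shape as the question's data, but with id shuffled (made-up values).
df <- data.frame(id = c(5L, 3L, 9L, 1L, 10L, 2L, 8L, 4L, 7L, 6L), var = 1:10)

# row_number(id) ranks the *values* of id, so this keeps the rows whose
# id ranks 2nd to 7th, wherever those rows sit in the data frame.
by_rank <- df %>% filter(row_number(id) <= 7, row_number(id) >= 2)

# Positional selection keeps rows 2 to 7 regardless of their id values.
by_position <- df %>% slice(2:7)

print(by_rank)
print(by_position)
```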

Anyway, that's not the way to select rows.

You simply need to subscript df[2:7,], or if you insist on pipes everywhere:

> df %>% "["(.,2:7,)
  id        var
2  2 0.52352994
3  3 0.02994982
4  4 0.90074801
5  5 0.68935493
6  6 0.57012344
7  7 0.01489950
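For completeness, a sketch (assuming dplyr is installed, which re-exports the %>% pipe from magrittr) confirming that the plain subscript and the piped form return the same rows. magrittr::extract() is magrittr's named alias for the [ operator; it is an addition here, not something the answer uses, and the object names are illustrative.

```r
library(dplyr)  # re-exports magrittr's %>%

set.seed(42)  # illustrative: makes the random column reproducible
df <- data.frame(id = 1:10, var = runif(10))

# Plain base-R subscripting: rows 2 to 7, all columns.
base_rows <- df[2:7, ]

# The same [ call in function form; the . placeholder tells %>% where
# to put df, and the empty last argument means "all columns".
piped_rows <- df %>% `[`(., 2:7, )

# magrittr's named alias for [, which some find easier to read in a pipe.
alias_rows <- df %>% magrittr::extract(2:7, )

print(base_rows)
```

Unlike slice(), base subscripting keeps the original row names (2 to 7), as in the output above.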
Answered Oct 22 '22 at 07:10 by Spacedman
