UPDATE: dplyr has been updated since this question was asked and now performs as the OP wanted
I'm trying to get the second to the seventh row of a data.frame using dplyr.
I'm doing this:
require(dplyr)
df <- data.frame(id = 1:10, var = runif(10))
df <- df %>% filter(row_number() <= 7, row_number() >= 2)
But this throws an error.
Error in rank(x, ties.method = "first") : argument "x" is missing, with no default
I know I could easily do:
df <- df %>% mutate(rn = row_number()) %>% filter(rn <= 7, rn >= 2)
But I would like to understand why my first try is not working.
Actually, dplyr's slice function is made for this kind of subsetting:
df %>% slice(2:7)
(I'm a little late to the party but thought I'd add this for future readers)
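For anyone who wants to check this, here is a minimal sketch comparing slice with the filter approach from the question. It assumes a recent dplyr (per the UPDATE note at the top, bare row_number() inside filter now works); the seed and the names by_slice and by_filter are just for illustration.

```r
library(dplyr)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
df <- data.frame(id = 1:10, var = runif(10))

# Positional subsetting with slice
by_slice <- df %>% slice(2:7)

# The question's original attempt (assumes an updated dplyr)
by_filter <- df %>% filter(row_number() <= 7, row_number() >= 2)

# Both should keep the rows with id 2 through 7
by_slice$id
by_filter$id
```

slice reads more directly here because it takes row positions, whereas filter takes a logical condition.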
The row_number() function does not simply return the row number of each element and so can't be used like you want:
• ‘row_number’: equivalent to ‘rank(ties.method = "first")’
You're not actually saying what you want the row_number of. In your case:
df %>% filter(row_number(id) <= 7, row_number(id) >= 2)
works because id is sorted and so row_number(id) is 1:10. I don't know what row_number() evaluates to in this context, but when called a second time dplyr has run out of things to feed it and you get the equivalent of:
> row_number()
Error in rank(x, ties.method = "first") : argument "x" is missing, with no default
That's your error right there.
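To make the ranking behaviour concrete, here is a small sketch. The vector x and the data frame rev_df are made up for illustration; the point is only that row_number(id) ranks the values of id, so it matches row positions only when id happens to be sorted.

```r
library(dplyr)

# row_number(x) ranks the values, breaking ties by first appearance
x <- c(30, 10, 20, 10)
ranks <- row_number(x)

# Hypothetical data frame whose id column is in descending order
rev_df <- data.frame(id = 10:1)

# Filtering on the rank of id keeps the rows whose id is 2 to 7 ...
by_rank <- rev_df %>% filter(row_number(id) >= 2, row_number(id) <= 7)

# ... while slice keeps the rows at positions 2 to 7
by_pos <- rev_df %>% slice(2:7)

by_rank$id
by_pos$id
```

With id sorted ascending the two results coincide, which is why the row_number(id) version appears to work on the data in the question.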
Anyway, that's not the way to select rows.
You simply need to subscript df[2:7,], or if you insist on pipes everywhere:
> df %>% "["(.,2:7,)
  id        var
2  2 0.52352994
3  3 0.02994982
4  4 0.90074801
5  5 0.68935493
6  6 0.57012344
7  7 0.01489950