
Using dplyr::filter, how can the output be limited to just the first 500 rows?

Tags: r, dplyr

dplyr is a great and fast library.

Using the %>% operator enables powerful manipulation.

In my first step, I need to limit the output to only 500 rows max (for display purposes).

How can I do that?

par <- filter(pc, Child_Concept_GID == as.character(mcode)) %>% select(Parent_Concept_GID)

What I need is something like:

filter(pc,CONDITION,rows=500)

Is there a direct way, or a nice workaround, that does not require making the first step a separate step (outside the %>% "stream")?

userJT asked Jun 20 '14 16:06




2 Answers

There are a couple of ways to do this, assuming you are piping your data (using %>%):

  • top_n(n) ranks rows by a variable (the last column by default) and works per group on grouped data. It ignores the current row order, so it does not simply return the first n rows of data sorted with arrange(), and ties can produce more than n rows
  • head(500) takes the first 500 rows (can be used after arrange(), for example)
  • sample_n(size=500) can be used to select 500 random rows

If you are looking for the R equivalent to SQL's LIMIT, use head().
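
The options listed above can be compared side by side. This is only a minimal sketch, assuming dplyr is installed. It uses the built-in mtcars data and a limit of 5 as stand-ins for the question's pc table and the 500-row limit, and the filter condition is purely illustrative.

```r
library(dplyr)

n_max <- 5  # stand-in for the 500-row limit in the question

# head() keeps the first n rows in their current order,
# so it respects a preceding arrange()
first_rows <- mtcars %>%
  filter(cyl == 4) %>%
  arrange(desc(mpg)) %>%
  head(n_max)

# sample_n() draws n rows at random; it errors if fewer than n rows
# are available (unless replace = TRUE)
random_rows <- mtcars %>%
  filter(cyl == 4) %>%
  sample_n(size = n_max)

# top_n() ranks by a variable (here mpg) rather than by row position,
# and may return more than n rows when there are ties
top_rows <- mtcars %>%
  filter(cyl == 4) %>%
  top_n(n_max, mpg)
```

In dplyr 1.0.0 and later, top_n() and sample_n() are superseded by slice_max() and slice_sample(), although the older functions still work.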

matthew.peters answered Oct 21 '22 13:10


I think you're actually looking for slice() here.

filter(pc, condition) %>% slice(1:500) 

This does not rank the results. It merely pulls a slice, by position. In this case positions 1 through 500.

If the data is coming from a relational database, head() is a better option.
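
A short sketch of how slice() behaves when fewer rows match than were requested, again assuming dplyr is installed and using the built-in mtcars data with a small limit in place of the question's pc table (the filter condition is illustrative only):

```r
library(dplyr)

n_max <- 5  # stand-in for the 500-row limit in the question

# slice() picks rows by position, without ranking them
by_position <- mtcars %>%
  filter(cyl == 6) %>%   # 7 rows match
  slice(1:n_max)

# Positions past the last row are silently ignored, so asking for
# more rows than exist simply returns everything that matched
few_rows <- mtcars %>%
  filter(cyl == 6) %>%
  slice(1:500)

# On an in-memory data frame, head() selects the same rows
same_as_head <- mtcars %>%
  filter(cyl == 6) %>%
  head(n_max)
```

Both forms stay inside the %>% chain, so the limit does not need to be a separate step.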

Brandon Bertelsen answered Oct 21 '22 12:10