 

How do I assign a random seed to the dplyr sample_n function?

I mean the "sample_n" function from the dplyr package in R:
https://dplyr.tidyverse.org/reference/sample.html

For reproducibility, I want to set a seed so that someone else can get exactly the same results.

Is there a built-in way to set the seed for "sample_n"? Or is the seed something I set in the environment, and "sample_n" responds to it?

The options I have found so far are not built into the "sample_n" function itself:

  • There is the base R "set.seed" function, which sets the seed for the whole session [1]
  • There is the 'withr' package, which provides a wrapper that runs a piece of code with a given seed [2]
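A minimal sketch of both options, assuming dplyr and withr are installed and using the built-in mtcars data for illustration. The seed value 42 and the variable names are arbitrary. withr::with_seed() runs the supplied code with a temporary seed and restores the previous random-number state afterwards, so it does not disturb the rest of the session.

```r
library(dplyr)
library(withr)

# Option 1: set the global seed right before sampling.
set.seed(42)
a <- sample_n(mtcars, 5)
set.seed(42)
b <- sample_n(mtcars, 5)
identical(a, b)  # TRUE: same seed, same rows

# Option 2: with_seed() uses a temporary seed only for the wrapped code,
# then restores the RNG state that existed before the call.
w1 <- with_seed(42, sample_n(mtcars, 5))
w2 <- with_seed(42, sample_n(mtcars, 5))
identical(w1, w2)  # TRUE
```

Option 1 is simpler for a script that runs top to bottom; option 2 is handy inside functions or tests where you do not want to reset the global seed.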


EngrStudent asked Jan 25 '23 19:01


2 Answers

The dplyr::sample_n documentation states:

This is a wrapper around sample.int() to make it easy to select random rows from a table. It currently only works for local tbls.

So sample_n calls sample.int under the hood. That means it draws from R's standard random number generator, and calling set.seed before sample_n makes the result reproducible.
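The reasoning above can be checked with base R alone. This sketch calls sample.int directly (no dplyr needed) on the row count of the built-in mtcars data; the variable names are illustrative. It assumes, as the documentation says, that sample_n draws its row indices the same way, so the same seed behaviour carries over.

```r
n_rows <- nrow(mtcars)          # 32 rows in the built-in mtcars data

set.seed(1)
idx1 <- sample.int(n_rows, 10)  # 10 row indices, drawn without replacement

set.seed(1)
idx2 <- sample.int(n_rows, 10)  # seed reset to 1: same indices again

idx3 <- sample.int(n_rows, 10)  # no reset: the RNG stream has moved on

identical(idx1, idx2)           # TRUE
mtcars[idx1, ]                  # the rows those indices select
```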

Waldi answered Jan 29 '23 21:01


Does this example help? It uses set.seed with the mtcars dataset.

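library(dplyr)

set.seed(1)
x <- mtcars
sample_n(x, 10)  # first draw after set.seed(1)

sample_n(x, 10)  # no set.seed() here: the RNG stream continues, so the rows differ

set.seed(1)
x <- mtcars
sample_n(x, 10)  # seed reset to 1: the same 10 rows as the first draw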

iamericfletcher answered Jan 29 '23 20:01
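In dplyr 1.0.0 and later, sample_n() is marked as superseded in favour of slice_sample(). A minimal sketch, assuming a recent dplyr is installed, showing that the same set.seed approach gives reproducible samples with slice_sample(); whether it selects the same rows as sample_n() for a given seed is not assumed here.

```r
library(dplyr)

set.seed(1)
s1 <- slice_sample(mtcars, n = 10)  # 10 random rows

set.seed(1)
s2 <- slice_sample(mtcars, n = 10)  # same seed, same call

identical(s1, s2)  # TRUE: resetting the seed reproduces the sample
```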