Randomly sample a percentage of rows within a data frame

Related to this question.

gender <- c("F", "M", "M", "F", "F", "M", "F", "F")
age    <- c(23, 25, 27, 29, 31, 33, 35, 37)
mydf <- data.frame(gender, age) 

mydf[ sample( which(mydf$gender=='F'), 3 ), ]

Instead of selecting a fixed number of rows (3 in the case above), how can I randomly select 20% of the rows with "F"? That is, of the five rows with "F", how do I randomly sample 20% of them?

ATMathew asked Feb 22 '13 18:02




2 Answers

You can use the sample_frac() function from the dplyr package.

For example, to sample 20% of all rows:

library(dplyr)
mydf %>% sample_frac(.2)

To sample 20% within each gender group:

mydf %>% group_by(gender) %>% sample_frac(.2)
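
The grouped call above draws from both genders. If only the "F" rows are wanted, as in the question, one possible sketch is to filter first and then sample. The seed and the name f_sample are assumptions added here for illustration, not part of the original answer:

```r
library(dplyr)

gender <- c("F", "M", "M", "F", "F", "M", "F", "F")
age    <- c(23, 25, 27, 29, 31, 33, 35, 37)
mydf   <- data.frame(gender, age)

set.seed(1)  # assumption: fixed seed only so the draw is repeatable

# Keep only the "F" rows, then draw 20% of them (5 rows * 0.2 = 1 row)
f_sample <- mydf %>%
  filter(gender == "F") %>%
  sample_frac(0.2)

nrow(f_sample)
```

In dplyr 1.0.0 and later, sample_frac() is superseded by slice_sample(prop = ...), which should work similarly here, though it rounds the row count down rather than to the nearest integer.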
Zhen Liang answered Oct 21 '22 09:10


How about this:

mydf[ sample( which(mydf$gender=='F'), round(0.2*length(which(mydf$gender=='F')))), ]

Here 0.2 is the 20% fraction and length(which(mydf$gender=='F')) is the total number of rows with "F". round() turns the product into a whole number of rows to draw.
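
One caveat with this approach: when sample() receives a single number x as its first argument, it samples from 1:x, so the one-liner may return the wrong row if only one "F" row exists. A minimal sketch that avoids this by sampling positions instead is shown below. The names f_rows, n_pick and picked, and the seed, are illustrative assumptions:

```r
gender <- c("F", "M", "M", "F", "F", "M", "F", "F")
age    <- c(23, 25, 27, 29, 31, 33, 35, 37)
mydf   <- data.frame(gender, age)

set.seed(1)  # assumption: fixed seed only so the draw is repeatable

f_rows <- which(mydf$gender == "F")     # row indices of the "F" rows
n_pick <- round(0.2 * length(f_rows))   # 20% of 5 rows, i.e. 1

# Sample positions within f_rows rather than passing f_rows to sample()
# directly, because sample(x, n) treats a single number x as the range 1:x.
picked <- f_rows[sample.int(length(f_rows), n_pick)]

mydf[picked, ]
```

Note that round() can yield 0 for very small groups (for example, 20% of 2 rows), in which case no rows are returned.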

Ben answered Oct 21 '22 07:10