R split data into 2 parts randomly

I am trying to split my data frame into two parts at random. For example, I'd like a random 70% of the rows in one data frame and the remaining 30% in another. The original data frame has over 800,000 rows. I tried a for loop that picks a random row number, appends that row to the first (70%) data frame with rbind(), and deletes it from the original so that what is left becomes the other (30%) data frame, but this is extremely slow. Is there a relatively fast way to do this?

582

asked Jul 01 '15 05:07

gregorp

2 Answers

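Try this:

n <- 100
data <- data.frame(x=runif(n), y=rnorm(n))  # example data
# Logical index: each row is TRUE with probability 0.7, independently
ind <- sample(c(TRUE, FALSE), n, replace=TRUE, prob=c(0.7, 0.3))
data1 <- data[ind, ]   # roughly 70% of the rows
data2 <- data[!ind, ]  # the remaining rows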
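
The snippet above builds the whole index in one vectorized call, so no per-row loop or rbind() is involved. As a minimal sketch, here is the same approach at the size mentioned in the question, with a seed set so the split is repeatable. The seed value 42 is arbitrary and is my own assumption, not part of the original answer:

```r
set.seed(42)  # arbitrary seed (an assumption); makes the split repeatable

n <- 800000
data <- data.frame(x = runif(n), y = rnorm(n))  # stand-in for the real data

# One logical value per row, each TRUE with probability 0.7
ind <- sample(c(TRUE, FALSE), n, replace = TRUE, prob = c(0.7, 0.3))
data1 <- data[ind, ]
data2 <- data[!ind, ]

# Every row lands in exactly one of the two parts
nrow(data1) + nrow(data2) == nrow(data)
```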

169

answered Sep 21 '22 11:09

ExperimenteR

I am building on the answer by ExperimenteR, which appears robust. One caveat is that sample() with prob draws each element independently, so the 70/30 proportions hold only on average and the realized counts vary from run to run. Take this for example:

> sample(c(TRUE, FALSE), n, replace=TRUE, prob=c(0.7, 0.3))

With n = 100, you might expect the numbers of TRUE and FALSE values to be exactly 70 and 30, respectively. Often, this is not the case:

> table(sample(c(TRUE, FALSE), n, replace=TRUE, prob=c(0.7, 0.3)))

 FALSE  TRUE 
    34    66
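
The spread seen above follows from each draw being independent: the number of TRUE values is binomially distributed, with mean 0.7 * n and standard deviation sqrt(n * 0.7 * 0.3). A small sketch of that arithmetic (the helper name true_count_sd is hypothetical, introduced only for illustration):

```r
# Standard deviation of the TRUE count when each of n rows
# is TRUE independently with probability p
true_count_sd <- function(n, p = 0.7) sqrt(n * p * (1 - p))

true_count_sd(100)              # about 4.6 rows around the expected 70
true_count_sd(800000)           # about 410 rows around the expected 560000
true_count_sd(800000) / 800000  # about 0.0005 as a share of all rows
```

So with 800,000 rows the realized share typically lands within about a tenth of a percentage point of 70%, even though it is rarely exact.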

That is fine if you don't need an exact split. But if you want exactly 70% and 30%, do this instead:

n <- nrow(data)
n1 <- round(0.7 * n)                     # number of rows for the 70% part
v <- rep(c(TRUE, FALSE), c(n1, n - n1))  # n1 TRUE values, then n - n1 FALSE values
ind <- sample(v)                         # shuffle them randomly
data1 <- data[ind, ]
data2 <- data[!ind, ]
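
An alternative way to get an exact split for any number of rows is to sample row numbers instead of shuffling a logical vector. A minimal sketch; the function name split_exact and its frac argument are hypothetical, introduced for illustration:

```r
# Split a data frame into two parts with an exact share of rows in the first.
# split_exact and frac are illustrative names, not part of base R.
split_exact <- function(df, frac = 0.7) {
  n <- nrow(df)
  # Row numbers drawn without replacement, sorted to keep the original row order
  idx <- sort(sample.int(n, size = round(frac * n)))
  rest <- setdiff(seq_len(n), idx)  # all rows not drawn
  list(first = df[idx, , drop = FALSE],
       second = df[rest, , drop = FALSE])
}

df <- data.frame(x = runif(100), y = rnorm(100))
parts <- split_exact(df)
nrow(parts$first)   # 70
nrow(parts$second)  # 30
```

Using setdiff() rather than a negative index keeps the second part correct even when no rows are drawn for the first.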

answered Sep 22 '22 11:09

Workhorse

Tags: random, split, r
