Stratified splitting of the data

I have a large data set and would like to fit a separate logistic regression for each City, which is one of the columns in my data. The following 70/30 split works, but it does not take the City groups into account.

indexes <- sample(1:nrow(data), size = 0.7*nrow(data))

train <- data[indexes,]
test <- data[-indexes,]

But this does not guarantee the 70/30 split for each city.

Let's say that I have City A and City B, where City A has 100 rows and City B has 900 rows, totaling 1000 rows. Splitting the data with the above code will give me 700 rows for the training data and 300 for the test data, but it does not guarantee that I will have 70 rows for City A and 630 rows for City B in the training data. How do I do that?

Once I have the data split 70/30 for each city, I will run a logistic regression for each city (I know how to do this once I have the training data).

asked Dec 25 '13 21:12

user35577

1 Answer

Try createDataPartition from the caret package. Its documentation states: "By default, createDataPartition does a stratified random split of the data."

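library(caret)
# Stratify on the grouping column (Class here; for the per-city split in the question, pass the City column)
train.index <- createDataPartition(Data$Class, p = .7, list = FALSE)
train <- Data[ train.index,]  # rows selected for training (about 70% of each stratum)
test  <- Data[-train.index,]  # all remaining rows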
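
To connect this to the numbers in the question, here is a minimal sketch that stratifies on City rather than Class. The data frame, the column x, and the seed are made up for illustration; only the 100/900 row counts come from the question. With p = 0.7 the training counts should come out at about 70 for City A and 630 for City B.

```r
library(caret)

set.seed(123)  # assumed seed, only so the sketch is reproducible

# Hypothetical data mirroring the question: 100 rows for City A, 900 for City B
Data <- data.frame(
  City = factor(rep(c("A", "B"), times = c(100, 900))),
  x    = rnorm(1000)
)

# Stratify on City so that each city is split 70/30 on its own
train.index <- createDataPartition(Data$City, p = 0.7, list = FALSE)
train <- Data[ train.index, ]
test  <- Data[-train.index, ]

# Per-city row counts in each part
table(train$City)
table(test$City)
```

Comparing the two tables is a quick way to confirm that the split held within each city and not only overall.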

caret can also be used for stratified k-fold cross-validation, like this:

ctrl <- trainControl(method = "repeatedcv",
                     repeats = 3,
                     ...)
# when calling train, pass this train control
train(...,
      trControl = ctrl,
      ...)

Check the caret documentation for more details.
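
If you would rather not depend on caret, the per-city 70/30 split can also be sketched in base R by sampling row indices within each city. This is an illustration under assumptions, not the method from the caret documentation: the columns x and y, the seed, and the simulated values are hypothetical, and the per-city glm call at the end is just one possible way to do the modelling step mentioned in the question.

```r
set.seed(42)  # assumed seed, only so the sketch is reproducible

# Hypothetical data mirroring the question: 100 rows for City A, 900 for City B
data <- data.frame(
  City = rep(c("A", "B"), times = c(100, 900)),
  x    = rnorm(1000),
  y    = rbinom(1000, size = 1, prob = 0.5)
)

# Group the row numbers by city, then draw 70% of each group
rows_by_city <- split(seq_len(nrow(data)), data$City)
train_idx <- unlist(lapply(rows_by_city, function(idx) {
  idx[sample.int(length(idx), size = round(0.7 * length(idx)))]
}), use.names = FALSE)

train <- data[train_idx, ]
test  <- data[-train_idx, ]

# Per-city row counts in each part
table(train$City)
table(test$City)

# One logistic regression per city, fitted on that city's training rows
models <- lapply(split(train, train$City), function(d) {
  glm(y ~ x, data = d, family = binomial)
})
```

Indexing with sample.int(length(idx)) instead of calling sample(idx) directly avoids the surprise where sample() treats a single number as the range 1:n, which matters if a city has only one row.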

answered Oct 22 '22 04:10

muon

Tags: split, r, logistic-regression
