Creating data partition in R

With the caret package, to create a data partition of 75% training and 25% test, we use:

inTrain <- createDataPartition(y = spam$type, p = 0.75, list = FALSE)

Note: the dataset is named spam and the target variable is named type.

My question is: what is the purpose of including the y = spam$type argument?

Isn't the purpose of creating data partitions simply to split the entire data set in the proportion you require for training vs testing? Why does that argument need to be included in the code?

asked Jul 20 '16 20:07

Aiden

1 Answer

I am assuming that the createDataPartition() in question is the one from the caret package.

If the spam$type argument is a factor, which is generally the case, the random sampling occurs within each class, so the training and test sets keep roughly the same class proportions as the full data set.

Some more explanation: suppose we partition the iris data set in the same proportion as in your question.
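To make "sampling within each class" concrete, here is a minimal base R sketch of the idea. It is only an illustration, not caret's actual source code; the helper name stratified_indices, the seed, and the use of ceiling() for rounding are assumptions made for this example.

```r
# Illustrative only: a hand-rolled stratified split in base R.
# This is NOT caret's implementation, just a sketch of the idea.
set.seed(42)  # arbitrary seed so the split is reproducible

# Hypothetical helper: sample a proportion p of row indices within each class of y.
stratified_indices <- function(y, p) {
  groups <- split(seq_along(y), y)          # row indices grouped by class
  picked <- lapply(groups, function(idx) {
    n <- ceiling(p * length(idx))           # number of rows to keep from this class
    idx[sample.int(length(idx), n)]         # random draw within the class
  })
  sort(unlist(picked, use.names = FALSE))
}

in_train <- stratified_indices(iris$Species, p = 0.75)
table(iris$Species[in_train])   # 38 rows kept for each of the three species
```

Because the draw happens separately inside each class, the function needs to know the class labels, which is the role the y argument plays.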

summary(iris)

Notice the counts against each species. Now run the following commands:

library(caret)
inTrain <- createDataPartition(y = iris$Species, p = 0.75, list = FALSE)

inTrain holds the row indices of approximately 75% of the rows from each species, which can be verified with the following command:

summary(iris[inTrain,])

There are 50 rows for each species, and 38 of them (76%, approximately 75%) have been randomly selected for the training data set.
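For contrast, here is a rough sketch of a plain random split that ignores the class labels, which is what you would get without a y argument. It uses only base R; the seed and the variable names plain_idx and plain_counts are arbitrary choices for this example.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Plain random split: 75% of all rows, ignoring Species.
n_train <- floor(0.75 * nrow(iris))        # 112 of the 150 rows
plain_idx <- sample(nrow(iris), n_train)

# Class counts in the training set are not guaranteed to be equal.
plain_counts <- table(iris$Species[plain_idx])
print(plain_counts)

# Class proportions; a stratified split would give 1/3 for each species.
print(round(prop.table(plain_counts), 3))
```

On a balanced data set like iris the difference is usually small, but with rare classes a plain random split can leave very few (or no) examples of a class in one of the sets.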

answered Sep 28 '22 16:09

Imran Ali

Tags: r, partitioning, r-caret, data-partitioning
