I have the following function to return 9 data frames:
split_data <- function(dataset, train_perc = 0.6, cv_perc = 0.2, test_perc = 0.2)
{
m <- nrow(dataset)
n <- ncol(dataset)
#Sort the data randomly
data_perm <- dataset[sample(m),]
#Split data into training, CV, and test sets
train <- data_perm[1:round(train_perc*m),]
cv <- data_perm[(round(train_perc*m)+1):round((train_perc+cv_perc)*m),]
test <- data_perm[(round((train_perc+cv_perc)*m)+1):round((train_perc+cv_perc+test_perc)*m),]
#Split sets into X and Y
X_train <- train[c(1:(n-1))]
Y_train <- train[c(n)]
X_cv <- cv[c(1:(n-1))]
Y_cv <- cv[c(n)]
X_test <- test[c(1:(n-1))]
Y_test <- test[c(n)]
}
My code runs fine, but no data frames are created. Is there a way to do this? Thanks
An R function returns a single object, so it cannot hand back nine separate data frames. No error is raised: the data frames are simply created in the function's local environment and discarded when the function exits. You can, however, put every object you want to return into a list and return that list.
To create a list of data frames, call list() and pass each data frame you have created as an argument.
This will store the nine data.frames in a list:
split_data <- function(dataset, train_perc = 0.6, cv_perc = 0.2, test_perc = 0.2) {
m <- nrow(dataset)
n <- ncol(dataset)
#Sort the data randomly
data_perm <- dataset[sample(m),]
# list to store all data.frames
out <- list()
#Split data into training, CV, and test sets
out$train <- data_perm[1:round(train_perc*m),]
out$cv <- data_perm[(round(train_perc*m)+1):round((train_perc+cv_perc)*m),]
out$test <- data_perm[(round((train_perc+cv_perc)*m)+1):round((train_perc+cv_perc+test_perc)*m),]
#Split sets into X and Y
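# train, cv and test now live inside out, so refer to them as out$train etc.
out$X_train <- out$train[c(1:(n-1))]
out$Y_train <- out$train[c(n)]
out$X_cv <- out$cv[c(1:(n-1))]
out$Y_cv <- out$cv[c(n)]
out$X_test <- out$test[c(1:(n-1))]
out$Y_test <- out$test[c(n)]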
return(out)
}
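To show how the caller gets at the pieces of a returned list, here is a minimal sketch. The helper split_two() and the toy data frame are hypothetical stand-ins (not part of the question) so that the example runs on its own; the same $ and [[ ]] access should work on the list returned by split_data().

```r
# Hypothetical helper: returns two data frames bundled in one named list.
split_two <- function(dataset, train_perc = 0.6) {
  m <- nrow(dataset)
  idx <- sample(m)                     # random row order
  n_train <- round(train_perc * m)     # number of training rows
  list(
    train = dataset[idx[seq_len(n_train)], , drop = FALSE],
    test  = dataset[idx[-seq_len(n_train)], , drop = FALSE]
  )
}

set.seed(1)                            # reproducible shuffle
toy <- data.frame(x = 1:10, y = (1:10)^2)
parts <- split_two(toy)

names(parts)           # "train" "test"
nrow(parts$train)      # 6
nrow(parts[["test"]])  # 4
```

Each element of the list is an ordinary data frame, so parts$train can be passed to any function that expects one.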
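If you want the data frames to be created in the workspace when the function finishes, this is what you'll need to do:
1) Optionally, create the variables beforehand in your R console (e.g. Y_test <- NULL). This is not strictly required, because "<<-" creates the variable in the global environment if it does not find an existing one.
2) Inside your function, assign to those variables with the "<<-" operator instead of "<-", i.e.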
X_train <<- train[c(1:(n-1))]
Y_train <<- train[c(n)]
X_cv <<- cv[c(1:(n-1))]
Y_cv <<- cv[c(n)]
X_test <<- test[c(1:(n-1))]
Y_test <<- test[c(n)]
The newly created data frames will then be accessible from your workspace.
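As an alternative that avoids global assignment inside the function, one possible approach is to return a named list and then copy its elements into the workspace with base R's list2env(). The function make_frames() and the names X_demo and Y_demo below are hypothetical, chosen only for illustration.

```r
# Hypothetical function that returns its data frames as a named list.
make_frames <- function() {
  list(
    X_demo = data.frame(a = 1:3),
    Y_demo = data.frame(b = c(10, 20, 30))
  )
}

frames <- make_frames()

# Copy every list element into the global environment under its own name.
list2env(frames, envir = globalenv())

exists("X_demo")   # TRUE
nrow(Y_demo)       # 3
```

This keeps the function free of side effects: the caller decides whether the data frames stay in the list or are unpacked into the workspace.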