I'm trying to set up k-fold cross-validation (CV) for several classification methods/hyperparameters using the data available at http://archive.ics.uci.edu/ml/machine-learning-databases/undocumented/connectionist-bench/sonar/sonar.all-data.
The data set has 208 rows, each with 60 numeric attributes plus a class label. I'm reading it into a data.frame with the read.table function.
The next step is to split the data into k folds, say k = 5. My first attempt was:
test <- createFolds(t, k=5)
This had two problems. The first is that the fold sizes are very uneven:
Length Class Mode
Fold1 29 -none- numeric
Fold2 14 -none- numeric
Fold3 7 -none- numeric
Fold4 5 -none- numeric
Fold5 5 -none- numeric
The second is that it apparently split my data by attribute (column) indices, whereas I want to split the observations (rows). I thought transposing my data.frame would fix that:
test <- t(myDataNumericValues)
But when I call createFolds on the transposed data, I get:
Length Class Mode
Fold1 2496 -none- numeric
Fold2 2496 -none- numeric
Fold3 2495 -none- numeric
Fold4 2496 -none- numeric
Fold5 2497 -none- numeric
The fold sizes are now balanced, but the folds still don't correspond to my 208 observations.
What can I do? Is the caret package perhaps not the right tool for this?
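The fold sizes above are consistent with createFolds treating its first argument as one vector to be split rather than as a table of rows: the first set of sizes sums to 60 (29 + 14 + 7 + 5 + 5), the number of attribute columns, and the second to 12480, which is 60 * 208, the number of cells in the transposed matrix. A base-R sketch of where those totals come from, using a hypothetical random stand-in (toy_attrs, toy_class) for the sonar data:

```r
# Hypothetical stand-in for the 60 numeric sonar attributes (208 rows)
set.seed(1)
toy_attrs <- as.data.frame(matrix(runif(208 * 60), nrow = 208, ncol = 60))

# A data.frame is a list of columns, so its length is the number of columns
length(toy_attrs)                  # 60
nrow(toy_attrs)                    # 208

# A transposed matrix behaves like one long vector of cells
length(t(as.matrix(toy_attrs)))    # 12480 = 60 * 208

# Folds should index rows, so pass one value per row, e.g. the class label
toy_class <- factor(sample(c("M", "R"), 208, replace = TRUE))
length(toy_class)                  # 208
```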
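Please read ?createFolds to understand what the function does. Its first argument is the outcome vector (one value per row), and it returns the row indices that are held out in each fold; see the returnTrain option to get the converse, i.e. the training rows. Here it is with the default k = 10: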
> library(caret)
> library(mlbench)
> data(Sonar)
>
> folds <- createFolds(Sonar$Class)
> str(folds)
List of 10
$ Fold01: int [1:21] 25 39 58 63 69 73 80 85 90 95 ...
$ Fold02: int [1:21] 19 21 42 48 52 66 72 81 88 89 ...
$ Fold03: int [1:21] 4 5 17 34 35 47 54 68 86 100 ...
$ Fold04: int [1:21] 2 6 22 29 32 40 60 65 67 92 ...
$ Fold05: int [1:20] 3 14 36 41 45 75 78 84 94 104 ...
$ Fold06: int [1:21] 10 11 24 33 43 46 50 55 56 97 ...
$ Fold07: int [1:21] 1 7 8 20 23 28 31 44 71 76 ...
$ Fold08: int [1:20] 16 18 26 27 38 57 77 79 91 99 ...
$ Fold09: int [1:21] 13 15 30 37 49 53 74 83 93 96 ...
$ Fold10: int [1:21] 9 12 51 59 61 62 64 70 82 87 ...
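Because Sonar$Class is a factor, createFolds makes a stratified split, so each fold keeps roughly the overall M/R proportions. The following is a base-R sketch of that idea, not caret's actual implementation; make_strat_folds is a hypothetical helper, and the class counts (111 M, 97 R) stand in for Sonar's:

```r
# Hypothetical helper: stratified k-fold split returning held-out row indices
make_strat_folds <- function(y, k = 10) {
  y <- as.factor(y)
  fold_id <- integer(length(y))
  offset <- 0L
  for (lvl in levels(y)) {
    idx <- which(y == lvl)
    idx <- idx[sample.int(length(idx))]   # shuffle rows within the class
    # deal shuffled rows round-robin into folds 1..k, continuing from where
    # the previous class stopped so that fold sizes stay balanced
    fold_id[idx] <- (seq_along(idx) + offset - 1L) %% k + 1L
    offset <- offset + length(idx)
  }
  # one element per fold, each holding that fold's held-out row indices
  split(seq_along(y), factor(fold_id, levels = seq_len(k)))
}

set.seed(1)
y <- factor(rep(c("M", "R"), times = c(111, 97)))
folds <- make_strat_folds(y, k = 10)
lengths(folds)          # eight folds of 21 rows and two of 20
table(y[folds[[1]]])    # per-class counts in the first held-out fold
```

Like the caret output above, every row appears in exactly one held-out fold.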
To use these to split the data:
> split_up <- lapply(folds, function(ind, dat) dat[ind,], dat = Sonar)
> dim(Sonar)
[1] 208 61
> unlist(lapply(split_up, nrow))
Fold01 Fold02 Fold03 Fold04 Fold05 Fold06 Fold07 Fold08 Fold09 Fold10
21 21 21 21 20 21 21 20 21 21
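Each element of the fold list holds the test rows for one iteration, and the training rows are the complement, dat[-ind, ]. A minimal base-R sketch of a manual k-fold loop follows; the one-feature toy data, the base-R folds (used in place of createFolds so it runs without caret) and the nearest-class-mean rule are all hypothetical stand-ins for the real data and a real classifier:

```r
# Hypothetical toy data: one noisy feature and a two-class label (208 rows)
set.seed(42)
dat <- data.frame(x = rnorm(208),
                  Class = factor(rep(c("M", "R"), times = c(111, 97))))
dat$x[dat$Class == "R"] <- dat$x[dat$Class == "R"] + 1   # shift class R

# Hypothetical folds: 5 disjoint sets of held-out row indices
folds <- split(sample.int(nrow(dat)), rep_len(1:5, nrow(dat)))

cv_acc <- sapply(folds, function(ind) {
  train <- dat[-ind, ]   # every row not held out in this fold
  test  <- dat[ind, ]    # the held-out rows
  # trivial classifier: assign each test row to the nearest class mean of x
  centroids <- tapply(train$x, train$Class, mean)
  dist <- abs(outer(test$x, centroids, "-"))
  pred <- names(centroids)[apply(dist, 1, which.min)]
  mean(pred == test$Class)   # accuracy on this fold
})

cv_acc         # per-fold accuracy
mean(cv_acc)   # cross-validated accuracy estimate
```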
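The function train is used in this package to do the actual modeling, and you don't usually need to do the splitting yourself: train resamples the data according to a trainControl object (for example, method = "cv" with number = 5 for 5-fold CV). See this page.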
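A sketch of that route, assuming caret and mlbench are installed; k-nearest neighbours and the candidate values of k are illustrative choices only, and own_folds is a hypothetical name:

```r
library(caret)
library(mlbench)
data(Sonar)

set.seed(1)
# 5-fold cross-validation, as in the question
ctrl <- trainControl(method = "cv", number = 5)

# k-nearest neighbours with three candidate values of k; train() builds the
# folds, fits every candidate on each one and averages the held-out results
fit <- train(Class ~ ., data = Sonar,
             method = "knn",
             preProcess = c("center", "scale"),
             tuneGrid = data.frame(k = c(3, 5, 7)),
             trControl = ctrl)

fit$results    # cross-validated Accuracy and Kappa for each k
fit$bestTune   # the k with the best cross-validated accuracy

# To reuse your own folds, pass the *training* indices via `index`
own_folds <- createFolds(Sonar$Class, k = 5, returnTrain = TRUE)
ctrl_own <- trainControl(method = "cv", index = own_folds)
```

To compare several methods on identical folds, one option is to pass the same index list to every trainControl call and change only the method argument of train.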