I'm trying to implement a logistic regression.
However, I can't get good predictions because the positive class (Y = 1) is under-represented in my data. I'm therefore trying to apply the SMOTE algorithm to my training set to get better results, but I get this error message:
Error in T[i, ] : subscript out of bounds
Here is my code:
library(caret)  # createDataPartition()
library(DMwR)   # SMOTE() (assumed to be the DMwR implementation, given perc.over / perc.under)

set.seed(157)
split <- createDataPartition(df_statique$Y, p = 0.50, list = FALSE, times = 1)
trainSplit <- df_statique[split, ]
testSplit <- df_statique[-split, ]
trainSplit <- SMOTE(Y ~ insolvency + efficiency + DebtToAssetsRatio + taille + CashAssetRatio + current + netWorth + REA, trainSplit, perc.over = 300, perc.under = 100)
Here is part of my data frame df_statique:
index countryIsoCode insolvency efficiency CashAssetRatio DebtToAssetsRatio netWorth REA taille Y
41807 IT 0.00360 0.5193711 0.8686575 0.49446355 4387182 1.657145e-03 2 1
41808 IT 0.00050 1.5269309 1.6295765 0.36543122 30916838 6.601092e-03 3 0
41809 IT 0.00050 2.2635592 1.3427063 0.15809120 2200087 1.218576e-03 1 0
41810 IT 0.00280 1.3989753 0.9345793 0.69642554 2940473 3.852093e-04 2 0
41811 IT 0.00140 2.1440221 3.5781748 0.07951644 28418622 8.845920e-04 2 0
41812 IT 0.00040 1.0068491 1.7238305 0.47561418 22486133 2.703242e-04 2 0
41813 IT 0.00130 1.5569114 1.4459704 0.57632716 9769040 9.741611e-04 2 0
41814 IT 0.00510 5.0143711 0.1035034 0.71267895 3610152 2.391447e-03 2 0
41815 IT 0.00090 3.3280521 0.5160867 0.34998732 218965703 2.550272e-04 3 0
41816 IT 0.00040 1.7217051 2.2758391 0.29638050 29868519 1.136387e-04 3 0
41817 IT 0.00360 1.7261580 0.8490392 0.41231551 106020226 2.304773e-06 3 0
41818 IT 0.00040 1.3600893 1.6298656 0.57789518 55408765 4.841743e-04 3 1
41819 IT 0.00510 5.5565821 0.1376145 0.19679467 9491245 1.398124e-03 2 0
41820 IT 0.00131 3.8312347 1.1365521 0.73639696 8921497 4.701300e-06 3 0
41821 IT 0.00400 1.8218620 0.9113375 0.62646234 24134486 9.435248e-04 3 0
41822 IT 0.00100 1.8215702 1.0690901 0.82764828 777547 6.335832e-03 2 0
41823 IT 0.00090 1.8153513 0.9320536 0.80258849 2437903 6.035954e-04 2 0
41824 IT 0.00050 2.1300765 1.7388457 0.31394248 27009000 3.507500e-04 3 0
41825 IT 0.00100 1.8697385 1.4438289 0.56198890 35917 5.765082e-03 1 0
41826 IT 0.00230 6.5298138 1.1726536 0.56654516 2675415 1.038839e-02 2 0
41827 IT 0.00220 9.8201528 0.4794298 0.63618554 488924 1.336866e-05 2 0
Finally, my output Y is a dummy variable indicating whether or not a default occurs within a one-year horizon.
This error occurs when the target variable passed to the SMOTE function is stored as an integer (or other non-factor) type. SMOTE only works with a factor target variable, so convert Y to a factor before calling it.
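As a minimal sketch of that fix, the snippet below builds a small made-up data frame (toy, with hypothetical values standing in for df_statique), converts its integer target to a factor, and only then calls SMOTE. It assumes SMOTE comes from the DMwR package and skips the call if that package is not installed. In the question's code, the equivalent step would presumably be trainSplit$Y <- as.factor(trainSplit$Y) just before the SMOTE line.

```r
# Hypothetical toy data standing in for df_statique: 180 non-defaults, 20 defaults.
set.seed(157)
n <- 200
toy <- data.frame(
  insolvency = runif(n, 0, 0.01),
  efficiency = runif(n, 0.5, 10),
  taille     = sample(1:3, n, replace = TRUE),
  Y          = rep(c(0L, 1L), times = c(180, 20))
)

# The target starts out as an integer column, the situation described above.
print(class(toy$Y))

# Convert the target to a factor before calling SMOTE.
toy$Y <- factor(toy$Y, levels = c(0, 1))
print(is.factor(toy$Y))

# Assumption: SMOTE() is the DMwR implementation; run it only if DMwR is available.
if (requireNamespace("DMwR", quietly = TRUE)) {
  balanced <- DMwR::SMOTE(Y ~ ., toy, perc.over = 300, perc.under = 100)
  print(table(balanced$Y))
}
```

If the real data are read with Y as 0/1 integers, doing this conversion once on df_statique before splitting keeps the training and test sets consistent.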