SMOTE function: 'subscript out of bounds'

I'm trying to implement a logistic regression as follows:

However, I can't get good predictions because class 1 of my output is under-represented in my data. I'm therefore trying to apply the SMOTE algorithm to my training set to get better results, but I get this error message:

Error in T[i, ] : subscript out of bounds

Here is my code:

library(caret)  # createDataPartition()
library(DMwR)   # SMOTE(); assumed, the post does not name the package

set.seed(157)
split <- createDataPartition(df_statique$Y, p = 0.50, list = FALSE, times = 1)
trainSplit <- df_statique[split, ]
testSplit <- df_statique[-split, ]
trainSplit <- SMOTE(Y ~ insolvency + efficiency + DebtToAssetsRatio + taille + CashAssetRatio + current + netWorth + REA,
                    trainSplit, perc.over = 300, perc.under = 100)

Here is part of my data frame df_statique:

index  countryIsoCode insolvency efficiency CashAssetRatio DebtToAssetsRatio netWorth     REA      taille Y
41807             IT    0.00360  0.5193711      0.8686575        0.49446355   4387182 1.657145e-03      2 1
41808             IT    0.00050  1.5269309      1.6295765        0.36543122  30916838 6.601092e-03      3 0
41809             IT    0.00050  2.2635592      1.3427063        0.15809120   2200087 1.218576e-03      1 0
41810             IT    0.00280  1.3989753      0.9345793        0.69642554   2940473 3.852093e-04      2 0
41811             IT    0.00140  2.1440221      3.5781748        0.07951644  28418622 8.845920e-04      2 0
41812             IT    0.00040  1.0068491      1.7238305        0.47561418  22486133 2.703242e-04      2 0
41813             IT    0.00130  1.5569114      1.4459704        0.57632716   9769040 9.741611e-04      2 0
41814             IT    0.00510  5.0143711      0.1035034        0.71267895   3610152 2.391447e-03      2 0
41815             IT    0.00090  3.3280521      0.5160867        0.34998732 218965703 2.550272e-04      3 0
41816             IT    0.00040  1.7217051      2.2758391        0.29638050  29868519 1.136387e-04      3 0
41817             IT    0.00360  1.7261580      0.8490392        0.41231551 106020226 2.304773e-06      3 0
41818             IT    0.00040  1.3600893      1.6298656        0.57789518  55408765 4.841743e-04      3 1
41819             IT    0.00510  5.5565821      0.1376145        0.19679467   9491245 1.398124e-03      2 0
41820             IT    0.00131  3.8312347      1.1365521        0.73639696   8921497 4.701300e-06      3 0
41821             IT    0.00400  1.8218620      0.9113375        0.62646234  24134486 9.435248e-04      3 0
41822             IT    0.00100  1.8215702      1.0690901        0.82764828    777547 6.335832e-03      2 0
41823             IT    0.00090  1.8153513      0.9320536        0.80258849   2437903 6.035954e-04      2 0
41824             IT    0.00050  2.1300765      1.7388457        0.31394248  27009000 3.507500e-04      3 0
41825             IT    0.00100  1.8697385      1.4438289        0.56198890     35917 5.765082e-03      1 0
41826             IT    0.00230  6.5298138      1.1726536        0.56654516   2675415 1.038839e-02      2 0
41827             IT    0.00220  9.8201528      0.4794298        0.63618554    488924 1.336866e-05      2 0

Finally, my output Y is a dummy variable indicating whether or not a default occurs within a one-year horizon.

T. Ciffréo asked Dec 06 '25 11:12


1 Answer

This error occurs when the target variable passed to the SMOTE function is of integer type. SMOTE only works with a factor target variable, so convert Y to a factor before calling SMOTE.
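A minimal sketch of the conversion, assuming SMOTE comes from the DMwR package (the question does not say which package is loaded). The data frame toy_train and its values are hypothetical stand-ins for trainSplit, and only two of the predictors are used to keep the example short. The SMOTE call runs only if DMwR is installed.

```r
# Hypothetical toy data standing in for trainSplit; all values are synthetic.
set.seed(157)
n <- 200
toy_train <- data.frame(
  insolvency = runif(n, 0, 0.01),
  efficiency = runif(n, 0.5, 10),
  Y          = rep(c(0L, 1L), times = c(180, 20))  # integer dummy, as in the question
)

# Convert the integer target to a factor before calling SMOTE.
toy_train$Y <- as.factor(toy_train$Y)

# Assumption: SMOTE() is DMwR::SMOTE. Skipped when the package is not installed.
if (requireNamespace("DMwR", quietly = TRUE)) {
  balanced <- DMwR::SMOTE(Y ~ insolvency + efficiency, toy_train,
                          perc.over = 300, perc.under = 100)
  print(table(balanced$Y))
}
```

Applied to the code in the question, the equivalent step would be trainSplit$Y <- as.factor(trainSplit$Y) just before the SMOTE call. The same conversion is probably needed on testSplit if the model is later evaluated on it.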

Kruthika answered Dec 08 '25 01:12


