I am facing the following problem. I have a system able to produce a ranking of some operations according to their anomaly score. To improve the performance I implemented a genetic algorithm to perform a features selection, such that the most anomalous operations appears in the first positions. What I am doing is not exactly feature selection, because I am not using binary variables, rather float variables between 0-1, which sum is equal to 1.
Currently, I have a population of 200 individuals for 50 generations. I am using as the evaluation function the system itself and I evaluate the quality of the solution by using the true positive rate, counting how many anomalous operations appears in the first N positions (where N is the number of anomalous operations). Then as operator the uniform crossover and I change a valueof a cell of the individual for the mutation. Of course, every time I make a check to fix the individual such that the sum is 1. Finally I use elitism to save the best-so-far solution over the time.
I observed that one feature has a very high value, which is often important, but not always, and this causes very low values for the other features. I suspect that my GA is overfitting. Can you help me to find a good stop criteria?
One of the best techniques for reducing overfitting is to increase the size of the training dataset. As discussed in the previous technique, when the size of the training data is small, then the network tends to have greater control over the training data.
A modern approach to reducing generalization error is to use a larger model that may be required to use regularization during training that keeps the weights of the model small. These techniques not only reduce overfitting, but they can also lead to faster optimization of the model and better overall performance.
Overfitting in genetic algorithms and programming is a big issue which is currently under research focus of the GP community, including myself. Most of the research is aimed at genetic programming and evolution of classification/regression models but it might also relate to your problem. There are some papers which might help you (and which I am working with too):
You can find the papers (the first two directly in pdf) by searching for their titles in scholar.google.com.
Basically, what all the papers work with, is the idea of using only a subset of the training data for directing the evolution and (randomly) changing this subset every generation (using the same subset for all individuals in one generation). Interestingly, experiments show that the smaller this subset is, the less overfitting occurs, up to the extreme of using only a single-element subset. The papers work with this idea and extend it with some tweaks (like switching between full dataset and a subset). But as I said in the beginning, all this is aimed at symbolic regression (more or less) and not feature selection.
I personally once tried another approach (again for symbolic regression by genetic programming) - using a subset of training data (e.g. a half) to drive the evolution (i.e. for fitness), but the "best-so-far" solution was determined using results on the remaining training data. The overfitting was much less significant.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With