
Does a dataset need to be a normal distribution for every parameter?

Sorry, I have just started learning machine learning and am by no means an expert, so this question will most likely sound ignorant. I searched as best I could but could not find similar questions or answers that address mine.

I learned that a model cannot learn unless the dataset it is trained on is normally distributed. The only way I know to check whether a dataset is normally distributed is the graphical method described here, applied to each parameter. That may be inadvisable; if so, I am happy to change my approach, so please correct me.

To get to my question, if I see a normal distribution for certain parameters yet not for a few others, does that mean the dataset is flawed? Or does it mean that I should not use those parameters for the model?

Thanks in advance, and sorry if there are any fundamental errors in my understanding of the concepts.

Isamu Isozaki asked Nov 09 '22 03:11



1 Answer

As cel said, every model has its own assumptions and limitations. While there might be a model that can only learn from perfectly normally distributed data, there are plenty of models that have no such requirement, such as SVMs or Random Forests.
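To make that concrete, here is a minimal sketch using only the standard library. It is not an SVM or a Random Forest; it is a hand-rolled one-split decision stump (the building block of tree models), and the data, the `fit_stump` helper, and the "label is 1 above 1.0" rule are all made up for illustration. The feature is drawn from an exponential distribution, which is clearly not normal, and the stump still recovers the rule.

```python
import random

def fit_stump(xs, ys):
    """Find the threshold t for which the rule `x > t -> 1` is most accurate."""
    best_t, best_acc = None, -1.0
    for t in sorted(set(xs)):  # try every observed value as a candidate split
        acc = sum((x > t) == bool(y) for x, y in zip(xs, ys)) / len(xs)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc

rng = random.Random(0)
# Heavily right-skewed feature (exponential), i.e. not normally distributed
xs = [rng.expovariate(1.0) for _ in range(500)]
# Hypothetical ground truth for this toy example: label is 1 when x > 1.0
ys = [1 if x > 1.0 else 0 for x in xs]

threshold, accuracy = fit_stump(xs, ys)
print(f"learned threshold: {threshold:.3f}, training accuracy: {accuracy:.3f}")
```

The stump only compares values against a split point, so the shape of the feature's distribution never enters into it. Models that do make distributional assumptions are a different story.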

In practice, if you know that your data does not conform to the assumptions of your model, you could consider using a different model or transforming your data to fit those assumptions. Consider the latter option carefully, to make sure the transformation does not render your model useless in real-life scenarios.
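As one example of such a manipulation, a log transform is a common way to pull a right-skewed, strictly positive feature closer to a normal shape. The sketch below is only an illustration under stated assumptions: the data is synthetic (log-normal by construction, so the transform works unusually well), and `skewness` is a small helper written here, not a library function.

```python
import math
import random
import statistics

def skewness(values):
    """Population skewness: mean of cubed z-scores (about 0 for symmetric data)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

rng = random.Random(0)
# Synthetic, strictly positive, right-skewed feature (assumed for illustration)
raw = [rng.lognormvariate(0.0, 1.0) for _ in range(2000)]
# Log transform; only valid because every value is > 0
transformed = [math.log(v) for v in raw]

skew_raw = skewness(raw)
skew_log = skewness(transformed)
print(f"skewness before: {skew_raw:.2f}, after log transform: {skew_log:.2f}")
```

This is also where the caution above applies: the same transform has to be applied to new data at prediction time, and it fails outright if real-world inputs can be zero or negative.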

ginge answered Dec 06 '22 16:12
