How to use both binary and continuous features in the k-Nearest-Neighbor algorithm?

Question

My feature vector has both continuous (or widely ranging) and binary components. If I simply use Euclidean distance, the continuous components will have a much greater impact:

For example, if I represent symmetric vs. asymmetric as 0 and 1, and some less important ratio as a value ranging from 0 to 100, then changing from symmetric to asymmetric has a tiny impact on the distance compared to changing the ratio by 25.

I can add more weight to the symmetry (by making it 0 or 100 for example), but is there a better way to do this?

NPE · Accepted Answer

You could try using the normalized Euclidean distance, described, for example, at the end of the first section here.

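It simply scales every feature (continuous or discrete) by its standard deviation, so that each feature contributes to the distance on a comparable scale. This is more robust than, say, scaling by the range (max-min) as suggested by another poster, because the range is determined entirely by the two most extreme values and is therefore sensitive to outliers.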
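To make the effect concrete, here is a minimal sketch in Python with NumPy. The function name `standardized_euclidean`, the small dataset `X`, and the points `a`, `b`, `c` are all made up for illustration; the only thing taken from the question is the setup of a 0/1 symmetry flag next to a ratio in the 0-100 range.

```python
import numpy as np

def standardized_euclidean(x, y, std):
    """Euclidean distance with each feature difference divided by that feature's standard deviation."""
    x, y, std = (np.asarray(a, dtype=float) for a in (x, y, std))
    return float(np.sqrt(np.sum(((x - y) / std) ** 2)))

# Hypothetical training data: column 0 is the binary symmetry flag,
# column 1 is the ratio ranging from 0 to 100.
X = np.array([
    [0, 10.0],
    [1, 35.0],
    [0, 60.0],
    [1, 85.0],
])

# Per-feature (population) standard deviation, estimated from the training data.
std = X.std(axis=0)

a = np.array([0, 50.0])  # reference point
b = np.array([1, 50.0])  # same ratio, symmetry flag flipped
c = np.array([0, 75.0])  # same symmetry flag, ratio changed by 25

# Plain Euclidean distance: the ratio dominates.
flip_raw = float(np.linalg.norm(a - b))
ratio_raw = float(np.linalg.norm(a - c))

# Standardized distance: both features are measured in units of their own spread.
flip_scaled = standardized_euclidean(a, b, std)
ratio_scaled = standardized_euclidean(a, c, std)

print(flip_raw, ratio_raw)
print(flip_scaled, ratio_scaled)
```

With this toy data, flipping the symmetry flag goes from being the much smaller change (plain Euclidean) to being the larger one (standardized). A feature that is constant in the training data has a standard deviation of zero and would need to be dropped or handled separately before dividing. If SciPy is available, `scipy.spatial.distance.seuclidean(u, v, V)` computes the same kind of distance, taking a vector of per-feature variances `V` rather than standard deviations.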

Tags: algorithm, machine-learning, knn

John Hall

1 Answer

NPE
