
What is the issue in my calculation of Multivariate Kernel Estimation?

My intention is to find its class using the Bayes classifier algorithm.

Suppose the following training data describes the heights, weights, and foot sizes of people of various sexes:

SEX         HEIGHT(feet)    WEIGHT (lbs)    FOOT-SIZE (inches)
male        6               180             12
male        5.92 (5'11")    190             11
male        5.58 (5'7")     170             12
male        5.92 (5'11")    165             10
female      5               100             6
female      5.5 (5'6")      150             8
female      5.42 (5'5")     130             7
female      5.75 (5'9")     150             9
trans       4               200             5
trans       4.10            150             8
trans       5.42            190             7
trans       5.50            150             9

Now I want to test a person with the following properties (test data) to find his/her sex:

 HEIGHT(feet)   WEIGHT (lbs)    FOOT-SIZE (inches)
 4              150             12

This may also be a multi-row matrix.


Suppose I am able to isolate only the male portion of the data and arrange it in a matrix:

male = [6.0000  180   12
        5.9200  190   11
        5.5800  170   12
        5.9200  165   10]

and I want to find its Parzen density function against the following row matrix, which represents the same kind of data for another person (male/female/transgender):

dataPoint = [4 150 2]

(dataPoint may have multiple rows.)

The goal is to see how closely this data matches that of those males.


My attempted solution:

p(x) = (1/n) * sum_{i=1..n} 1/sqrt(2*pi*sigma^2) * exp( -(x - x_i)^2 / (2*sigma^2) )

where sigma^2 is the per-feature variance of the male data and x_1..x_n are the n male training rows; the expression is applied to each feature separately.


(1) I am unable to calculate secPart because the matrix dimensions do not match. How can I fix this?

(2) Is this approach correct?


MATLAB/Octave Code

male = [6.0000  180   12
        5.9200  190   11
        5.5800  170   12
        5.9200  165   10];
dataPoint = [4 150 2]   % note: foot-size is 2 here, not the 12 shown in the test table above
variance  = var(male);

parzen.m

function [retval] = parzen (male, dataPoint, variance)
    clc
    %male
    %dataPoint
    %variance
    sub = male - dataPoint
    up = sub.^2
    dw = 2 * variance;
    sqr = sqrt(variance*2*pi);
    firstPart = sqr.^(-1);
    e = dw.^(-1)
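    % e is 1x3 but up is 4x3, so the matrix product e*up below fails (the mismatch from question (1))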
    secPart = exp((-1)*e*up);
    pdf = firstPart.* secPart;
    retval = mean(pdf);
endfunction

bayes.m

function retval = bayes (train, test, apriori)
    clc
    classCounts = rows(unique(train(:,1)));

    %pdfmx = ones(rows(test), classCounts);

    %%Parzen density.

    %pdf = parzen(train(:,2:end), test(:,2:end), variance);

    maxScore = 0;
    pdfProduct = 1; 

    for type = 1 : classCounts  
        %if(type == 1)
        clidxTrain = train(:,1) == type;
        %clidxTest = test(:,1) == type;
        trainMatrix = train(clidxTrain,2:end);
        variance = var(trainMatrix);
        pdf = parzen(trainMatrix, test, variance);
        %dictionary{type, 1} = type;
        %dictionary{type, 2} = prod(pdf);
        %pdfProduct = pdfProduct .* pdf;
        %end
    end

    for type=1:classCounts

    end
    retval = 0;  
endfunction
asked Nov 05 '16 by user366312


1 Answer

First, your example person has a tiny foot!

Second, it seems you are mixing together kernel density estimation (KDE) and naive Bayes. In a KDE, you estimate a pdf as a sum of kernels, one kernel per data point in your sample. So if you wanted to do a KDE of the height of males, you would add together four Gaussians, each one centered at the height of a different male.
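
For contrast, here is a minimal sketch of what the KDE version could look like, assuming a Gaussian kernel with the per-feature variance as its bandwidth (the same choice as parzen.m above; the bandwidth is an illustrative assumption, not the only option). Note the element-wise ./ and .*, which avoid the dimension mismatch from question (1):

male = [6.0000 180 12; 5.9200 190 11; 5.5800 170 12; 5.9200 165 10];
x = [4 150 2];            % the test row from the question
sigSq = var(male);        % 1x3 per-feature bandwidths (one possible choice)
% Broadcasting works in Octave and in MATLAB R2016b+; use bsxfun on older MATLAB.
k = 1 ./ sqrt(2*pi*sigSq) .* exp(-(male - x).^2 ./ (2*sigSq));  % 4x3, one kernel per training row
featurePdf = mean(k, 1);  % average the 4 kernels -> 1x3 per-feature density estimates
score = prod(featurePdf)  % combine the features naive-Bayes style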

In naive Bayes, you assume that the features (height, foot size, etc.) are independent and that each one is normally distributed. You estimate the parameters of a single Gaussian per feature from your training data, then use their product to get the joint probability of a new example belonging to a certain class. The first page that you link explains this fairly well.

In code:

clear

human = [6.0000  180   12
         5.9200  190   11
         5.5800  170   12
         5.9200  165   10];
tiger = [
    2   2000 17
    3   1980 16
    3.5 2100 18
    3   2020 18
    4.1 1800 20
];

dataPoints = [
    4 150 12
    3 2500 20
    ];

sigSqH  = var(human);
muH = mean(human);

sigSqT  = var(tiger);
muT = mean(tiger);

for i = 1:size(dataPoints, 1)
    i
    probHuman = prod( 1./sqrt(2*pi*sigSqH) .* exp( -(dataPoints(i,:) - muH).^2 ./ (2*sigSqH) ) )
    probTiger = prod( 1./sqrt(2*pi*sigSqT) .* exp( -(dataPoints(i,:) - muT).^2 ./ (2*sigSqT)  ) )
end

Comparing the probability of tiger vs. human lets us conclude that dataPoints(1,:) is a person while dataPoints(2,:) is a tiger. You can make this model more complicated by, e.g., adding prior probabilities of being one class or the other, which would then multiply probHuman or probTiger.
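
For example, a minimal sketch with assumed priors (the 0.7/0.3 values are hypothetical, and these lines would go inside the loop above, after probHuman and probTiger have been computed):

priorHuman = 0.7;   % assumed prior probability of the "human" class
priorTiger = 0.3;   % assumed prior probability of the "tiger" class
scoreHuman = priorHuman * probHuman;   % unnormalized posterior for "human"
scoreTiger = priorTiger * probTiger;   % unnormalized posterior for "tiger"
% Classify by the larger (unnormalized) posterior score.
if scoreHuman > scoreTiger
    disp('classified as human')
else
    disp('classified as tiger')
end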

answered Sep 20 '22 by Tokkot