I am writing a Naive Bayes classifier for indoor room localization from WiFi signal strength. So far it is working well, but I have some questions about missing features: they occur frequently because I rely on WiFi signals, and WiFi access points are simply not visible everywhere.
Question 1: Suppose I have two classes, Apple and Banana, and I want to classify test instance T1 as below.
I fully understand how the Naive Bayes classifier works. Below is the formula I am using from Wikipedia's article on the classifier. I am using uniform prior probabilities P(C=c), so I omit that term in my implementation.
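For reference, the decision rule from that article is the standard one:

classify(f_1, ..., f_n) = argmax_c P(C = c) * prod_i P(F_i = f_i | C = c)

With uniform priors, the P(C = c) factor can be dropped from the argmax.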
Now, when I compute the right-hand side of the equation and loop over the class-conditional feature probabilities, which set of features do I use? Test instance T1 has features 1, 3, and 4, but neither class has been trained on all of these features. So when I perform my loop to compute the probability product, I see several choices for what to loop over: only the test instance's features, only the features each class has seen in training, or just the features the two have in common. Which of these should I use?
Question 2: Let's say I want to classify test instance T2, where T2 has a feature not found in either class. I am using log probabilities to help eliminate underflow, but I am not sure of the details of the loop. I am doing something like this (in Java-like pseudocode):
// Track the best (highest) log-probability seen so far and its class.
Double bestLogProbability = -100000.0;
ClassLabel bestClassLabel = null;
for (ClassLabel classLabel : allClassLabels)
{
    Double logProbabilitySum = 0.0;
    for (Feature feature : allFeatures)
    {
        // Null means this class has no probability stored for this feature.
        Double logProbability = getLogProbability(classLabel, feature);
        if (logProbability != null)
        {
            logProbabilitySum += logProbability;
        }
    }
    if (bestLogProbability < logProbabilitySum)
    {
        bestLogProbability = logProbabilitySum;
        bestClassLabel = classLabel;
    }
}
The problem is that if none of the classes have the test instance's features (feature 5 in the example), then logProbabilitySum will remain 0.0, resulting in a bestLogProbability of 0.0, or linear probability of 1.0, which is clearly wrong. What's a better way to handle this?
Naive Bayes can handle missing data. Attributes are handled separately by the algorithm, at both model construction time and prediction time. As such, if a data instance has a missing value for an attribute, it can be ignored while preparing the model, and ignored when a probability is calculated for a class value.
How do we perform Bayesian classification when some features are missing? When some features are missing, we do not need the general methods for handling missing values; instead, we integrate (marginalize) the posterior probabilities over the missing features, which gives better predictions.
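To sketch why this works under the naive independence assumption, write obs for the observed features and miss for the missing ones; marginalizing the joint over the missing features removes them from the product:

P(c | f_obs) ∝ P(c) * sum over f_miss of [ prod_{i in obs} P(f_i | c) * prod_{j in miss} P(f_j | c) ]
            = P(c) * prod_{i in obs} P(f_i | c) * prod_{j in miss} ( sum_{f_j} P(f_j | c) )
            = P(c) * prod_{i in obs} P(f_i | c)

because each sum_{f_j} P(f_j | c) equals 1. So "integrating over the missing features" amounts to simply leaving them out of the product.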
For the Naive Bayes classifier, the right-hand side of your equation should iterate over all attributes. If you have attributes that are sparsely populated, the usual way to handle that is by using an m-estimate of the probability, which uses an equivalent sample size to calculate your probabilities. This will prevent the class-conditional probabilities from becoming zero when your training data have a missing attribute value. Do a web search for "m-estimate" and "equivalent sample size" and you will find numerous descriptions of the formula. A good reference text that describes this is Machine Learning by Tom Mitchell. The basic formula is
P_i = (n_i + m*p_i) / (n + m)
n_i is the number of training instances where the attribute has value f_i, n is the number of training instances (with the current classification), m is the equivalent sample size, and p_i is the prior probability for f_i. If you set m=0, this just reverts to the standard probability values (which may be zero, for missing attribute values). As m becomes very large, P_i approaches p_i (i.e., the probability is dominated by the prior probability). If you don't have a prior probability to use, just make it 1/k, where k is the number of attribute values.
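A minimal sketch of how the m-estimate could plug into the loop from your question, assuming you keep per-class counts; the names here (featureValueCount, classCount, mEstimate) are illustrative, not from your code:

// m-estimate of P(feature value | class):
//   featureValueCount (n_i): training instances of this class with this feature value
//   classCount        (n)  : training instances of this class
//   m                      : equivalent sample size (chosen by you)
//   prior             (p_i): prior for this value, e.g. 1.0 / k for k possible values
static double mEstimate(int featureValueCount, int classCount, double m, double prior)
{
    return (featureValueCount + m * prior) / (classCount + m);
}

Since this is strictly positive whenever m > 0 and p_i > 0, getLogProbability never needs to return null: every attribute contributes Math.log(mEstimate(...)) to logProbabilitySum, so the sum can no longer sit at 0.0 just because a class has never seen one of the test instance's features.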
If you use this approach, then for your instance T2, which has no attributes present in the training data, the result will be whichever class occurs most often in the training data. This makes sense since there is no relevant information in the training data by which you could make a better decision.