Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

how to Load CSV Data in scikit and using it for Naive Bayes Classification

Trying to load custom data to perform NB Classification in Scikit. Need help in loading the sample data into Scikit and then perform NB. How to load categorical values for target.

Use the same data for Train and Test or use a complete set just for test.

Sl No,Member ID,Member Name,Location,DOB,Gender,Marital Status,Children,Ethnicity,Insurance Plan ID,Annual Income ($),Twitter User ID
1,70000001,Fly Dorami,New York,39786,M,Single,,Asian,2002,0,548900028
2,70000002,Bennie Ariana,Pennsylvania,6/24/1940,F,Single,,Caucasian,2002,66313,
3,70000003,Brad Farley,Pennsylvania,12001,F,Married,4,African American,2002,98444,
4,70000004,Daggoo Cece,Indiana,14032,F,Married,2,Hispanic,2001,41896,113481472.
like image 757
satish john Avatar asked Aug 23 '13 06:08

satish john


People also ask

How do I use Naive Bayes in Sklearn?

Step 1: Calculate the prior probability for given class labels. Step 2: Find Likelihood probability with each attribute for each class. Step 3: Put these value in Bayes Formula and calculate posterior probability. Step 4: See which class has a higher probability, given the input belongs to the higher probability class.

Which package is used to import Naive Bayes?

To actually implement the naive Bayes classifier model, we're going to use scikit-learn, and we'll import our GaussianNB from sklearn. naive_bayes .

Which function is used to load a dataset from a CSV file?

As a beginner, you might only know a single way to load data (normally in CSV) which is to read it using pandas. read_csv function.


1 Answers

The following should get you started you will need pandas and numpy. You can load your .csv into a data frame and use that to input into the model. You all so need to define targets (0 for negatives and 1 for positives, assuming binary classification) depending on what you are trying to separate.

from sklearn.naive_bayes import GaussianNB
import pandas as pd
import numpy as np

# create data frame containing your data, each column can be accessed # by df['column   name']
df = pd.read_csv('/your/path/yourFile.csv')

target_names = np.array(['Positives','Negatives'])

# add columns to your data frame
df['is_train'] = np.random.uniform(0, 1, len(df)) <= 0.75
df['Type'] = pd.Factor(targets, target_names)
df['Targets'] = targets

# define training and test sets
train = df[df['is_train']==True]
test = df[df['is_train']==False]

trainTargets = np.array(train['Targets']).astype(int)
testTargets = np.array(test['Targets']).astype(int)

# columns you want to model
features = df.columns[0:7]

# call Gaussian Naive Bayesian class with default parameters
gnb = GaussianNB()

# train model
y_gnb = gnb.fit(train[features], trainTargets).predict(train[features])
like image 79
rlmlr Avatar answered Nov 14 '22 21:11

rlmlr