How to load CSV data in scikit-learn and use it for Naive Bayes classification

I am trying to load custom data to perform Naive Bayes (NB) classification in scikit-learn. I need help loading the sample data below into scikit-learn and then running NB on it. In particular, how do I load categorical values for the target?

I would like to either use the same data for training and testing, or use a complete separate set just for testing.

Sl No,Member ID,Member Name,Location,DOB,Gender,Marital Status,Children,Ethnicity,Insurance Plan ID,Annual Income ($),Twitter User ID
1,70000001,Fly Dorami,New York,39786,M,Single,,Asian,2002,0,548900028
2,70000002,Bennie Ariana,Pennsylvania,6/24/1940,F,Single,,Caucasian,2002,66313,
3,70000003,Brad Farley,Pennsylvania,12001,F,Married,4,African American,2002,98444,
4,70000004,Daggoo Cece,Indiana,14032,F,Married,2,Hispanic,2001,41896,113481472.

asked Aug 23 '13 06:08

satish john

1 Answer

The following should get you started. You will need pandas and numpy. You can load your .csv into a data frame and use that as input to the model. You also need to define targets (0 for negatives and 1 for positives, assuming binary classification), depending on what you are trying to separate.

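from sklearn.naive_bayes import GaussianNB
import pandas as pd
import numpy as np

# create a data frame containing your data; each column can be accessed
# by df['column name']
df = pd.read_csv('/your/path/yourFile.csv')

# class names, indexed by target value: 0 -> 'Negatives', 1 -> 'Positives'
target_names = np.array(['Negatives', 'Positives'])

# one integer target per row; the column and the positive value used here
# are only an example, replace them with whatever you want to separate
targets = (df['Marital Status'] == 'Married').astype(int).to_numpy()

# columns you want to model; GaussianNB needs numeric input, so text columns
# must be converted to numbers before they can be used
features = ['Children', 'Annual Income ($)']
df[features] = df[features].fillna(0)  # assumes a blank value means zero

# add columns to your data frame
df['is_train'] = np.random.uniform(0, 1, len(df)) <= 0.75
df['Type'] = pd.Categorical.from_codes(targets, target_names)  # replaces the removed pd.Factor
df['Targets'] = targets

# define training and test sets
train = df[df['is_train']]
test = df[~df['is_train']]

trainTargets = np.array(train['Targets']).astype(int)
testTargets = np.array(test['Targets']).astype(int)

# call Gaussian Naive Bayesian class with default parameters
gnb = GaussianNB()

# train the model, then predict the held-out test rows
# (pass train[features] to predict instead to score the training data)
gnb.fit(train[features], trainTargets)
y_gnb = gnb.predict(test[features])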
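
On the part of the question about categorical target values: one possible approach is to let scikit-learn's LabelEncoder turn the string labels into integer codes and map predictions back afterwards. The sketch below is only an illustration under some assumptions that are not in the question: it treats 'Marital Status' as the target, uses 'Children', 'Annual Income ($)' and a one-hot encoded 'Gender' as features, and treats a blank 'Children' value as zero. It reuses the four sample rows from the question and, as the question allows, predicts on the same rows it was trained on.

```python
import io

import pandas as pd
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import LabelEncoder

# The four sample rows from the question, inlined so the sketch runs on its own.
CSV_TEXT = """Sl No,Member ID,Member Name,Location,DOB,Gender,Marital Status,Children,Ethnicity,Insurance Plan ID,Annual Income ($),Twitter User ID
1,70000001,Fly Dorami,New York,39786,M,Single,,Asian,2002,0,548900028
2,70000002,Bennie Ariana,Pennsylvania,6/24/1940,F,Single,,Caucasian,2002,66313,
3,70000003,Brad Farley,Pennsylvania,12001,F,Married,4,African American,2002,98444,
4,70000004,Daggoo Cece,Indiana,14032,F,Married,2,Hispanic,2001,41896,113481472.
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))

# Assumption: 'Marital Status' is the label to predict.
# LabelEncoder maps each distinct string to an integer code (sorted order).
encoder = LabelEncoder()
y = encoder.fit_transform(df["Marital Status"])

# Numeric features; assumption: a blank 'Children' value means zero.
X = df[["Children", "Annual Income ($)"]].fillna(0)

# One-hot encode a categorical feature so GaussianNB receives numbers only.
X = X.join(pd.get_dummies(df["Gender"], prefix="Gender").astype(float))

# Train and predict on the same rows (the sample is too small to split).
model = GaussianNB().fit(X, y)
predicted = encoder.inverse_transform(model.predict(X))
print(list(predicted))
```

With real data you would normally hold out a test set rather than predict on the training rows; encoder.classes_ shows which string each integer code stands for.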

answered Nov 14 '22 21:11

rlmlr

Tags: python, csv, scikit-learn, scikits
