
Separate last column from the actual dataset using numpy

I have a dataset in CSV format (without headers) that I want to split into two parts: (1) the actual dataset without the last column, and (2) the last column (the class label). My dataset has 100K rows and 65 columns, where the last column, column 65, is the class label that I want to separate. I wrote the following:

dataset_path = 'dataset.csv'

dataset = np.genfromtxt(dataset_path, delimiter=',')
class_label = dataset[:-1]
dataset.drop(class_label, axis=1, inplace=True)

print dataset.shape
print class_label

This is in fact wrong. I am not able to achieve what I want. Any help is appreciated.

Medo asked Nov 28 '17 23:11



2 Answers

Assuming your dataset has no header row, index the 2-D array along its column axis:

class_label = dataset[:, -1] # for last column
dataset = dataset[:, :-1] # for all but last column
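
As a quick sanity check, the slicing above can be tried on a small stand-in array. This is only a minimal sketch: the 4x3 array and the names `features` and `all_but_last_row` are made up for illustration, and Python 3 print syntax is assumed.

```python
import numpy as np

# Hypothetical stand-in for the CSV data: 4 rows, 2 features + 1 label column.
dataset = np.array([
    [0.1, 0.2, 0.0],
    [0.3, 0.4, 1.0],
    [0.5, 0.6, 0.0],
    [0.7, 0.8, 1.0],
])

class_label = dataset[:, -1]   # every row, last column only -> shape (4,)
features = dataset[:, :-1]     # every row, all columns but the last -> shape (4, 2)

# For comparison, dataset[:-1] slices rows, not columns: it drops the last row.
all_but_last_row = dataset[:-1]  # shape (3, 3)

print(features.shape, class_label.shape, all_but_last_row.shape)
```

Both slices are views on the original array, so append `.copy()` if independent arrays are needed. Note also that `drop` is a pandas DataFrame method; NumPy arrays do not have it.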
ahed87 answered Oct 13 '22 10:10



If you want to stay with NumPy arrays, you can read the CSV file directly into one:

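 from numpy import genfromtxt
 # Raw string keeps the backslash in the Windows path literal.
 # skip_header=1 skips one header line; drop it if the file has no header.
 my_data = genfromtxt(r'E:\Book1.csv', delimiter=',', dtype='str', skip_header=1, unpack=True)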

Because of unpack=True the array is transposed, so each item in my_data is one column of your CSV file (a 1-D array). You can then remove the last column with:

 my_data_without_last_column = my_data[:-1].copy()
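
To see what `unpack=True` does to the layout, here is a small sketch that reads from an in-memory string instead of a file. The CSV text and the names `csv_text`, `labels`, and `rows` are invented for illustration, and no header line is assumed, so `skip_header` is left out.

```python
import io
import numpy as np

# Hypothetical CSV content without a header: 4 rows, 2 features + 1 label.
csv_text = "1,2,a\n3,4,b\n5,6,a\n7,8,b\n"

# unpack=True transposes the result, so the first axis indexes columns.
my_data = np.genfromtxt(io.StringIO(csv_text), delimiter=',', dtype='str', unpack=True)

labels = my_data[-1]                        # last column of the file
my_data_without_last_column = my_data[:-1]  # remaining columns, still transposed
rows = my_data_without_last_column.T        # back to one row per CSV line

print(my_data.shape, rows.shape, labels)
```

The result of `my_data[:-1]` is still column-major, which is why the sketch transposes it with `.T` before treating each entry as a row.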
Behzad Jamali answered Oct 13 '22 10:10
