Separate last column from the actual dataset using numpy

Question

I have a dataset in CSV format (without headers) that I want to split into two parts: (1) the actual dataset without the last column, and (2) the last column (the class label). The dataset has 100K rows and 65 columns, where the last column, column 65, is the class label that I want to separate. I wrote the following:

dataset_path = 'dataset.csv'

dataset = np.genfromtxt(dataset_path, delimiter=',')
class_label = dataset[:-1]
dataset.drop(class_label, axis=1, inplace=True)

print dataset.shape
print class_label

This is in fact wrong. I am not able to achieve what I want. Any help is appreciated.

ahed87 · Accepted Answer

Assuming that your dataset has no header row:

class_label = dataset[:, -1] # for last column
dataset = dataset[:, :-1] # for all but last column
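
For context, `dataset[:-1]` in the question drops the last row rather than the last column, and `drop` is a pandas method that numpy arrays do not have. A minimal, self-contained sketch of the slicing above follows, assuming Python 3; the four-row `csv_text` is a made-up stand-in for `dataset.csv`, and the name `features` is only illustrative:

```python
import io
import numpy as np

# Hypothetical stand-in for dataset.csv: 4 rows, 3 feature columns + 1 label column.
csv_text = "1.0,2.0,3.0,0\n4.0,5.0,6.0,1\n7.0,8.0,9.0,0\n1.5,2.5,3.5,1\n"

# genfromtxt accepts a file-like object as well as a path.
dataset = np.genfromtxt(io.StringIO(csv_text), delimiter=',')

class_label = dataset[:, -1]   # every row, last column only
features = dataset[:, :-1]     # every row, all columns except the last

print(features.shape)     # (4, 3)
print(class_label.shape)  # (4,)
```

With the real file, the same two slices should give shapes `(100000, 64)` and `(100000,)`, provided every row parses to 65 numeric values.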

Behzad Jamali · Answer

If you want to work with numpy arrays, you can read the data in the csv file directly into one:

 from numpy import genfromtxt
 # skip_header=1 skips a header row (drop it for a headerless file);
 # unpack=True transposes the result so that my_data[i] is column i.
 my_data = genfromtxt(r'E:\Book1.csv', delimiter=',', dtype='str', skip_header=1, unpack=True)

Because of unpack=True, each item in my_data holds one column of your csv file. Now you can remove the last column with:

 my_data_without_last_column = my_data[:-1].copy()
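
To make the effect of `unpack=True` concrete, here is a small sketch, assuming Python 3 and a headerless numeric file as in the question. The two-row `csv_text` and the names `columns`, `features`, and `class_label` are made up for illustration:

```python
import io
import numpy as np

# Hypothetical headerless data: 2 rows, 4 columns (the last one is the label).
csv_text = "1,2,3,0\n4,5,6,1\n"

# unpack=True returns the transposed array: one entry per csv column.
columns = np.genfromtxt(io.StringIO(csv_text), delimiter=',', unpack=True)
print(columns.shape)  # (4, 2)

class_label = columns[-1]    # the last csv column
features = columns[:-1].T    # drop it, then transpose back to rows x features
print(features.shape)        # (2, 3)
```

Note that without `unpack=True` the same `[:-1]` slice would remove the last row instead, so the two approaches should not be mixed.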

Tags: python, numpy, python-2.7

Medo
