I have a dataset in CSV format (without headers) that I want to split into two parts: (1) the actual dataset without the last column, and (2) the last column (the class label). My dataset has 100K rows and 65 columns (the last column, column 65, is the class label that I want to separate). I wrote the following:
dataset_path = 'dataset.csv'
dataset = np.genfromtxt(dataset_path, delimiter=',')
class_label = dataset[:-1]
dataset.drop(class_label, axis=1, inplace=True)
print dataset.shape
print class_label
This is in fact wrong. I am not able to achieve what I want. Any help is appreciated.
Assuming that your dataset has no header row:
class_label = dataset[:, -1] # for last column
dataset = dataset[:, :-1] # for all but last column
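As a minimal end-to-end sketch of the slicing above: the tiny in-memory CSV and the name features are made up for illustration, standing in for dataset.csv and the reassigned dataset variable.

```python
import io
import numpy as np

# Hypothetical 3-row, 4-column CSV standing in for dataset.csv (no header row).
csv_text = "1.0,2.0,3.0,0\n4.0,5.0,6.0,1\n7.0,8.0,9.0,0\n"

dataset = np.genfromtxt(io.StringIO(csv_text), delimiter=',')

class_label = dataset[:, -1]   # every row, last column only
features = dataset[:, :-1]     # every row, all columns except the last

print(features.shape)
print(class_label)
```

Note that dataset[:-1] (without the leading ":,") drops the last row rather than the last column, which is why the code in the question did not work. Also, .drop() is a pandas DataFrame method and does not exist on numpy arrays.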
In case you are interested in using numpy arrays, you can read your data in the csv file into a numpy array:
from numpy import genfromtxt
# unpack=True transposes the result; skip_header=1 skips a header row (omit it if your file has none)
my_data = genfromtxt(r'E:\Book1.csv', delimiter=',', dtype=str, skip_header=1, unpack=True)
Because unpack=True transposes the array, each item in my_data will be one column of your csv file.
Now you can remove the last column by:
my_data_without_last_column = my_data[:-1].copy()
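To see how the transposed layout behaves, here is a small sketch. It assumes numeric data and no header row, so dtype and skip_header are left at their defaults, and the in-memory CSV and variable names are invented for illustration.

```python
import io
import numpy as np

# Hypothetical 2-row, 4-column CSV with the class label in the last column.
csv_text = "1.0,2.0,3.0,0\n4.0,5.0,6.0,1\n"

# With unpack=True the array is transposed: one entry per CSV column.
columns = np.genfromtxt(io.StringIO(csv_text), delimiter=',', unpack=True)

class_label = columns[-1].copy()   # last CSV column
features = columns[:-1].T.copy()   # remaining columns, transposed back to rows x features

print(columns.shape)
print(features.shape)
print(class_label)
```

The .T is only needed if you want the usual rows-by-features layout back. Without it, the data stays in column-major form as in the answer above.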