I have a CSV file without headers that I'm importing into Python using pandas. The last column is the target class, and the remaining columns are pixel values for images. How can I split this dataset into a training set and a test set (80/20) using pandas?
Once that is done, how would I split each of those sets so that I can define x (all columns except the last one) and y (the last column)?
I've imported my file using:
dataset = pd.read_csv('example.csv', header=None, sep=',')
Thanks
You can write a pandas DataFrame to a CSV file with the pandas.DataFrame.to_csv() method. By default, to_csv() uses a comma as the delimiter and writes the row index as the first column.
Generally, scikit-learn works on any numeric data stored as numpy arrays or scipy sparse matrices. Other types that are convertible to numeric arrays such as pandas DataFrame are also acceptable.
Using the read_csv() function from the pandas package, you can import tabular data from a CSV file into a pandas DataFrame by passing the file name (e.g. pd.read_csv("filename.csv")).
Alternatively, you can read the CSV file with Python's built-in csv module and then create a DataFrame from the rows it returns.
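As a rough illustration of the csv-module route, the sketch below reads rows with csv.reader and turns them into a DataFrame. The in-memory string is a made-up stand-in for a real headerless file, and the conversion with pd.to_numeric assumes every cell is numeric, as with pixel values and integer class labels.

```python
import csv
import io

import pandas as pd

# Hypothetical in-memory stand-in for a headerless CSV file on disk.
raw = io.StringIO("0,128,255,1\n12,64,200,0\n")

# csv.reader yields each row as a list of strings.
rows = list(csv.reader(raw))

# Build the DataFrame, then convert the string cells to numbers column by column.
dataset = pd.DataFrame(rows).apply(pd.to_numeric)

print(dataset.shape)
```

For a file this simple, pd.read_csv with header=None should give the same result in one call, so the csv module is mostly useful when rows need custom handling first.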
I'd recommend using scikit-learn's train_test_split:
from sklearn.model_selection import train_test_split
# For scikit-learn versions before 0.18, import from sklearn.cross_validation instead:
# from sklearn.cross_validation import train_test_split
X, y = dataset.iloc[:, :-1], dataset.iloc[:, -1]
kwargs = dict(test_size=0.2, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, **kwargs)
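If you would rather stay within pandas, as the question asks, one possible sketch uses DataFrame.sample to draw the training rows and DataFrame.drop to keep the rest for testing. The random frame below is a hypothetical stand-in for example.csv, and the approach assumes the index is unique (true for the default index that read_csv produces).

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the headerless CSV: 4 pixel columns plus a label column.
rng = np.random.default_rng(0)
dataset = pd.DataFrame(rng.integers(0, 256, size=(100, 5)))

# Draw 80% of the rows at random for training; the remaining rows form the test set.
train = dataset.sample(frac=0.8, random_state=1)
test = dataset.drop(train.index)

# Features are every column except the last; the target is the last column.
X_train, y_train = train.iloc[:, :-1], train.iloc[:, -1]
X_test, y_test = test.iloc[:, :-1], test.iloc[:, -1]

print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
# (80, 4) (80,) (20, 4) (20,)
```

Unlike train_test_split, this does not offer stratification, so the class balance of the two sets can differ slightly.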
You can try this.
Separating the target class from the rest:
pixel_values = dataset[dataset.columns[:-1]]
target_class = dataset[dataset.columns[-1:]]
Now to create test and training samples:
I would just use numpy's rand:
import numpy as np
# Each row lands in the training set with probability 0.8
mask = np.random.rand(len(pixel_values)) < 0.8
train = pixel_values[mask]
test = pixel_values[~mask]
Now you have training and test samples in train and test, in a roughly 80:20 ratio (the split is approximate because each row is assigned at random).
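The same mask also has to be applied to the target column, otherwise the labels no longer line up with the pixel rows. A minimal sketch, assuming a made-up random frame in place of example.csv and a seeded NumPy generator for reproducibility:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the headerless CSV: 4 pixel columns plus a label column.
rng = np.random.default_rng(42)
dataset = pd.DataFrame(rng.integers(0, 256, size=(1000, 5)))

pixel_values = dataset.iloc[:, :-1]
target_class = dataset.iloc[:, -1]

# One mask, reused for both features and labels so the rows stay paired.
mask = rng.random(len(dataset)) < 0.8

X_train, y_train = pixel_values[mask], target_class[mask]
X_test, y_test = pixel_values[~mask], target_class[~mask]

# Each row is drawn independently, so the split is only approximately 80/20.
print(f"train fraction: {len(X_train) / len(dataset):.3f}")
```

If an exact 80/20 split matters, shuffling the row positions and slicing at a fixed cutoff would be one alternative to a random mask.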