Preparing CSV file data for Scikit-Learn Using Pandas?

Tags:

python

pandas

csv

scikit-learn

I have a CSV file without headers that I'm importing into Python using pandas. The last column is the target class, and the remaining columns are pixel values for images. How can I split this dataset into a training set and a testing set (80/20) using pandas?

Also, once that is done, how would I split each of those sets into x (all columns except the last one) and y (the last column)?

I've imported my file using:

dataset = pd.read_csv('example.csv', header=None, sep=',')

Thanks

229

asked Mar 28 '16 05:03

KingPolygon

2 Answers

I'd recommend using sklearn's train_test_split:

from sklearn.model_selection import train_test_split
# On scikit-learn older than 0.18, import from sklearn.cross_validation instead:
# from sklearn.cross_validation import train_test_split

# X is every column except the last; y is the last column (the target class).
X, y = dataset.iloc[:, :-1], dataset.iloc[:, -1]
# Hold out 20% of the rows for testing; random_state makes the split reproducible.
kwargs = dict(test_size=0.2, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, **kwargs)
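
Since the question asks for a pandas-only route, here is a minimal sketch of one: sample 80% of the rows, then drop those rows to get the remaining 20%. The small random frame is only a stand-in for example.csv (an assumption for illustration), and the seeds are arbitrary.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for pd.read_csv('example.csv', header=None, sep=','):
# 100 rows, 4 "pixel" columns plus a final target column.
rng = np.random.default_rng(0)
dataset = pd.DataFrame(rng.integers(0, 256, size=(100, 5)))

# Draw 80% of the rows at random; the rows left over form the test set.
train = dataset.sample(frac=0.8, random_state=1)
test = dataset.drop(train.index)

# x is every column except the last, y is the last column.
X_train, y_train = train.iloc[:, :-1], train.iloc[:, -1]
X_test, y_test = test.iloc[:, :-1], test.iloc[:, -1]

print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
```

Dropping by index assumes the row index is unique, which holds for the default index that read_csv creates.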

160

answered Sep 27 '22 23:09

ayhan

You can try this.

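Separating the target class from the rest:

# All columns except the last one hold the pixel values.
pixel_values = dataset[dataset.columns[:-1]]
# The last column holds the target class (kept as a one-column DataFrame).
target_class = dataset[dataset.columns[-1:]]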

Now to create test and training samples:

I would just use numpy's rand:

import numpy as np

# One uniform random number per row; True for roughly 80% of the rows.
mask = np.random.rand(len(pixel_values)) < 0.8
train = pixel_values[mask]
test = pixel_values[~mask]

Now you have training and test samples in train and test, split roughly 80:20 (the mask is random, so the ratio is only approximate).
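
Note that the mask is only applied to pixel_values here. A minimal sketch of applying the same mask to the target column as well, so that features and labels stay aligned, could look like this. The random frame is a hypothetical stand-in for the CSV, and the seeded generator is an assumption added for reproducibility.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the CSV: 1000 rows, 4 "pixel" columns
# plus a final target column.
rng = np.random.default_rng(0)
dataset = pd.DataFrame(rng.integers(0, 256, size=(1000, 5)))

pixel_values = dataset.iloc[:, :-1]   # all columns except the last
target_class = dataset.iloc[:, -1]    # the last column

# One boolean per row, True with probability 0.8.
mask = rng.random(len(dataset)) < 0.8

# Use the same mask for features and labels so the rows match up.
X_train, y_train = pixel_values[mask], target_class[mask]
X_test, y_test = pixel_values[~mask], target_class[~mask]

print(len(X_train), len(X_test))
```

Because each row is assigned independently, the split sizes vary from run to run unless the generator is seeded.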

answered Sep 28 '22 01:09

Randhawa
