<ul> <li>I have a numpy matrix with shape (4601, 58).</li> <li>I want to split the matrix randomly by rows into three parts of 60%, 20%, and 20%.</li> <li>This is for a machine learning task.</li> <li>Is there a numpy function that randomly selects rows?</li> </ul>

Numpy: How to randomly split/select a matrix into n different matrices

4 Answers

You can use numpy.random.shuffle:

import numpy as np

N = 4601
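data = np.arange(N*58).reshape(-1, 58)  # example data with shape (4601, 58)
np.random.shuffle(data)  # shuffles the rows in place

a = data[:int(N*0.6)]             # first 60% of the rows
b = data[int(N*0.6):int(N*0.8)]   # next 20%
c = data[int(N*0.8):]             # last 20%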
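The same 60/20/20 split can also be written without reordering data in place, by shuffling a permutation of the row indices and cutting it with np.split. This is a sketch, assuming NumPy 1.17 or later (for np.random.default_rng); the helper name split_rows and its fractions/seed parameters are hypothetical, not part of NumPy.

```python
import numpy as np

def split_rows(data, fractions=(0.6, 0.2, 0.2), seed=None):
    """Hypothetical helper: randomly split the rows of data by the given fractions."""
    if not np.isclose(sum(fractions), 1.0):
        raise ValueError("fractions must sum to 1")
    n = data.shape[0]
    rng = np.random.default_rng(seed)
    order = rng.permutation(n)  # random ordering of row indices; data itself is untouched
    # cumulative cut points, e.g. [2760, 3680] for n = 4601 and (0.6, 0.2, 0.2)
    cuts = (np.cumsum(fractions[:-1]) * n).astype(int)
    # integer-array indexing returns copies of the selected rows
    return [data[idx] for idx in np.split(order, cuts)]

N = 4601
data = np.arange(N * 58).reshape(-1, 58)
a, b, c = split_rows(data, seed=0)
print(a.shape, b.shape, c.shape)  # (2760, 58) (920, 58) (921, 58)
```

The cut points match HYRY's int(N*0.6) and int(N*0.8), so the part sizes are the same; only the source of randomness and the in-place behaviour differ.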

answered Oct 14 '22 03:10

HYRY

A complement to HYRY's answer: if you want to shuffle several arrays x, y, z consistently, all with the same first dimension (x.shape[0] == y.shape[0] == z.shape[0] == n_samples), you can do:

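import numpy as np

n_samples = x.shape[0]  # x, y and z must all have this many rows
rng = np.random.RandomState(42)  # reproducible results with a fixed seed
indices = np.arange(n_samples)
rng.shuffle(indices)  # shuffle the index array, not the data
x_shuffled = x[indices]  # the same permutation is applied to every array
y_shuffled = y[indices]
z_shuffled = z[indices]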

And then proceed with the split of each shuffled array as in HYRY's answer.
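To see why a shared index array keeps the arrays aligned, here is a small sketch with made-up toy data (the x and y below are illustrative only). It uses the newer np.random.default_rng Generator instead of RandomState; Generator.permutation(n) returns a shuffled copy of np.arange(n). The index array is split once and reused for every array.

```python
import numpy as np

n_samples = 5
x = np.arange(n_samples * 2).reshape(n_samples, 2)  # toy features: row i is [2i, 2i+1]
y = np.arange(n_samples)                             # toy labels: label i belongs to row i

rng = np.random.default_rng(42)       # fixed seed for reproducible results
indices = rng.permutation(n_samples)  # one shuffled index array shared by both arrays
x_shuffled = x[indices]
y_shuffled = y[indices]

# every shuffled row still sits next to its own label
assert np.array_equal(x_shuffled[:, 0] // 2, y_shuffled)

# split the shared index array once, then reuse the pieces for every array
train_idx, test_idx = indices[:3], indices[3:]
x_train, y_train = x[train_idx], y[train_idx]
x_test, y_test = x[test_idx], y[test_idx]
```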

answered Oct 14 '22 04:10

ogrisel

If you want to randomly select rows, you could just use random.sample from the standard Python library:

import random

population = range(4601) # Your number of rows
choice = random.sample(population, k) # k being the number of samples you require

random.sample samples without replacement, so you don't need to worry about repeated rows ending up in choice. Given a numpy array called matrix, you can select those rows with integer-array indexing: matrix[choice].

Of course, k can be equal to the total number of elements in the population, and then choice contains a random ordering of all the row indices. You can then partition choice however you like, if that's all you need.
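A sketch of that last idea for the 60/20/20 case in the question: sample every row index once, cut the resulting list at 60% and 80%, and index the matrix with each piece. The example data and the names train_rows, val_rows and test_rows are illustrative assumptions.

```python
import random

import numpy as np

random.seed(0)  # optional: fix the seed for reproducible splits

n_rows = 4601
matrix = np.arange(n_rows * 58).reshape(n_rows, 58)  # example data, shape (4601, 58)

# k equal to the population size yields a random ordering of every row index
choice = random.sample(range(n_rows), n_rows)

end_train = int(n_rows * 0.6)  # 2760
end_val = int(n_rows * 0.8)    # 3680
train_rows = matrix[choice[:end_train]]  # integer-array indexing with a list
val_rows = matrix[choice[end_train:end_val]]
test_rows = matrix[choice[end_val:]]
print(train_rows.shape, val_rows.shape, test_rows.shape)  # (2760, 58) (920, 58) (921, 58)
```

Because choice contains each index exactly once, the three parts are disjoint and together cover every row.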

answered Oct 14 '22 04:10

Ricardo Cárdenes

Since you need it for machine learning, here is a method I wrote:

import numpy as np

def split_random(matrix, percent_train=70, percent_test=15):
    """
    Splits matrix data into randomly ordered sets 
    grouped by provided percentages.

    Usage:
    rows = 100
    columns = 2
    matrix = np.random.rand(rows, columns)
    training, testing, validation = \
    split_random(matrix, percent_train=80, percent_test=10)

    percent_validation 10
    training (80, 2)
    testing (10, 2)
    validation (10, 2)

    Returns:
    - training_data: percent_train, e.g. 70%
    - testing_data: percent_test, e.g. 15%
    - validation_data: remainder of 100%, e.g. 15%
    Created by Uki D. Lucas on Feb. 4, 2017
    """

    percent_validation = 100 - percent_train - percent_test

    if percent_validation < 0:
        print("Make sure that the sum of the training and " + \
        "testing percentages is less than or equal to 100%.")
        percent_validation = 0
    else:
        print("percent_validation", percent_validation)

    rows = matrix.shape[0]
    np.random.shuffle(matrix)  # note: shuffles the caller's array in place

    end_training = int(rows*percent_train/100)    
    end_testing = end_training + int((rows * percent_test/100))

    training = matrix[:end_training]
    testing = matrix[end_training:end_testing]
    validation = matrix[end_testing:]
    return training, testing, validation

# TEST:
rows = 100
columns = 2
matrix = np.random.rand(rows, columns)
training, testing, validation = split_random(matrix, percent_train=80, percent_test=10) 

print("training",training.shape)
print("testing",testing.shape)
print("validation",validation.shape)

print(split_random.__doc__)

Output (excerpt):

training (80, 2)
testing (10, 2)
validation (10, 2)
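One possible variation on split_random, sketched here as a hypothetical split_random_copy: it raises a ValueError instead of printing a warning when the percentages add up to more than 100, and it shuffles a permutation of row indices rather than the matrix itself, so the caller's array keeps its original order (np.random.shuffle reorders its argument in place). It assumes NumPy 1.17 or later for np.random.default_rng.

```python
import numpy as np

def split_random_copy(matrix, percent_train=70, percent_test=15, seed=None):
    """Hypothetical variant of split_random that leaves matrix unmodified."""
    if percent_train + percent_test > 100:
        raise ValueError("percent_train + percent_test must be <= 100")
    rows = matrix.shape[0]
    order = np.random.default_rng(seed).permutation(rows)  # shuffled row indices

    end_training = int(rows * percent_train / 100)
    end_testing = end_training + int(rows * percent_test / 100)

    training = matrix[order[:end_training]]
    testing = matrix[order[end_training:end_testing]]
    validation = matrix[order[end_testing:]]  # the remainder, e.g. 15%
    return training, testing, validation

matrix = np.random.rand(100, 2)
training, testing, validation = split_random_copy(matrix, percent_train=80, percent_test=10)
print(training.shape, testing.shape, validation.shape)  # (80, 2) (10, 2) (10, 2)
```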

answered Oct 14 '22 04:10

Uki D. Lucas


Tags: python, random, numpy, scipy, scikits

daydreamer
