Numpy split with percentage on a matrix

Question

I am new to Python and have trouble understanding the following code:

data_a, data_b, data_C = np.split(original_data.sample(frac=1, random_state=1729), 
                               [int(0.7 * len(original_data)), int(0.9*len(original_data))])

My original dataset has a total of 38000 rows. After this split, data_a has 26600 rows, data_b has 7600 rows, and data_C has 3800 rows. I understand that 70% of original_data is 26600 rows. But why does data_b have 7600 rows and data_C 3800? I read the documentation for the split method, and from my reading of the code I would have expected 90% of the remaining 30% of my initial 38000 rows to go into data_b, which would be 10260 rows, not 7600.

Venkatachalam · Accepted Answer

You have to do it sequentially if you want to split the remaining 30% into 90-10. Try this:

data_a, remaining_data = np.split(original_data.sample(frac=1, random_state=1729), 
                                   [int(0.7 * len(original_data))])
data_b, data_C = np.split(remaining_data,[int(0.9 * len(remaining_data))])

data_a.shape, data_b.shape, data_C.shape

output:

((26600,), (10260,), (1140,))
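
The row counts from the question and from the output above can be reproduced on a plain array. This is only a minimal sketch: `np.arange(38000)` is an assumed stand-in for the shuffled `original_data`, since only its length matters for the sizes.

```python
import numpy as np

n = 38000
data = np.arange(n)  # assumed stand-in for original_data; only the length matters

# One call with two indices: both are positions in the full array,
# so the pieces are rows [0, 26600), [26600, 34200) and [34200, 38000).
a, b, c = np.split(data, [int(0.7 * n), int(0.9 * n)])
one_call_sizes = (len(a), len(b), len(c))

# Two sequential calls: the second index is relative to the remaining 30%.
a2, rest = np.split(data, [int(0.7 * n)])
b2, c2 = np.split(rest, [int(0.9 * len(rest))])
sequential_sizes = (len(a2), len(b2), len(c2))

print(one_call_sizes)    # (26600, 7600, 3800)
print(sequential_sizes)  # (26600, 10260, 1140)
```

In the single call, data_b gets the rows between the 70% and 90% marks, which is 20% of the original data (7600 rows), and data_C gets the last 10% (3800 rows).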

Alex · Answer

The split percentages there are relative to the original dataset. If you want data_b to be 90% of the 30% left after the first split, the second index has to sit at 0.7 + 0.9 × 0.3 = 0.97 of the original length, so you need something like this:

data_a, data_b, data_C = np.split(original_data.sample(frac=1, random_state=1729), [int(0.7 * len(original_data)), int(0.97*len(original_data))])

That is because you specify the split points (positions in the original data) rather than the ratios of the resulting datasets.
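
One way to avoid working out the cumulative positions by hand is to convert the desired piece fractions into split points first. The helper below is a hypothetical sketch, not part of NumPy: `cut_points` is a name made up for illustration, and `np.arange(38000)` again stands in for the shuffled data.

```python
import numpy as np

def cut_points(fractions, n):
    """Convert piece fractions into cumulative split indices for np.split.

    Hypothetical helper for illustration; the last fraction is implied
    by whatever remains after the final split point.
    """
    points = []
    total = 0.0
    for frac in fractions[:-1]:
        total += frac
        # round() guards against float results such as 36859.999...
        points.append(int(round(total * n)))
    return points

n = 38000
data = np.arange(n)  # assumed stand-in for the shuffled original_data

# Desired pieces: 70%, then 90% and 10% of the remaining 30%.
fractions = [0.7, 0.3 * 0.9, 0.3 * 0.1]
points = cut_points(fractions, n)
parts = np.split(data, points)
sizes = [len(p) for p in parts]

print(points)  # [26600, 36860]
print(sizes)   # [26600, 10260, 1140]
```

The second split point, 36860, is 0.97 of 38000, which matches the index used in the one-line call above.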

Tags:

python

python-3.x

MaradonaAtCoding
