 

Split a text file into multiple files by percentage for test and train

Tags:

python

pandas

I have a large text file with more than 50,000 lines. How can I randomly split the lines into 70% for training, 20% for testing, and 10% for dev?

Expected result: train.txt, test.txt, dev.txt

Kimi Shui asked Nov 19 '25 07:11



1 Answer

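I found this code much simpler. It draws one random number per line and sends the line to an output file based on that number, so the split sizes are approximate rather than exact. As written, the ratios are about 75% / 12.5% / 12.5%, not 70/20/10.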

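## Allocate train, test and validate datasets (about 75% / 12.5% / 12.5%).
## For a 70/20/10 split, use the thresholds 0.70 and 0.90 and name the third file dev.txt.
import random

# Binary mode copies each line byte for byte, whatever the file's encoding.
# The with statement closes all four files, even if an error occurs.
with open('unique.txt', 'rb') as fin, \
     open('train.txt', 'wb') as f75out, \
     open('test.txt', 'wb') as f125aout, \
     open('validate.txt', 'wb') as f125bout:
    for line in fin:
        r = random.random()  # uniform float in [0.0, 1.0)
        if r < 0.75:
            f75out.write(line)
        elif r < 0.875:
            f125aout.write(line)
        else:
            f125bout.write(line)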
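
Because each line is routed by its own random draw, the file sizes above only approximate the target ratios. If exact 70/20/10 counts matter, one alternative is to read all lines, shuffle them, and slice the shuffled list. The following is a minimal sketch of that idea, assuming the whole file fits in memory. The names `split_lines` and `split_file`, the `fractions` and `seed` parameters, and the default output file names are illustrative choices, not part of the original answer.

```python
import random


def split_lines(lines, fractions=(0.7, 0.2, 0.1), seed=0):
    """Shuffle a copy of `lines` and cut it into three consecutive slices.

    The first two slices are sized from `fractions`; the third slice takes
    whatever remains, so no line is lost to rounding.
    """
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is repeatable
    n = len(shuffled)
    first_cut = round(n * fractions[0])
    second_cut = first_cut + round(n * fractions[1])
    return shuffled[:first_cut], shuffled[first_cut:second_cut], shuffled[second_cut:]


def split_file(src, train="train.txt", test="test.txt", dev="dev.txt", seed=0):
    """Write a shuffled 70/20/10 split of `src` and return the three line counts."""
    with open(src, "rb") as fin:
        lines = fin.readlines()
    # A final line without a newline would merge with its neighbour after shuffling.
    lines = [ln if ln.endswith(b"\n") else ln + b"\n" for ln in lines]
    parts = split_lines(lines, seed=seed)
    for path, part in zip((train, test, dev), parts):
        with open(path, "wb") as fout:
            fout.writelines(part)
    return tuple(len(part) for part in parts)
```

Calling `split_file('unique.txt')` would write train.txt, test.txt and dev.txt and return the three line counts. Keeping the same `seed` reproduces the same split on a later run.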
MuneshSingh answered Nov 21 '25 20:11



