Trying to split a list by percentage

Tags:

python

I'm trying to split a list by a percentage, randomly moving elements out of the main list into a test set; the trainingSet is whatever is left over. I'm running into a problem when generating the random index to pick from. This code works with a small list, but when I use a larger one (len(rawRatings) == 1000) it does not work.

Error:

  File "/Applications/WingIDE.app/Contents/MacOS/src/debug/tserver/_sandbox.py", line 1, in <module>
    # Used internally for debug sandbox under external interpreter
  File "/Applications/WingIDE.app/Contents/MacOS/src/debug/tserver/_sandbox.py", line 29, in partitionRankings
  File "/Users/rderickson9/anaconda/lib/python2.7/random.py", line 241, in randint
    return self.randrange(a, b+1)
  File "/Users/rderickson9/anaconda/lib/python2.7/random.py", line 217, in randrange
    raise ValueError, "empty range for randrange() (%d,%d, %d)" % (istart, istop, width)
ValueError: empty range for randrange() (0,0, 0)
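The traceback shows that random.randint(a, b) simply calls randrange(a, b+1), so the "(0,0, 0)" in the message means randint was asked for an upper bound of -1. A minimal sketch to check this; the helper name randint_raises is hypothetical, and the exact wording of the error differs between Python versions:

```python
import random

def randint_raises(upper):
    """Hypothetical helper: return True if random.randint(0, upper) raises ValueError."""
    try:
        random.randint(0, upper)
    except ValueError as exc:
        # randint(0, upper) calls randrange(0, upper + 1); message text varies by version
        print("ValueError: %s" % exc)
        return True
    return False

print(randint_raises(0))   # False: randint(0, 0) can only return 0
print(randint_raises(-1))  # True (after printing the error): randrange(0, 0) is empty
```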

rawRatings is a list of tuples, and testPercent is a float.

Example:

rawRatings = [(123,432,4),(23,342,3),(23,123,5),(234,523,3),(34,23,1), (12,32,4)]
testPercent = .2
partitionRankings(rawRatings, testPercent)
[(23,123,5),(234,523,3),(34,23,1),(123,432,4),(12,32,4)],[(23,342,3)]


import random

def partitionRankings(rawRatings, testPercent):
    testSet = []
    trainingSet = []
    howManyNumbers = int(round(testPercent*len(rawRatings)))
    declineRandom = 0
    while True:
        if declineRandom == howManyNumbers:
            break
        randomIndex = random.randint(0, (len(rawRatings)-1)-declineRandom)
        testSetTuple = rawRatings[randomIndex]
        del rawRatings[randomIndex]
        testSet.append(testSetTuple)

        declineRandom = declineRandom + 1
    trainingSet = rawRatings[:]
    return (trainingSet), (testSet)

I don't want to choose the same random index twice: once I choose one, I don't want to randomly select it again. I don't think the way I'm doing this is correct, and this line is the part I'm having trouble with:

randomIndex = random.randint(0, (len(rawRatings)-1)-declineRandom)
asked Apr 25 '14 17:04 by user3491255

1 Answer

Since the order of the training set does not matter, you can use an entirely different strategy: shuffle a copy of rawRatings, then take the first howManyNumbers elements as your test set and the rest as your training set.

import random

def partitionRankings(rawRatings, testPercent):
    howManyNumbers = int(round(testPercent*len(rawRatings)))
    shuffled = rawRatings[:]
    random.shuffle(shuffled)
    return shuffled[howManyNumbers:], shuffled[:howManyNumbers]
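Another option, swapped in here rather than taken from the answer above, is random.sample, which draws distinct items without replacement. A minimal sketch, assuming you would like both lists to keep the original order (the function name partitionRankingsSample is hypothetical). Sampling indices rather than tuples means duplicate tuples in rawRatings are still handled correctly:

```python
import random

def partitionRankingsSample(rawRatings, testPercent):
    """Hypothetical variant: choose distinct test indices with random.sample."""
    howManyNumbers = int(round(testPercent * len(rawRatings)))
    # sample() never returns the same index twice, so no element is picked twice
    testIndices = set(random.sample(range(len(rawRatings)), howManyNumbers))
    testSet = [r for i, r in enumerate(rawRatings) if i in testIndices]
    trainingSet = [r for i, r in enumerate(rawRatings) if i not in testIndices]
    return trainingSet, testSet  # same (training, test) order as the question's function
```

Unlike the shuffle version, both returned lists keep the relative order of rawRatings; like it, the caller's list is not modified.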

As for why your code doesn't work as written: as you guessed, the problem is with this line:

randomIndex = random.randint(0, (len(rawRatings)-1)-declineRandom)

Specifically, the problem is subtracting declineRandom.

  • Every time you go through the loop, you remove the entry that you picked, so even if you got the same index again you would not be picking the same element.
  • If you didn't remove the element from the list on each iteration, subtracting declineRandom would not prevent picking the same element twice; it would only prevent you from picking any of the last declineRandom elements.
    • To make that work, you'd have to move each picked element to the end of the list (for example, by swapping it with the last element that hasn't been picked yet) at each iteration.
  • Because you delete elements and don't put them back at the end of the list, len(rawRatings) shrinks by one on each iteration while declineRandom grows by one, so the upper bound passed to randint drops by two each time.
    • If you have a list of 1000 items and try to put 600 in the test set, then once you have 550 items in the test set you would be trying to get a random int that is greater than or equal to zero and less than or equal to (450-1)-550 = -101. You never actually get that far: after 500 picks the bound is already (500-1)-500 = -1, and randint(0, -1) calls randrange(0, 0), which raises the "empty range for randrange() (0,0, 0)" error from your traceback.
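The points above can be checked with a short sketch. All function names here are hypothetical: first_failing_pick simulates the bound from the original loop, and the two partition functions illustrate the two fixes described above, either keep deleting but draw only from the current list length, or keep the "- declineRandom" bound but swap each pick to the end of the list instead of deleting it.

```python
import random

def first_failing_pick(n):
    """Simulate the bound the original loop passes to randint,
    (len(rawRatings) - 1) - declineRandom, and return how many elements
    have been picked when it first drops below 0 (None if it never does)."""
    for declineRandom in range(n):
        remaining = n - declineRandom  # list length after that many deletions
        if (remaining - 1) - declineRandom < 0:
            return declineRandom
    return None

def partition_delete(rawRatings, testPercent):
    """Deleting approach, fixed: draw the index from the current length only."""
    remaining = list(rawRatings)  # copy, so the caller's list is not mutated
    howManyNumbers = int(round(testPercent * len(remaining)))
    testSet = []
    for _ in range(howManyNumbers):
        randomIndex = random.randint(0, len(remaining) - 1)
        testSet.append(remaining.pop(randomIndex))
    return remaining, testSet

def partition_swap(rawRatings, testPercent):
    """Keeps the '- declineRandom' bound, but swaps each pick into the
    tail of the list instead of deleting it; the tail then holds the picks."""
    items = list(rawRatings)
    n = len(items)
    howManyNumbers = int(round(testPercent * n))
    for declineRandom in range(howManyNumbers):
        last = n - 1 - declineRandom  # last index that has not been picked yet
        randomIndex = random.randint(0, last)
        items[randomIndex], items[last] = items[last], items[randomIndex]
    split = n - howManyNumbers
    return items[:split], items[split:]

print(first_failing_pick(1000))  # 500
```

The swap version is a partial Fisher-Yates shuffle and avoids the cost of deleting from the middle of a list, though for 1000 ratings either version is fast enough.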
answered Oct 19 '22 09:10 by Rob Watts