I'm trying to split a list into two other lists by taking in a percentage and randomly moving that share of the elements out of the main list into a test set. The trainingSet is the leftover list. I'm running into a problem when I generate a random index to pick from. This code works with a small list, but it fails when len(rawRatings) = 1000.
Error:
File "/Applications/WingIDE.app/Contents/MacOS/src/debug/tserver/_sandbox.py", line 1, in <module>
# Used internally for debug sandbox under external interpreter
File "/Applications/WingIDE.app/Contents/MacOS/src/debug/tserver/_sandbox.py", line 29, in partitionRankings
File "/Users/rderickson9/anaconda/lib/python2.7/random.py", line 241, in randint
return self.randrange(a, b+1)
File "/Users/rderickson9/anaconda/lib/python2.7/random.py", line 217, in randrange
raise ValueError, "empty range for randrange() (%d,%d, %d)" % (istart, istop, width)
ValueError: empty range for randrange() (0,0, 0)
rawRatings is a list, and testPercent is a float.
Example:
rawRatings = [(123,432,4),(23,342,3),(23,123,5),(234,523,3),(34,23,1), (12,32,4)]
testPercent = .2
partitionRankings(rawRatings, testPercent)
should return something like:
[(23,123,5),(234,523,3),(34,23,1),(123,432,4),(12,32,4)],[(23,342,3)]
def partitionRankings(rawRatings, testPercent):
    testSet = []
    trainingSet = []
    howManyNumbers = int(round(testPercent*len(rawRatings)))
    declineRandom = 0
    while True:
        if declineRandom == howManyNumbers:
            break
        randomIndex = random.randint(0, (len(rawRatings)-1)-declineRandom)
        testSetTuple = rawRatings[randomIndex]
        del rawRatings[randomIndex]
        testSet.append(testSetTuple)
        declineRandom = declineRandom + 1
    trainingSet = rawRatings[:]
    return (trainingSet), (testSet)
I don't want to choose the same random index twice: once I've chosen one, I don't want to randomly select it again. I don't think the line below is correct. This is the part I'm having trouble with.
randomIndex = random.randint(0, (len(rawRatings)-1)-declineRandom)
Since the order of the training set does not matter, you can do this with an entirely different strategy: shuffle the list of rawRatings, then take the first howManyNumbers elements as your test set and the rest as your training set.
import random

def partitionRankings(rawRatings, testPercent):
    howManyNumbers = int(round(testPercent*len(rawRatings)))
    shuffled = rawRatings[:]
    random.shuffle(shuffled)
    return shuffled[howManyNumbers:], shuffled[:howManyNumbers]
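As a quick check of the shuffle-based approach, the sketch below runs it on the six-tuple example from the question. The spacing and the comment are my additions, purely for illustration; which tuple ends up in the test set depends on the shuffle, so only the sizes are predictable.

```python
import random

def partitionRankings(rawRatings, testPercent):
    howManyNumbers = int(round(testPercent * len(rawRatings)))
    shuffled = rawRatings[:]      # copy, so the caller's list is left untouched
    random.shuffle(shuffled)
    return shuffled[howManyNumbers:], shuffled[:howManyNumbers]

rawRatings = [(123, 432, 4), (23, 342, 3), (23, 123, 5),
              (234, 523, 3), (34, 23, 1), (12, 32, 4)]
trainingSet, testSet = partitionRankings(rawRatings, .2)
print("%d training, %d test" % (len(trainingSet), len(testSet)))
# prints "5 training, 1 test"
```

Because the function shuffles a copy, rawRatings keeps its original order and length after the call, unlike the version in the question, which deletes from the list it is given.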
As for why your code as you have it doesn't work, the problem is, as you guessed, with this line:
randomIndex = random.randint(0, (len(rawRatings)-1)-declineRandom)
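The problem is with -declineRandom. Each pass through the loop deletes the chosen entry, so by the time declineRandom has reached some value, rawRatings is already shorter by declineRandom elements. Subtracting declineRandom again counts the removals twice: len(rawRatings) shrinks while declineRandom grows. For example, starting from 1000 elements, once 550 of them had been removed the upper bound would be (450-1)-550=-101. Obviously you wouldn't actually get to that point, but hopefully it makes the issue clear.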
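If you would rather keep the delete-as-you-go loop from the question, a minimal sketch of the fix is to draw the index from the list's current length and not subtract declineRandom at all. This is only one way to do it: working on a copy is my choice (so the caller's list is not emptied), the name remaining is hypothetical, and the sketch assumes testPercent is between 0 and 1.

```python
import random

def partitionRankings(rawRatings, testPercent):
    remaining = rawRatings[:]   # hypothetical name; copy so the input is not mutated
    testSet = []
    howManyNumbers = int(round(testPercent * len(remaining)))
    for _ in range(howManyNumbers):
        # remaining has already shrunk by one per pass, so its current
        # length is the correct bound; nothing extra needs subtracting
        randomIndex = random.randrange(len(remaining))
        testSet.append(remaining.pop(randomIndex))
    return remaining, testSet

trainingSet, testSet = partitionRankings(list(range(1000)), .6)
print("%d training, %d test" % (len(trainingSet), len(testSet)))
```

Since a popped element can never be picked again, no index bookkeeping is needed. In the original code the upper bound works out to 999 - 2*declineRandom for a 1000-element list, which reaches -1 once 500 elements have been removed; randint(0, -1) then calls randrange(0, 0), which matches the "(0,0, 0)" in the traceback.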