I have a huge np.array called arr with N values and select 10% of these values at random by:

import random
choice = random.sample(range(N), int(N * percent))  # percent is in [0, 1]
newarr = arr[choice]
N could be over 2 million values.
Actually I also need an array with the other 90% of values. So at the moment I use the following, which is very slow:

def buildRevChoice(choice, nevents):
    # scans every index and does an O(len(choice)) list-membership test each time
    revChoice = []
    for i in range(nevents):
        if i not in choice:
            revChoice.append(i)
    return revChoice
Can you think of a way to speed this up?
You can just random.shuffle the list, then split it as you like.
def choice(N, percent):
    tmp = list(range(N))  # materialize: range() itself can't be shuffled in Python 3
    random.shuffle(tmp)
    cut = int(N * percent)
    return tmp[:cut], tmp[cut:]
And you'll get your two lists, the first containing the chosen indices and the second containing the rest.
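Note that the function returns two lists of indices, not values, so you still index into the array afterwards. A small sketch of that last step, using toy data and hypothetical names:

```python
import random
import numpy as np

def choice(N, percent):
    # shuffle all indices, then split at the cut point
    tmp = list(range(N))
    random.shuffle(tmp)
    cut = int(N * percent)
    return tmp[:cut], tmp[cut:]

arr = np.arange(10) * 1.5          # toy stand-in for the real 2-million-value array
chosen, rest = choice(len(arr), 0.3)
newarr, revarr = arr[chosen], arr[rest]   # fancy indexing with the two index lists
```

Together `newarr` and `revarr` partition `arr`: every element lands in exactly one of the two.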
If you're OK with the memory overhead of a mask array, this seems to be faster than selecting the other values by index, and it retains the order of the elements in arr. Here is what I got with timings from an IPython notebook:
# assumes pylab mode ("from numpy import *"), so random here is numpy.random
N = 2000000
arr = random.random(N)
percent = 0.10
My solution:
%%timeit
choice = random.choice(N, int(N * percent), replace=False)  # sample distinct indices
mask = zeros_like(arr, bool)
mask[choice] = True
newarr = arr[mask]
revchoice = arr[~mask]
10 loops, best of 3: 18.1 ms per loop
0605002's solution:
tmp = list(range(N))
random.shuffle(tmp)
cut = int(N * percent)
newarr, revchoice = tmp[:cut], tmp[cut:]
1 loops, best of 3: 603 ms per loop
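As an aside (not part of either original answer), the shuffle-and-split idea can also be done entirely on the NumPy side with np.random.permutation, which avoids shuffling a Python list of two million ints; a sketch under the same setup:

```python
import numpy as np

N = 2_000_000
percent = 0.10
arr = np.random.random(N)

# permute all indices in NumPy, then split at the cut point
idx = np.random.permutation(N)
cut = int(N * percent)
newarr, revarr = arr[idx[:cut]], arr[idx[cut:]]
```

Like the mask approach, this gives exactly the requested split sizes; unlike the mask approach, it does not preserve the original ordering of the elements.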