
Invert the random choice of keys in a numpy array

I have a huge np.array called arr with N values and choose 10% of these values randomly by:

choice=random.sample(range(N), int(N*percent))  # percent has values 0-1
newarr=arr[choice]

N could be over 2 million values.

I also need an array with the other 90% of the values. At the moment I use the following, which is very slow:

def buildRevChoice(choice, nevents):
    revChoice = []
    for i in range(nevents):
        # choice is a list, so each membership test scans it linearly
        if i not in choice:
            revChoice.append(i)
    return revChoice

Can you think of a way to speed this up?

user575736 asked Apr 09 '14 11:04




2 Answers

You can just random.shuffle the list, then split it as you like.

import random

def choice(N, percent):
    tmp = list(range(N))  # list() is needed on Python 3, where range is lazy
    random.shuffle(tmp)   # shuffles in place
    cut = int(N * percent)
    return tmp[:cut], tmp[cut:]

You'll get two lists of indices: the first contains the chosen ones and the second contains the rest.
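
For a NumPy array, the same shuffle-and-split idea can be sketched with a single permutation of the indices. This is only a minimal sketch, not the answerer's code: the helper name `split_random` is hypothetical, and it assumes NumPy 1.17+ for `np.random.default_rng`.

```python
import numpy as np

def split_random(arr, percent, rng=None):
    """Split arr into a random `percent` fraction and the remaining values.

    Hypothetical helper: shuffles the indices once, then cuts them in two.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(arr)
    perm = rng.permutation(n)  # every index 0..n-1 exactly once, in random order
    cut = int(n * percent)
    return arr[perm[:cut]], arr[perm[cut:]]

arr = np.arange(20) * 10.0
chosen, rest = split_random(arr, 0.10, rng=np.random.default_rng(0))
```

Both outputs come back in shuffled order. If the original order matters, sort the two index slices before indexing.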

Sufian Latif answered Sep 30 '22 13:09


If you're OK with the memory overhead of a mask array, this seems to be faster than selecting the other values by index, and it retains the order of the elements in arr. Here is what I got, with timings from an IPython notebook:

# assumes the pylab namespace: random is numpy.random, zeros_like is numpy.zeros_like
N = 2000000
arr = random.random(N)
percent = 0.10

My solution:

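%%timeit
# random.choice samples with replacement by default, so choice may contain
# duplicates; pass replace=False for distinct indices (not done in this timing)
choice = random.choice(N, int(N*percent))
mask = zeros_like(arr, bool)
mask[choice] = True
newarr = arr[mask]
revchoice = arr[~mask]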

10 loops, best of 3: 18.1 ms per loop

0605002's solution (the shuffle-based answer above):

%%timeit
tmp = list(range(N))  # list() is needed on Python 3, where range is lazy
random.shuffle(tmp)
cut = int(N * percent)
newarr, revchoice = tmp[:cut], tmp[cut:]

1 loops, best of 3: 603 ms per loop
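
Outside a pylab-style notebook, the mask approach can be written as a standalone script. The sketch below is a variant under stated assumptions, not the code that was timed above: it uses `np.random.default_rng` (NumPy 1.17+) and passes `replace=False` so that exactly `int(N * percent)` distinct indices are drawn, which may change the timings.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only to make the example repeatable
N = 2_000_000
percent = 0.10
arr = rng.random(N)

k = int(N * percent)
choice = rng.choice(N, size=k, replace=False)  # k distinct indices

mask = np.zeros(N, dtype=bool)  # one byte per element
mask[choice] = True

newarr = arr[mask]       # the selected values, in their original order
revchoice = arr[~mask]   # the remaining values, in their original order
```

Because the mask is a boolean array, both results keep the order of `arr`, at the cost of N extra bytes of memory.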

chthonicdaemon answered Sep 30 '22 15:09