
Fastest way to grow a numpy numeric array

Requirements:

  • I need to grow an array arbitrarily large from data.
  • I can guess the size (roughly 100-200) with no guarantees that the array will fit every time
  • Once it is grown to its final size, I need to perform numeric computations on it, so I'd prefer to eventually get to a 2-D numpy array.
  • Speed is critical. As an example, for one of 300 files, the update() method is called 45 million times (taking about 150s) and the finalize() method is called 500k times (taking 106s in total), for roughly 250s overall.

Here is my code:

def __init__(self):
    self.data = []

def update(self, row):
    self.data.append(row)

def finalize(self):
    dx = np.array(self.data)

Other things I tried include the following code ... but this is waaaaay slower.

class A:
    def __init__(self):
        self.data = np.array([])

    def update(self, row):
        # np.append returns a new array, so the result must be assigned back
        self.data = np.append(self.data, row)

    def finalize(self):
        dx = np.reshape(self.data, (self.data.shape[0] // 5, 5))
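A likely reason this version is so slow: np.append does not grow an array in place. It allocates a new array and copies everything into it on every call, so the result has to be assigned back and the total work grows quadratically with the number of appends. A minimal sketch of that behaviour (the variable names are only illustrative):

```python
import numpy as np

data = np.array([])
result = np.append(data, [1, 2, 3, 4, 5])

# np.append leaves its input untouched and returns a freshly allocated array.
print(data.size)       # 0
print(result.size)     # 5
print(result is data)  # False
```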

Here is a schematic of how this is called:

for i in range(500000):
    ax = A()
    for j in range(200):
        ax.update([1,2,3,4,5])
    ax.finalize()
    # some processing on ax

asked Aug 20 '11 18:08 by fodon



1 Answer

I tried a few different things, with timing.

import numpy as np 
  1. The method you mention as slow: (32.094 seconds)

    class A:
        def __init__(self):
            self.data = np.array([])

        def update(self, row):
            # Allocates a new array and copies all existing data on every call
            self.data = np.append(self.data, row)

        def finalize(self):
            return np.reshape(self.data, (self.data.shape[0] // 5, 5))
  2. Regular ol Python list: (0.308 seconds)

    class B:
        def __init__(self):
            self.data = []

        def update(self, row):
            for r in row:
                self.data.append(r)

        def finalize(self):
            return np.reshape(self.data, (len(self.data) // 5, 5))
  3. Trying to implement an arraylist in numpy: (0.362 seconds)

    class C:
        def __init__(self):
            self.data = np.zeros((100,))
            self.capacity = 100
            self.size = 0

        def update(self, row):
            for r in row:
                self.add(r)

        def add(self, x):
            if self.size == self.capacity:
                # Buffer full: quadruple the capacity and copy the old data over
                self.capacity *= 4
                newdata = np.zeros((self.capacity,))
                newdata[:self.size] = self.data
                self.data = newdata

            self.data[self.size] = x
            self.size += 1

        def finalize(self):
            data = self.data[:self.size]
            return np.reshape(data, (len(data) // 5, 5))
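Since the question feeds in fixed-width rows of 5 values, a variation on approach 3 is to preallocate a 2-D buffer and write one whole row per call instead of one element at a time. This is only a sketch under that assumption. The RowBuffer name, the starting capacity of 128 and the doubling factor are all made up for illustration, and it has not been timed against the versions above.

```python
import numpy as np


class RowBuffer:
    """Hypothetical growable 2-D buffer that doubles its row capacity when full."""

    def __init__(self, ncols=5, capacity=128):
        self.data = np.empty((capacity, ncols))
        self.size = 0  # number of rows actually filled

    def update(self, row):
        if self.size == self.data.shape[0]:
            # Out of room: allocate twice as many rows and copy the filled part.
            newdata = np.empty((2 * self.data.shape[0], self.data.shape[1]))
            newdata[:self.size] = self.data[:self.size]
            self.data = newdata
        self.data[self.size] = row  # one assignment writes the whole row
        self.size += 1

    def finalize(self):
        # A view of the filled rows; no copy and no reshape needed.
        return self.data[:self.size]


buf = RowBuffer()
for j in range(200):
    buf.update([1, 2, 3, 4, 5])
dx = buf.finalize()
print(dx.shape)  # (200, 5)
```

Because finalize() returns a view into the buffer, the result should be copied if the buffer is going to be reused afterwards.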

And this is how I timed it:

x = C()
for i in range(100000):  # xrange on Python 2
    x.update([i])
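The answer does not say which tool produced the timings. One way to reproduce a measurement like this is the standard timeit module. In the sketch below, ListGrower and run are illustrative names, and ListGrower is a stand-in for the list-based approach rather than the exact class above. Absolute numbers will differ by machine, Python version and NumPy version.

```python
import timeit

import numpy as np


class ListGrower:
    """Stand-in for the list-based approach; the name is illustrative."""

    def __init__(self):
        self.data = []

    def update(self, row):
        self.data.extend(row)

    def finalize(self):
        return np.reshape(self.data, (len(self.data) // 5, 5))


def run(cls, n=100000):
    # Same loop shape as the timing snippet: n single-element updates.
    x = cls()
    for i in range(n):
        x.update([i])
    return x


# Best of three runs, each executing the loop once.
elapsed = min(timeit.repeat(lambda: run(ListGrower), number=1, repeat=3))
print(f"{elapsed:.3f} s")
```

Taking the minimum of several repeats reduces noise from other processes. Swapping in another class for ListGrower compares the approaches under the same loop.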

So it looks like regular old Python lists are pretty good ;)

answered Oct 10 '22 19:10 by Owen