Requirements:
Here is my code:
def __init__(self): self.data = [] def update(self, row): self.data.append(row) def finalize(self): dx = np.array(self.data)
Other things I tried include the following code ... but this is waaaaay slower.
def class A: def __init__(self): self.data = np.array([]) def update(self, row): np.append(self.data, row) def finalize(self): dx = np.reshape(self.data, size=(self.data.shape[0]/5, 5))
Here is a schematic of how this is called:
for i in range(500000): ax = A() for j in range(200): ax.update([1,2,3,4,5]) ax.finalize() # some processing on ax
By explicitly declaring the "ndarray" data type, your array processing can be 1250x faster. This tutorial will show you how to speed up the processing of NumPy arrays using Cython. By explicitly specifying the data types of variables in Python, Cython can give drastic speed increases at runtime.
NumPy Arrays Are NOT Always Faster Than Lists " append() " adds values to the end of both lists and NumPy arrays.
Python numpy append() function is used to merge two arrays. This function returns a new array and the original array remains unchanged. ---------- So it sounds like List. append(new_value) is faster.
In general it is better/faster to iterate or append with lists, and apply the np. array (or concatenate) just once. appending to a list is fast; much faster than making a new array.
I tried a few different things, with timing.
import numpy as np
The method you mention as slow: (32.094 seconds)
class A: def __init__(self): self.data = np.array([]) def update(self, row): self.data = np.append(self.data, row) def finalize(self): return np.reshape(self.data, newshape=(self.data.shape[0]/5, 5))
Regular ol Python list: (0.308 seconds)
class B: def __init__(self): self.data = [] def update(self, row): for r in row: self.data.append(r) def finalize(self): return np.reshape(self.data, newshape=(len(self.data)/5, 5))
Trying to implement an arraylist in numpy: (0.362 seconds)
class C: def __init__(self): self.data = np.zeros((100,)) self.capacity = 100 self.size = 0 def update(self, row): for r in row: self.add(r) def add(self, x): if self.size == self.capacity: self.capacity *= 4 newdata = np.zeros((self.capacity,)) newdata[:self.size] = self.data self.data = newdata self.data[self.size] = x self.size += 1 def finalize(self): data = self.data[:self.size] return np.reshape(data, newshape=(len(data)/5, 5))
And this is how I timed it:
x = C() for i in xrange(100000): x.update([i])
So it looks like regular old Python lists are pretty good ;)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With