I have a list of dictionaries, with keys 'a', 'n', 'o', 'u'. Is there a way to speed up this calculation, for instance with NumPy? There are tens of thousands of items in the list.
The data is drawn from a database, so I must live with that it's in the form of a list of dictionaries originally.
x = n = o = u = 0
loops = 0
for entry in indata:
    x += entry['a'] * entry['n']  # n - number of data points
    n += entry['n']
    o += entry['o']
    u += entry['u']
    loops += 1
average = int(round(x / n)), n, o, u
I doubt this will be much faster, but I suppose it's a candidate for timeit...
from operator import itemgetter

x = n = o = u = 0
items = itemgetter('a', 'n', 'o', 'u')
for entry in indata:
    A, N, O, U = items(entry)
    x += A * N  # n - number of data points
    n += N
    o += O  # don't know what you're doing with o or u, but I'll leave them
    u += U
average = int(round(x / n)), n, o, u
At the very least, it saves a lookup of entry['n'], since the value is now bound to a local variable.
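To illustrate what itemgetter does here: called with several keys, it returns a callable that extracts all of them in one go as a tuple (the dictionary below is a made-up toy entry, not real data):

```python
from operator import itemgetter

# One callable that pulls 'a', 'n', 'o', 'u' out of a mapping in a single call.
items = itemgetter('a', 'n', 'o', 'u')

entry = {'a': 2.0, 'n': 3, 'o': 1, 'u': 4}  # hypothetical entry
print(items(entry))  # (2.0, 3, 1, 4)
```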
You could try something like this:
mean_a = np.sum(np.array([d['a'] for d in data]) * np.array([d['n'] for d in data])) / len(data)
import numpy as np
from operator import itemgetter
from pandas import DataFrame

data = []
for i in range(100000):
    data.append({'a': np.random.random(), 'n': np.random.random(),
                 'o': np.random.random(), 'u': np.random.random()})
def func1(data):
    x = n = o = u = 0
    items = itemgetter('a', 'n', 'o', 'u')
    for entry in data:
        A, N, O, U = items(entry)
        x += A * N  # n - number of data points
        n += N
        o += O  # don't know what you're doing with o or u, but I'll leave them
        u += U
    average = int(round(x / n)), n, o, u
    return average
def func2(data):
    mean_a = np.sum(np.array([d['a'] for d in data]) * np.array([d['n'] for d in data])) / len(data)
    return (mean_a,
            np.sum([d['n'] for d in data]),
            np.sum([d['o'] for d in data]),
            np.sum([d['u'] for d in data])
            )
def func3(data):
    dframe = DataFrame(data)
    return (np.sum(dframe["a"] * dframe["n"]) / dframe.shape[0],
            np.sum(dframe["n"]), np.sum(dframe["o"]), np.sum(dframe["u"]))
In [3]: %timeit func1(data)
10 loops, best of 3: 59.6 ms per loop
In [4]: %timeit func2(data)
10 loops, best of 3: 138 ms per loop
In [5]: %timeit func3(data)
10 loops, best of 3: 129 ms per loop
If you are doing other operations on the data, I would definitely look into the pandas package. Its DataFrame object is a natural match for the list of dictionaries you are working with. I suspect the majority of the overhead is the I/O of getting the data into NumPy arrays or DataFrame objects.
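One caveat: func2 and func3 divide by len(data), whereas the original loop divides by the running total of 'n' (a weighted average). If the weighted form is what you want, a DataFrame sketch would look like this (the two rows below are made-up illustration data):

```python
import pandas as pd

# Toy stand-in for the real database rows.
data = [{'a': 10, 'n': 2, 'o': 1, 'u': 0},
        {'a': 20, 'n': 3, 'o': 2, 'u': 1}]

df = pd.DataFrame(data)
# Weighted average as in the question's loop: sum(a*n) / sum(n)
average = int(round((df['a'] * df['n']).sum() / df['n'].sum()))
result = average, int(df['n'].sum()), int(df['o'].sum()), int(df['u'].sum())
print(result)  # (16, 5, 3, 1)
```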