I have sensor data like this:
{"Time":1541203508.45,"Tc":25.4,"Hp":33}
{"Time":1541203508.45,"Tc":25.2,"Hp":32}
{"Time":1541203508.45,"Tc":25.1,"Hp":31}
{"Time":1541203508.45,"Tc":25.2,"Hp":33}
I'm doing a lot of list lookups in a for loop like this:
output = {}
for i, data in enumerate(sensor_data):
    output[i] = {}
    output[i]['H'] = ['V_Dry','Dry','Normal','Humid','V_Humid','ERR'][data['Hp'] // 20]
    # .... and so on for temperature etc.
Is there a way to vectorize this if I convert the data to a NumPy/pandas type? For example, if I split the data into temperature, humidity, etc., is there a Python method that would apply this 'mask' kind of lookup to a whole column at once?
Is map my only option to speed it up?
First attempt
I suggest you first convert your data into a numpy array:
import numpy as np
data = [{"Time":1541203508.45,"Tc":25.4,"Hp":33},
        {"Time":1541203508.45,"Tc":25.2,"Hp":32},
        {"Time":1541203508.45,"Tc":25.1,"Hp":31},
        {"Time":1541203508.45,"Tc":25.2,"Hp":33}]
# Each dict becomes one row; the columns follow the key order (Time, Tc, Hp)
np_data = np.asarray([list(element.values()) for element in data])
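Building the rows from `element.values()` ties the column positions to the key order of each dict. Python 3.7+ preserves insertion order, but the positions would shift if a sensor ever emitted its keys in a different order. A minimal sketch, assuming every record carries the same three keys, that fixes the order explicitly and looks the humidity column up by name (`columns` and `hp` are illustrative names, not from the original):

```python
import numpy as np

data = [{"Time": 1541203508.45, "Tc": 25.4, "Hp": 33},
        {"Time": 1541203508.45, "Tc": 25.2, "Hp": 32},
        {"Time": 1541203508.45, "Tc": 25.1, "Hp": 31},
        {"Time": 1541203508.45, "Tc": 25.2, "Hp": 33}]

# Fix the column order explicitly instead of relying on dict key order
columns = ["Time", "Tc", "Hp"]
np_data = np.array([[row[key] for key in columns] for row in data], dtype=float)

# Find the humidity column by name rather than hard-coding index 2
hp = np_data[:, columns.index("Hp")]
print(hp)
```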
In your example, the third column (index 2) holds the humidity. Let's define a mapping function for it:
def convert_number_to_hr(value):
    hr_names = ['V_Dry','Dry','Normal','Humid','V_Humid','ERR']
    return hr_names[int(value//20)]
This assigns your predefined names in steps of 20%. Now let's apply it to the humidity column with map:
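The bucketing relies on floor division (`//`), not modulo (`%`): `33 // 20` is `1`, whereas `33 % 20` is the remainder `13`, which is out of range for a six-element list. A small worked check of the index arithmetic on a few sample values; note that readings of 120 or more would raise an `IndexError`, and negative readings are silently mapped through Python's negative indexing:

```python
hr_names = ['V_Dry', 'Dry', 'Normal', 'Humid', 'V_Humid', 'ERR']

# Floor division turns a percentage into a 20%-wide bucket index:
# 0-19 -> 0, 20-39 -> 1, 40-59 -> 2, 60-79 -> 3, 80-99 -> 4, 100-119 -> 5
for hp in (33, 59.9, 100):
    index = int(hp // 20)
    print(hp, '->', index, hr_names[index])

# Modulo gives the remainder instead, which is not a bucket index
print(33 % 20)  # 13

# Caveat: a negative reading gives a negative index, and Python then
# counts from the end of the list
print(hr_names[int(-5 // 20)])  # ERR (index -1)
```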
hr_humidity = map(convert_number_to_hr, np_data[:,2])
In Python 3, map returns a lazy iterator (a map object, not a generator): nothing is computed until you iterate over it, for example by converting it to a list with list(hr_humidity).
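A short demonstration of that laziness, with a plain list standing in for `np_data[:,2]`: the conversions only run when the iterator is consumed, and it can be consumed only once.

```python
def convert_number_to_hr(value):
    hr_names = ['V_Dry', 'Dry', 'Normal', 'Humid', 'V_Humid', 'ERR']
    return hr_names[int(value // 20)]

hr_humidity = map(convert_number_to_hr, [33, 32, 31, 33])
print(type(hr_humidity).__name__)  # map

labels = list(hr_humidity)  # the conversions actually run here
print(labels)

# The iterator is now exhausted, so a second pass yields nothing
print(list(hr_humidity))  # []
```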
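Timing only this call gives the following, but note that it measures just the creation of the lazy map object; none of the conversions has run yet: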
%timeit hr_humidity = map(convert_number_to_hr, np_data[:,2])
515 ns ± 2.25 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
Forcing the conversions with list(...) gives the more meaningful figure:
%timeit hr_humidity = list(map(convert_number_to_hr, np_data[:,2]))
5.62 µs ± 18.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
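`%timeit` is an IPython/Jupyter magic. Outside IPython, the standard-library `timeit` module can make the same comparison. This sketch times both the bare `map` call and `list(map(...))` on a four-element stand-in for `np_data[:,2]`; the absolute numbers will depend on your machine and are not meant to reproduce the figures above.

```python
import timeit

import numpy as np

def convert_number_to_hr(value):
    hr_names = ['V_Dry', 'Dry', 'Normal', 'Humid', 'V_Humid', 'ERR']
    return hr_names[int(value // 20)]

hp = np.array([33.0, 32.0, 31.0, 33.0])
N = 10_000

# Only builds the lazy map object; no conversion work is done
lazy = timeit.timeit(lambda: map(convert_number_to_hr, hp), number=N)
# list() forces every conversion to run
eager = timeit.timeit(lambda: list(map(convert_number_to_hr, hp)), number=N)

print(f"map only:  {lazy / N * 1e6:.3f} µs per call")
print(f"list(map): {eager / N * 1e6:.3f} µs per call")
```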
You can now use the same procedure for everything else in your dataset.
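One way to reuse the procedure for the other fields is a small generic helper that takes the step size and the label list as parameters. `bucket_labels` and the temperature labels below are hypothetical, chosen only for illustration (10-degree bands); the question does not say how temperature should be classified.

```python
import numpy as np

def bucket_labels(values, step, names):
    """Hypothetical helper: label each value with names[int(value // step)]."""
    return list(map(lambda v: names[int(v // step)], values))

# Assumed temperature bands, 10 degrees wide: 0-9, 10-19, 20-29, 30-39, 40-49
temp_names = ['Cold', 'Cool', 'Warm', 'Hot', 'ERR']
tc = np.array([25.4, 25.2, 25.1, 25.2])
print(bucket_labels(tc, 10, temp_names))

# The humidity lookup is the same call with step=20
hr_names = ['V_Dry', 'Dry', 'Normal', 'Humid', 'V_Humid', 'ERR']
hp = np.array([33.0, 32.0, 31.0, 33.0])
print(bucket_labels(hp, 20, hr_names))
```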
Second attempt
As you asked in your comment, I tried to do this in a fully vectorized way and came up with:
def same_but_pure_numpy(arr):
    arr = arr.astype(int)//20
    hr_names = np.asarray(['V_Dry','Dry','Normal','Humid','V_Humid','ERR'])
    return hr_names[arr]
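This fancy-indexing version raises an `IndexError` for readings of 120 or more and, like the list version, silently wraps negative readings around to the end of the array. A sketch of a range-safe variant, assuming that any out-of-range reading should simply be labelled `'ERR'` (`safe_pure_numpy` is a hypothetical name):

```python
import numpy as np

hr_names = np.array(['V_Dry', 'Dry', 'Normal', 'Humid', 'V_Humid', 'ERR'])

def safe_pure_numpy(arr):
    # Bucket index for every element at once
    idx = (np.asarray(arr) // 20).astype(int)
    # Assumption: anything outside buckets 0..4 (negative, or 100 and above) is 'ERR'
    idx = np.where((idx < 0) | (idx > 4), 5, idx)
    return hr_names[idx]

print(safe_pure_numpy([33, 32, 31, 33, -5, 100, 250]))
```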
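Timing it gives
%timeit a = same_but_pure_numpy(np_data[:,2])
11.5 µs ± 151 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
so for this four-row sample the map version is faster. With so few elements, NumPy's fixed per-call overhead dominates, so the comparison may well turn out differently for larger arrays.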
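Closer to the 'mask' idea in the question, `np.select` takes a list of boolean masks and a matching list of labels, and for each element picks the label of the first mask that is true. A sketch for the humidity column, assuming the same 20% boundaries; here readings of 100 or above fall through to the `default`, and negative readings land in `'V_Dry'`:

```python
import numpy as np

hp = np.array([33.0, 32.0, 31.0, 33.0, 5.0, 85.0, 120.0])

# One boolean mask per label; the first true mask wins for each element
conditions = [hp < 20, hp < 40, hp < 60, hp < 80, hp < 100]
choices = ['V_Dry', 'Dry', 'Normal', 'Humid', 'V_Humid']
labels = np.select(conditions, choices, default='ERR')
print(labels)
```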
Third attempt
EDIT: Okay, here is my first try with pandas:
import pandas as pd
data = [{"Time":1541203508.45,"Tc":25.4,"Hp":33},
        {"Time":1541203508.45,"Tc":25.2,"Hp":32},
        {"Time":1541203508.45,"Tc":25.1,"Hp":31},
        {"Time":1541203508.45,"Tc":25.2,"Hp":33}]
df = pd.DataFrame(data)
def convert_number_to_hr(value):
    hr_names = ['V_Dry','Dry','Normal','Humid','V_Humid','ERR']
    return hr_names[int(value//20)]
The result is as expected, but it is much slower, roughly 20 times the list(map(...)) timing from the first attempt:
%timeit new = df["Hp"].map(convert_number_to_hr)
110 µs ± 569 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
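Series.map still calls a Python function for every row. pandas also has a vectorized binning function, `pd.cut`. A minimal sketch, assuming 20%-wide bins with everything at or above 100 labelled `'ERR'`; `right=False` makes each bin closed on the left, `[0, 20)`, `[20, 40)`, and so on, so it matches `value // 20`. The result is a categorical column, and readings below 0 become `NaN`. It is worth timing with `%timeit` on your real data before choosing.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([{"Time": 1541203508.45, "Tc": 25.4, "Hp": 33},
                   {"Time": 1541203508.45, "Tc": 25.2, "Hp": 32},
                   {"Time": 1541203508.45, "Tc": 25.1, "Hp": 31},
                   {"Time": 1541203508.45, "Tc": 25.2, "Hp": 33}])

# Bin edges in 20% steps; the last bin catches everything from 100 upwards
bins = [0, 20, 40, 60, 80, 100, np.inf]
names = ['V_Dry', 'Dry', 'Normal', 'Humid', 'V_Humid', 'ERR']
df["H"] = pd.cut(df["Hp"], bins=bins, labels=names, right=False)
print(df[["Hp", "H"]])
```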