Python: Vectorize list lookup

I have sensor data like this:

{"Time":1541203508.45,"Tc":25.4,"Hp":33}
{"Time":1541203508.45,"Tc":25.2,"Hp":32}
{"Time":1541203508.45,"Tc":25.1,"Hp":31}
{"Time":1541203508.45,"Tc":25.2,"Hp":33}

I'm doing a lot of list lookups in a for loop like this:

output = {}
for i, data in enumerate(sensor_data):
    output[i] = {}
    # Integer-divide the humidity percentage by 20 to pick one of the labels
    output[i]['H'] = ['V_Dry','Dry','Normal','Humid','V_Humid','ERR'][data['Hp']//20]
    #.... And so on for temp etc

Is there some way to vectorize this if I converted it to a numpy/pandas datatype? Like, if I split the sections into temp, humidity etc, is there a python method that would apply this 'mask' kind of thing on it?

Is map my only option to speed it up?

azazelspeaks asked Jun 09 '26 04:06

1 Answer

First attempt

I suggest you first convert your data into a numpy array:

import numpy as np
data = [{"Time":1541203508.45,"Tc":25.4,"Hp":33},
{"Time":1541203508.45,"Tc":25.2,"Hp":32},
{"Time":1541203508.45,"Tc":25.1,"Hp":31},
{"Time":1541203508.45,"Tc":25.2,"Hp":33}]
np_data = np.asarray([list(element.values()) for element in data])
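
One caveat: list(element.values()) only gives consistent columns if every record lists its keys in the same order. A minimal sketch that pulls each field out by key instead, so it does not depend on that ordering (the names humidity and temperature are my own, not from the question):

```python
import numpy as np

data = [{"Time": 1541203508.45, "Tc": 25.4, "Hp": 33},
        {"Time": 1541203508.45, "Tc": 25.2, "Hp": 32},
        {"Time": 1541203508.45, "Tc": 25.1, "Hp": 31},
        {"Time": 1541203508.45, "Tc": 25.2, "Hp": 33}]

# Build one 1-D array per field, looked up by key rather than by position.
humidity = np.array([record["Hp"] for record in data])
temperature = np.array([record["Tc"] for record in data])
```

Here humidity holds the same values as np_data[:,2], but keeps its integer dtype instead of being upcast to float alongside the other columns.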

Humidity is now the third column (index 2) of the array. Let's define a mapping function for it:

def convert_number_to_hr(value):
    hr_names = ['V_Dry','Dry','Normal','Humid','V_Humid','ERR']
    return hr_names[int(value//20)]

This uses your predefined names in steps of 20%. Now let's apply the map:

hr_humidity = map(convert_number_to_hr, np_data[:,2])

This is a lazy iterator (a map object), not a list: nothing is converted until you consume it. You can iterate through it once or convert it to a list via list(hr_humidity).
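
To make the laziness concrete, here is a small sketch (the plain list of readings stands in for np_data[:,2]; first_pass and second_pass are names I made up):

```python
def convert_number_to_hr(value):
    hr_names = ['V_Dry', 'Dry', 'Normal', 'Humid', 'V_Humid', 'ERR']
    return hr_names[int(value // 20)]

hr_humidity = map(convert_number_to_hr, [33, 32, 31, 33])

first_pass = list(hr_humidity)   # the conversions actually run here
second_pass = list(hr_humidity)  # empty: the iterator is already exhausted
```

So if you need the labels more than once, store the list rather than the map object.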

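This reports the following timing, but note that it only measures creating the lazy map object, not the conversions themselves: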

%timeit hr_humidity = map(convert_number_to_hr, np_data[:,2])
515 ns ± 2.25 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

If you apply list(..), which forces the conversions to actually run, the time grows to

%timeit hr_humidity = list(map(convert_number_to_hr, np_data[:,2]))
5.62 µs ± 18.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

You can now use the same procedure for everything else in your dataset.
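
For a field whose categories are not evenly spaced, integer division no longer works, but np.digitize can do the binning. A rough sketch for temperature follows; the thresholds and label names are invented for illustration, since the question does not give them:

```python
import numpy as np

# Hypothetical thresholds (degrees C) and labels, invented for illustration only.
temp_edges = np.array([10.0, 20.0, 30.0])
temp_names = np.array(['Cold', 'Cool', 'Warm', 'Hot'])

def convert_temps(values):
    # np.digitize returns, for each value, the index of the bin it falls into:
    # 0 below 10, 1 for [10, 20), 2 for [20, 30), 3 for 30 and above.
    return temp_names[np.digitize(values, temp_edges)]

hr_temp = convert_temps(np.array([25.4, 25.2, 25.1, 25.2]))
```

Because every value lands in some bin, this variant cannot run off the end of the label array.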

Second attempt

I tried to do this fully vectorized as you asked in your comment. I came up with:

def same_but_pure_numpy(arr):
    arr = arr.astype(int)//20
    hr_names = np.asarray(['V_Dry','Dry','Normal','Humid','V_Humid','ERR'])
    return hr_names[arr]

This reports a speed of

%timeit a = same_but_pure_numpy(np_data[:,2])
11.5 µs ± 151 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

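so for this four-element sample the map version seems to be faster. With so few values, numpy's fixed per-call overhead dominates, so the comparison may look different on larger arrays.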
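
Four rows are too few to judge either approach. Below is a sketch of how the two could be compared on a larger, synthetic array. The array size, the seed and the helper names with_map and with_numpy are my own choices, and I would expect the numpy version to pull ahead as the array grows, but that is an assumption to check on your own machine:

```python
import timeit
import numpy as np

hr_names = ['V_Dry', 'Dry', 'Normal', 'Humid', 'V_Humid', 'ERR']
hr_names_arr = np.asarray(hr_names)

def convert_number_to_hr(value):
    return hr_names[int(value // 20)]

def with_map(arr):
    # Element-by-element lookup in Python, as in the first attempt.
    return list(map(convert_number_to_hr, arr))

def with_numpy(arr):
    # Whole-array integer division followed by fancy indexing.
    return hr_names_arr[arr.astype(int) // 20]

# Synthetic humidity readings between 0 and 100; the size is arbitrary.
rng = np.random.default_rng(0)
big = rng.uniform(0, 100, size=100_000)

# Both approaches should agree before their timings are worth comparing.
same = with_map(big) == with_numpy(big).tolist()

t_map = timeit.timeit(lambda: with_map(big), number=3)
t_numpy = timeit.timeit(lambda: with_numpy(big), number=3)
print(f"map: {t_map:.3f} s, numpy: {t_numpy:.3f} s")
```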

Third attempt

EDIT: Okay, here is my first try with pandas:

import pandas as pd
data = [{"Time":1541203508.45,"Tc":25.4,"Hp":33},
{"Time":1541203508.45,"Tc":25.2,"Hp":32},
{"Time":1541203508.45,"Tc":25.1,"Hp":31},
{"Time":1541203508.45,"Tc":25.2,"Hp":33}]
df = pd.DataFrame(data)
def convert_number_to_hr(value):
    hr_names = ['V_Dry','Dry','Normal','Humid','V_Humid','ERR']
    return hr_names[int(value//20)]

The result is as expected, but it takes considerably longer:

%timeit new = df["Hp"].map(convert_number_to_hr)
110 µs ± 569 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
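
Series.map with a Python function still calls that function once per row. Two ways to keep the work inside numpy/pandas are sketched below; I have not timed them, and the column names H and H_cut are made up for the example:

```python
import numpy as np
import pandas as pd

data = [{"Time": 1541203508.45, "Tc": 25.4, "Hp": 33},
        {"Time": 1541203508.45, "Tc": 25.2, "Hp": 32},
        {"Time": 1541203508.45, "Tc": 25.1, "Hp": 31},
        {"Time": 1541203508.45, "Tc": 25.2, "Hp": 33}]
df = pd.DataFrame(data)
hr_names = ['V_Dry', 'Dry', 'Normal', 'Humid', 'V_Humid', 'ERR']

# Option 1: integer-divide the whole column, then index a label array with it.
df["H"] = np.asarray(hr_names)[df["Hp"].to_numpy() // 20]

# Option 2: pd.cut with left-closed bins [0, 20), [20, 40), ..., [100, 120).
bins = [0, 20, 40, 60, 80, 100, 120]
df["H_cut"] = pd.cut(df["Hp"], bins=bins, labels=hr_names, right=False)
```

The two options differ for readings outside the bins: pd.cut returns NaN for them, while the indexing version raises an IndexError for values of 120 or more.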

user8408080 answered Jun 11 '26 17:06


