Python: Vectorize list lookup

Question

I have sensor data like this:

{"Time":1541203508.45,"Tc":25.4,"Hp":33}
{"Time":1541203508.45,"Tc":25.2,"Hp":32}
{"Time":1541203508.45,"Tc":25.1,"Hp":31}
{"Time":1541203508.45,"Tc":25.2,"Hp":33}

I'm doing a lot of list lookups in a for loop like this:

output={}
for i,data in enumerate(sensor_data):
    output[i]={}
    output[i]['H']=['V_Dry','Dry','Normal','Humid','V_Humid','ERR'][int(data['Hp'])//20]
    #.... And so on for temp etc

Is there some way to vectorize this if I converted it to a numpy/pandas datatype? Like, if I split the sections into temp, humidity etc, is there a python method that would apply this 'mask' kind of thing on it?

Is map my only option to speed it up?

user8408080 · Accepted Answer

First attempt

I suggest you first convert your data into a numpy array:

import numpy as np
data = [{"Time":1541203508.45,"Tc":25.4,"Hp":33},
{"Time":1541203508.45,"Tc":25.2,"Hp":32},
{"Time":1541203508.45,"Tc":25.1,"Hp":31},
{"Time":1541203508.45,"Tc":25.2,"Hp":33}]
np_data = np.asarray([list(element.values()) for element in data])
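One caveat with list(element.values()) is that the column positions follow the key order of each dict. A minimal sketch of an alternative, assuming every record carries the same keys, is to pull each field out by name (the array names hp and tc are only illustrative):

```python
import numpy as np

data = [
    {"Time": 1541203508.45, "Tc": 25.4, "Hp": 33},
    {"Time": 1541203508.45, "Tc": 25.2, "Hp": 32},
    {"Time": 1541203508.45, "Tc": 25.1, "Hp": 31},
    {"Time": 1541203508.45, "Tc": 25.2, "Hp": 33},
]

# Build one array per field, looked up by key, so the result does not
# depend on the order in which the keys appear in each record.
hp = np.array([row["Hp"] for row in data])  # humidity readings (integers)
tc = np.array([row["Tc"] for row in data])  # temperature readings (floats)
```

This also keeps the humidity values as integers instead of promoting everything to float in one mixed 2-D array.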

The third column of np_data (index 2) holds the humidity from your example. Let's now define a map for that:

def convert_number_to_hr(value):
    hr_names = ['V_Dry','Dry','Normal','Humid','V_Humid','ERR']
    return hr_names[int(value//20)]

This uses your predefined names in steps of 20%. Now let's apply the map:

hr_humidity = map(convert_number_to_hr, np_data[:,2])

In Python 3, map returns a lazy iterator rather than a list. You can iterate through it once or convert it to a list via list(hr_humidity).
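The lazy behaviour of map can be made visible with a small sketch. The calls list is only bookkeeping added for illustration, and the plain list of readings stands in for np_data[:,2]:

```python
calls = []  # illustration only: records every value the converter sees

def convert_number_to_hr(value):
    calls.append(value)
    hr_names = ['V_Dry', 'Dry', 'Normal', 'Humid', 'V_Humid', 'ERR']
    return hr_names[int(value // 20)]

hr_humidity = map(convert_number_to_hr, [33, 32, 31, 33])
calls_before = len(calls)        # still 0: creating the map converts nothing

first_pass = list(hr_humidity)   # the conversions actually run here
second_pass = list(hr_humidity)  # the iterator is exhausted, so this is empty
```

So if the labels are needed more than once, it is probably easiest to store the list rather than the map object.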

Timing the map call alone, which only creates the iterator and does not run any conversions yet, reports

%timeit hr_humidity = map(convert_number_to_hr, np_data[:,2])
515 ns ± 2.25 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

If you apply list(...), which actually performs the conversions, the time grows to

%timeit hr_humidity = list(map(convert_number_to_hr, np_data[:,2]))
5.62 µs ± 18.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

You can now use the same procedure for everything else in your dataset.
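As a rough sketch of reusing the procedure for other fields: make_converter is a hypothetical helper, and the temperature labels with their 10-degree step are invented for illustration, since the question does not give them. Out-of-range readings fall back to the last label here, which is also an assumption.

```python
def make_converter(names, step):
    """Return a function mapping a reading to names[reading // step]."""
    last = len(names) - 1

    def convert(value):
        index = int(value // step)
        # Anything outside the table falls back to the last label ('ERR').
        return names[index] if 0 <= index <= last else names[last]

    return convert

hr_names = ['V_Dry', 'Dry', 'Normal', 'Humid', 'V_Humid', 'ERR']
tc_names = ['Cold', 'Cool', 'Mild', 'Warm', 'Hot', 'ERR']  # hypothetical labels

converters = {
    "Hp": make_converter(hr_names, 20),
    "Tc": make_converter(tc_names, 10),  # hypothetical 10-degree steps
}

data = [
    {"Time": 1541203508.45, "Tc": 25.4, "Hp": 33},
    {"Time": 1541203508.45, "Tc": 25.2, "Hp": 32},
]

# One output record per input record, one label per configured field.
output = [
    {key: convert(row[key]) for key, convert in converters.items()}
    for row in data
]
```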

Second attempt

I tried to do this fully vectorized as you asked in your comment. I came up with:

def same_but_pure_numpy(arr):
    arr = arr.astype(int)//20
    hr_names = np.asarray(['V_Dry','Dry','Normal','Humid','V_Humid','ERR'])
    return hr_names[arr]

This reports a speed of

%timeit a = same_but_pure_numpy(np_data[:,2])
11.5 µs ± 151 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

so on this four-row sample the map version seems to be faster.
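With only four values, fixed costs such as building the label array on every call likely dominate the NumPy timing. The sketch below is one possible variant, not a benchmark result: it builds the label array once, sends out-of-range readings to 'ERR' (an assumption about the intended behaviour), and checks the output against the map approach on synthetic readings that are not the asker's data.

```python
import numpy as np

# Built once at import time instead of on every call.
HR_NAMES = np.asarray(['V_Dry', 'Dry', 'Normal', 'Humid', 'V_Humid', 'ERR'])

def humidity_labels(hp):
    """Vectorized lookup: one label per reading, 'ERR' outside 0-99."""
    idx = (np.asarray(hp) // 20).astype(int)
    # Negative or too-large indices are redirected to the last label.
    idx[(idx < 0) | (idx >= len(HR_NAMES))] = len(HR_NAMES) - 1
    return HR_NAMES[idx]

# Synthetic readings, used only to compare the two approaches.
rng = np.random.default_rng(0)
hp_large = rng.integers(0, 100, size=100_000)

vectorized = humidity_labels(hp_large)
mapped = list(map(lambda v: HR_NAMES[int(v // 20)], hp_large))
```

Timing both lines on an array of this size (for example with %timeit) would give a fairer picture of how the approaches scale than the four-element case.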

Third attempt

EDIT: Okay, here is my first try with pandas:

import pandas as pd
data = [{"Time":1541203508.45,"Tc":25.4,"Hp":33},
{"Time":1541203508.45,"Tc":25.2,"Hp":32},
{"Time":1541203508.45,"Tc":25.1,"Hp":31},
{"Time":1541203508.45,"Tc":25.2,"Hp":33}]
df = pd.DataFrame(data)
def convert_number_to_hr(value):
    hr_names = ['V_Dry','Dry','Normal','Humid','V_Humid','ERR']
    return hr_names[int(value//20)]

The result is as expected, but it takes considerably longer:

%timeit new = df["Hp"].map(convert_number_to_hr)
110 µs ± 569 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
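Series.map with a Python function still calls that function once per row. A minimal sketch of two column-wise alternatives, assuming the readings stay within 0-99: index a NumPy label array with the integer-divided column, or let pd.cut assign the bins. The column names H and H_cut are only illustrative.

```python
import numpy as np
import pandas as pd

data = [
    {"Time": 1541203508.45, "Tc": 25.4, "Hp": 33},
    {"Time": 1541203508.45, "Tc": 25.2, "Hp": 32},
    {"Time": 1541203508.45, "Tc": 25.1, "Hp": 31},
    {"Time": 1541203508.45, "Tc": 25.2, "Hp": 33},
]
df = pd.DataFrame(data)

hr_names = np.asarray(['V_Dry', 'Dry', 'Normal', 'Humid', 'V_Humid', 'ERR'])

# Option 1: integer-divide the whole column, then index the label array.
df["H"] = hr_names[df["Hp"].to_numpy() // 20]

# Option 2: pd.cut with left-closed bins [0, 20), [20, 40), ..., [80, 100).
# Values outside the bins become NaN instead of 'ERR'.
bins = [0, 20, 40, 60, 80, 100]
df["H_cut"] = pd.cut(df["Hp"], bins=bins, right=False,
                     labels=hr_names[:-1].tolist())
```

pd.cut returns a categorical column, which can be handy for grouping later. Neither option is timed here, so the relative speed on the real data would need to be measured.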

Tags:

python

vectorization

azazelspeaks
