I am new to Python and I am currently struggling to do simple things with pandas. I would like to apply the same function to each item of a given dataset, but using a time-dependent parameter.
I am working with pandas DataFrames that have timestamps as their index.
Let's say:
a(i, j) is the ith element in column j of a DataFrame A (timestamp/index = i and column = j)
b(i) is the ith element of a DataFrame B (with a single column)
I want to compute:
c(i, j) = fct(a(i, j), b(i))
where fct is a function with two arguments z = fct(x, y)
I wrote code that does this correctly, but it is likely not optimal (it is very slow). For the example I used a simple function fct; in reality it is more complex.
Inputs:
pandas.DataFrame with index=timestamps and several columns
pandas.DataFrame with 1 column containing the time-dependent parameter
Here is the code:
# p.concat with join='inner' is required as timestamps are not identical in df_data & df_parameter
import math
import numpy as np
import pandas as p

temp = p.concat([df_data, df_parameter], join='inner', axis=1)
index = temp.index
np_data = temp[df_data.columns].values
np_parameter = temp[df_parameter.columns].values

def fct(x, y):
    return math.pow(x, y)

def test(np_data, np_parameter):
    np_result = np.empty(np_data.shape, dtype=float)
    it = np.nditer(np_data, flags=['multi_index'])
    while not it.finished:
        # The row index selects the time-dependent parameter for this element
        np_result[it.multi_index] = fct(it[0].item(),
                                        np_parameter[it.multi_index[0]][0])
        it.iternext()
    df_final = p.DataFrame(data=np_result, index=index)
    return df_final

final = test(np_data, np_parameter)
final.to_csv(r'C:\temp\test.csv', sep=';')
Here is some example data:
df_data
01/03/2010 00:00 ; 9 ; 5 ; 7
01/03/2010 00:10 ; 9 ; 1 ; 4
01/03/2010 00:20 ; 5 ; 3 ; 8
01/03/2010 00:30 ; 7 ; 7 ; 1
01/03/2010 00:40 ; 8 ; 2 ; 3
01/03/2010 00:50 ; 0 ; 3 ; 4
01/03/2010 01:00 ; 4 ; 3 ; 2
01/03/2010 01:10 ; 6 ; 2 ; 2
01/03/2010 01:20 ; 6 ; 8 ; 5
01/03/2010 01:30 ; 7 ; 7 ; 0
df_parameter
01/03/2010 00:00 ; 2
01/03/2010 00:10 ; 5
01/03/2010 00:20 ; 2
01/03/2010 00:30 ; 3
01/03/2010 00:40 ; 0
01/03/2010 00:50 ; 2
01/03/2010 01:00 ; 4
01/03/2010 01:10 ; 3
01/03/2010 01:20 ; 3
01/03/2010 01:30 ; 1
final
01/03/2010 00:00 ; 81 ; 25 ; 49
01/03/2010 00:10 ; 59049 ; 1 ; 1024
01/03/2010 00:20 ; 25 ; 9 ; 64
01/03/2010 00:30 ; 343 ; 343 ; 1
01/03/2010 00:40 ; 1 ; 1 ; 1
01/03/2010 00:50 ; 0 ; 9 ; 16
01/03/2010 01:00 ; 256 ; 81 ; 16
01/03/2010 01:10 ; 216 ; 8 ; 8
01/03/2010 01:20 ; 216 ; 512 ; 125
01/03/2010 01:30 ; 7 ; 7 ; 0
Thank you very very much in advance for your help,
Patrick
I don't know if this is the optimal way, but it is simpler and should be more efficient because it uses vectorized operations for the calculations:
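import pandas as pd

def func(x, y):
    return x ** y

# Column labels are passed as a list: list('abc') gives ['a', 'b', 'c']
data = pd.read_csv('data.dat', sep=';', index_col=0, parse_dates=True,
                   header=None, names=list('abc'))
para = pd.read_csv('parameter.dat', sep=';', index_col=0, parse_dates=True,
                   header=None, names=['para'])

# Loop over a copy of the column labels, since new columns are added inside the loop
for col in list(data.columns):
    data['%s_result' % col] = func(data[col], para.para)

print(data)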
results in
                     a  b  c  a_result  b_result  c_result
2010-01-03 00:00:00  9  5  7        81        25        49
2010-01-03 00:10:00  9  1  4     59049         1      1024
2010-01-03 00:20:00  5  3  8        25         9        64
2010-01-03 00:30:00  7  7  1       343       343         1
2010-01-03 00:40:00  8  2  3         1         1         1
2010-01-03 00:50:00  0  3  4         0         9        16
2010-01-03 01:00:00  4  3  2       256        81        16
2010-01-03 01:10:00  6  2  2       216         8         8
2010-01-03 01:20:00  6  8  5       216       512       125
2010-01-03 01:30:00  7  7  0         7         7         0
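For the simple power function used here, the per-column loop can probably be dropped as well. The following is a minimal sketch, not a tested drop-in: the small frames and the column names `a`, `b` and `para` are made up for illustration, and the parameter is given one extra timestamp to mimic the non-identical indexes mentioned in the question. `DataFrame.pow` with `axis=0` matches the Series against the row index, so every column is raised to the parameter of its row.

```python
import pandas as pd

# Made-up sample frames; the labels 'a', 'b' and 'para' are illustrative only.
idx = pd.date_range("2010-01-03 00:00", periods=3, freq="10min")
data = pd.DataFrame({"a": [9, 9, 5], "b": [5, 1, 3]}, index=idx)

# The parameter frame has one extra timestamp, so the indexes are not identical.
para = pd.DataFrame(
    {"para": [2, 5, 2, 3]},
    index=pd.date_range("2010-01-03 00:00", periods=4, freq="10min"),
)

# Keep only the timestamps present in both frames (same effect as an inner join).
common = data.index.intersection(para.index)

# axis=0 aligns the Series on the row index, so each row uses its own exponent.
result = data.loc[common].pow(para.loc[common, "para"], axis=0)
print(result)
```

This only works because `**` is already an element-wise operation in pandas; a function that is not built from such operations needs a different approach.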
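If your real function is more complex, you should still try to vectorize it, or use numpy.vectorize() as the next best solution. Keep in mind that numpy.vectorize() is mainly a convenience: it still calls the Python function once per element.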
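As a rough illustration of the `numpy.vectorize()` fallback, here is a sketch under a few assumptions: `fct` stands in for the real scalar function, the sample values are made up, and both inputs are assumed to share the same index already (otherwise align them first). Reshaping the parameter into a single column lets NumPy broadcast it across all data columns.

```python
import math

import numpy as np
import pandas as pd

def fct(x, y):
    # Placeholder for the real, more complex function that only accepts scalars.
    return math.pow(x, y)

# Made-up sample data; both objects are assumed to share the same index.
idx = pd.date_range("2010-01-03 00:00", periods=3, freq="10min")
data = pd.DataFrame({"a": [9, 9, 5], "b": [5, 1, 3]}, index=idx)
para = pd.Series([2, 5, 2], index=idx, name="para")

# otypes fixes the output dtype instead of letting NumPy infer it from the first call.
vfct = np.vectorize(fct, otypes=[float])

# The (n, 1) parameter column broadcasts against the (n, k) data array row by row.
values = vfct(data.to_numpy(), para.to_numpy()[:, None])
result = pd.DataFrame(values, index=data.index, columns=data.columns)
print(result)
```

This removes the explicit `nditer` loop and keeps the column labels, but the speed-up over a plain loop is usually modest; rewriting the function in terms of array operations is where the real gain would come from.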