
PyTables batch get and update

I have daily stock data as an HDF5 file created using PyTables. I would like to get a group of rows, process it as an array and then write it back to disk (update rows) using PyTables. I couldn't figure out a way to do this cleanly. Could you please let me know what will be the best way to accomplish this?

My data:

Symbol, date, price, var1, var2
abcd, 1, 2.5, 12, 12.5
abcd, 2, 2.6, 11, 10.2
abcd, 3, 2.45, 11, 10.3
defg, 1,12.34, 19.1, 18.1
defg, 2, 11.90, 19.5, 18.2
defg, 3, 11.75, 21, 20.9
defg, 4, 11.74, 22.2, 21.4

I would like to read the rows that correspond to each symbol as an array, do some processing and update the fields var1 and var2. I know all the symbols in advance so I can loop through them. I tried something like this:

rows_array = [row.fetch_all_fields() for row in table.where('Symbol == "abcd"')]

I would like to pass rows_array to another function which will compute the values for var1 and var2 and update it for each record. Please note that var1, var2 are like moving averages so I will not be able to calculate them inside an iterator and hence the need for the entire set of rows to be an array.

After I calculate whatever I need using rows_array, I am not sure how to write it back to the table, i.e., how to update the rows with the newly calculated values. When updating the entire table, I use this:

 table.cols.var1[:] = calc_something(rows_array)

However, when I want to update only a portion of the table, I am not sure of the best way to do it. I guess I could re-run the 'where' condition and then update each row based on my calculations, but rescanning the table seems like a waste of time.

Your suggestions are appreciated...

Thanks, -e

Ecognium asked Feb 18 '11 02:02


1 Answer

If I understand correctly, the following should do what you want:

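condition = 'Symbol == "abcd"'
# Row numbers of the matching rows (the method is named get_where_list in PyTables 3.x)
indices = table.getWhereList(condition)
rows_array = table[indices]       # read those rows as one structured array
new_rows = compute(rows_array)    # compute the new values on the whole array
table[indices] = new_rows         # write the rows back to the same positions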
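For reference, here is a minimal end-to-end sketch of the same idea, written against the PyTables 3.x method names (open_file, create_table, get_where_list). The Quote description, the file name, the column widths and the compute() function are assumptions made for illustration: compute() simply stores the running mean of price in var1, standing in for whatever calculation you actually need. The symbol is passed through condvars as a byte string, which is one way to compare against a string column under Python 3.

```python
import os
import tempfile

import numpy as np
import tables as tb


class Quote(tb.IsDescription):
    # Column layout assumed from the sample data in the question.
    Symbol = tb.StringCol(8, pos=0)
    date = tb.Int32Col(pos=1)
    price = tb.Float64Col(pos=2)
    var1 = tb.Float64Col(pos=3)
    var2 = tb.Float64Col(pos=4)


def compute(rows):
    """Hypothetical processing step: set var1 to the running mean of price."""
    out = rows.copy()
    out["var1"] = np.cumsum(rows["price"]) / np.arange(1, len(rows) + 1)
    return out


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "quotes.h5")  # hypothetical file name
    with tb.open_file(path, mode="w") as h5:
        table = h5.create_table("/", "quotes", Quote)
        table.append([
            (b"abcd", 1, 2.5, 12.0, 12.5),
            (b"abcd", 2, 2.6, 11.0, 10.2),
            (b"abcd", 3, 2.45, 11.0, 10.3),
            (b"defg", 1, 12.34, 19.1, 18.1),
            (b"defg", 2, 11.90, 19.5, 18.2),
            (b"defg", 3, 11.75, 21.0, 20.9),
            (b"defg", 4, 11.74, 22.2, 21.4),
        ])
        table.flush()

        # One scan of the table yields the row numbers of the matches.
        coords = table.get_where_list(
            "Symbol == target", condvars={"target": b"abcd"}
        )
        rows = table[coords]            # structured array of the matching rows
        table[coords] = compute(rows)   # write back to the same row numbers
        table.flush()

        result = table.read()           # whole table, to inspect the update
```

The table is scanned only once: the coordinates returned by the query are reused for both the read and the write, so the 'where' condition does not have to be re-run.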

Hope this helps

FrancescAlted answered Sep 18 '22 23:09