Optimizing tasks to reduce CPU in a trading application

Question

I have designed a trading application that manages customers' stock investment portfolios.

I am using two datastore kinds:

Stocks - Contains unique stock name and its daily percent change.
UserTransactions - Contains information about a specific purchase of a stock made by a user: the value of the purchase, along with a reference to the Stocks entity for that purchase.

db.Model python modules:

from google.appengine.ext import db

class Stocks(db.Model):
    stockname = db.StringProperty(multiline=True)
    dailyPercentChange = db.FloatProperty(default=1.0)

class UserTransactions(db.Model):
    buyer = db.UserProperty()
    value = db.FloatProperty()
    stockref = db.ReferenceProperty(Stocks)

Once an hour I need to update the database: update the daily percent change in Stocks and then update the value of all entities in UserTransactions that refer to that stock.

The following Python module iterates over all the stocks, updates the dailyPercentChange property, and enqueues a task that goes over all UserTransactions entities referring to the stock and updates their value:

Stocks.py

# Iterate over all stocks in the datastore
for stock in Stocks.all():
    # Update the daily percent change in the datastore
    db.run_in_transaction(updateStockTxn, stock.key())
    # Create a task to update all user transactions referring to this stock
    taskqueue.add(url='/task', params={'stock_key': str(stock.key()), 'value': self.request.get('some_val_for_stock')})

def updateStockTxn(stock_key):
    # Fetch the stock again inside the transaction to avoid concurrent-update conflicts
    stock = db.get(stock_key)
    stock.dailyPercentChange = data.get('some_val_for_stock')  # I get this value from outside
    ... some more calculations here ...
    stock.put()

Task.py (/task)

# Number of transactions to process per task
amountPerCall = 10
stock = db.get(self.request.get('stock_key'))
# The multiplier arrives as a string in the 'value' task parameter
multiplier = float(self.request.get('value'))
# Get all user transactions which point to the current stock
user_transaction_query = stock.usertransactions_set
cursor = self.request.get('cursor')
if cursor:
    user_transaction_query.with_cursor(cursor)

# Spawn another task if this batch is full, since more transactions may remain
transactions = user_transaction_query.fetch(amountPerCall)
if len(transactions) == amountPerCall:
    taskqueue.add(url='/task', params={'stock_key': str(stock.key()), 'value': self.request.get('value'), 'cursor': user_transaction_query.cursor()})

# Iterate over all transactions pointing to the stock and update their value
for transaction in transactions:
    db.run_in_transaction(updateUserTransactionTxn, transaction.key(), multiplier)

def updateUserTransactionTxn(transaction_key, multiplier):
    # Fetch the transaction again inside the transaction to avoid concurrent-update conflicts
    transaction = db.get(transaction_key)
    transaction.value = transaction.value * multiplier
    db.put(transaction)

The problem:

Currently the system works great, but it does not scale well. I have around 100 Stocks with 300 User Transactions, and I run the update every hour. In the dashboard, I see that Task.py takes around 65% of the CPU (Stocks.py takes around 20%-30%), and I am using almost all of the 6.5 free CPU hours that App Engine gives me. I have no problem enabling billing and paying for additional CPU, but the real issue is scalability: using 6.5 CPU hours for 100 stocks is very poor.

Given the requirements described above, is there a better and more efficient implementation than the one presented here (or even just a small change that would help the current implementation)?

Thanks!!

Joel

Nick Johnson · Accepted Answer

There are several obvious improvements to be made:

You should use a keys_only query in the first snippet: since you don't actually refer to the properties of the stock object at any point, there's no point in retrieving it. You may as well retrieve only the key.
You can add tasks in bulk using the Queue object's .add method, documented here. This is more efficient than adding tasks individually.
Your tasks chain new ones every 10 transactions, but tasks can run for up to 10 minutes, and 10 datastore transactions are likely to take no more than a second or two. Instead, set a timer at the beginning of your request, and check it each time around the loop, aborting and chaining the next task when you get close to the 10 minute limit.
If you expect to iterate over a large number of entities, use .fetch and cursors, rather than iterating; iterating fetches in small batches of 20 entities.
In the individual entity update, you're again doing a regular query, but only using the key. Do a keys_only query instead.
Is the task the only thing that will update UserTransaction entities after they're originally written? If so, you can skip the transaction and update them in batches.
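
The bulk-add and batched-update suggestions above both come down to grouping work into fixed-size chunks and making one API call per chunk. Here is a minimal sketch in plain Python; `chunked`, `add_in_bulk`, and the `add_batch` callback are hypothetical names used for illustration, and the default batch size of 100 is an assumption about the per-call limit that should be checked against the App Engine documentation.

```python
def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def add_in_bulk(payloads, add_batch, batch_size=100):
    """Call add_batch once per chunk instead of once per payload.

    Returns the number of calls made. On App Engine, add_batch might wrap
    taskqueue.Queue('default').add(list_of_tasks) for tasks, or
    db.put(list_of_entities) for entity updates (both are assumptions
    about how you would wire this in, not part of the original code).
    """
    calls = 0
    for batch in chunked(payloads, batch_size):
        add_batch(batch)
        calls += 1
    return calls
```

With 100 stocks, this would turn 100 separate enqueue calls into a single one, and the same helper could batch the `put` calls if the transactions are dropped.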

Finally, I'd suggest an overall refactoring: instead of starting a new task for each stock, run the outer loop inside a task, with the timer mentioned above. When you chain the next task, use cursors to pass the current state and pick up where you left off.
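
The timer-plus-resume pattern described above could look roughly like the following sketch. It uses a list index as the resume position so that it runs anywhere; on App Engine the position would instead be a query cursor passed to the chained task. The function name `run_with_deadline` and its parameters are hypothetical.

```python
import time


def run_with_deadline(items, start_index, process, budget_seconds, clock=time.time):
    """Process items from start_index until done or the time budget is spent.

    Returns the index to resume from, or None when everything was processed.
    The clock is injectable so the behaviour can be tested deterministically.
    """
    deadline = clock() + budget_seconds
    index = start_index
    while index < len(items):
        if clock() >= deadline:
            # Out of time: the caller should chain a new task carrying `index`.
            return index
        process(items[index])
        index += 1
    return None
```

A task handler would call this with a budget safely below the request limit and, if it gets a non-None result back, enqueue one follow-up task with that position instead of one task per stock.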

The only other thing to consider is if there's some way you can restructure your data to avoid the need for so many updates. Can you, for instance, make the UserTransaction entities reference some value in the Stock entities, so that you can calculate their actual value at runtime, and you only need to update the single Stock entity with the change?
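
As an illustration of that last idea, here is a sketch that assumes each hourly update is a pure multiplication of the transaction value (as in the question's task code). The stock then keeps a running product of all changes, and each transaction records the product at purchase time, so the current value is derived on read. The plain classes and the `cumulative_factor` and `factor_at_purchase` fields are hypothetical stand-ins for datastore models, not the asker's schema.

```python
class Stock(object):
    """Stand-in for the Stocks entity, holding a running product of changes."""

    def __init__(self, name):
        self.name = name
        self.cumulative_factor = 1.0

    def apply_change(self, factor):
        # The only write needed per hourly update: one entity per stock.
        self.cumulative_factor *= factor


class UserTransaction(object):
    """Stand-in for UserTransactions; never rewritten after the purchase."""

    def __init__(self, buyer, purchase_value, stock):
        self.buyer = buyer
        self.purchase_value = purchase_value
        self.stock = stock
        # Snapshot of the stock's running product at purchase time.
        self.factor_at_purchase = stock.cumulative_factor

    def current_value(self):
        # Apply only the changes that happened since the purchase.
        growth = self.stock.cumulative_factor / self.factor_at_purchase
        return self.purchase_value * growth
```

Under this layout the hourly job touches 100 stock entities rather than every transaction; the trade-off is one extra stock read whenever a transaction's value is displayed.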

Tags:

google-app-engine
