
Data storage for financial analysis

I am building a system to analyze large quantities of financial data concerning securities trading prices. A major challenge is choosing a storage method, given that the data will run into the tens of terabytes. The data will be queried heavily: taking averages, calculating standard deviations, and computing sums filtered by multiple columns such as price, time, and volume. Join statements aren't a requirement, but would be nice to have.
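For context, the workload boils down to aggregate queries along the lines of the sketch below; the trades table and its columns are hypothetical, just to illustrate the shape of the queries.

    -- Hypothetical flat table: trades(symbol, trade_time, price, volume).
    -- Typical workload: aggregates filtered on several columns at once.
    SELECT
        symbol,
        AVG(price)        AS avg_price,
        STDDEV_POP(price) AS price_stddev,
        SUM(volume)       AS total_volume
    FROM trades
    WHERE trade_time >= '2011-01-01'
      AND trade_time <  '2011-02-01'
      AND price BETWEEN 10.00 AND 50.00
    GROUP BY symbol;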

Right now I am evaluating Infobright Community Edition, MonetDB, and Greenplum Community Edition. They look great so far, but some of the more advanced features I need (multi-server support, insert/update statements, etc.) are not available in these community editions.

What solution would you use in this situation, and what benefits does it provide over the alternatives? Cost effectiveness is a major plus. If I must pay for a data warehousing solution I will, but I would much rather take the open-source/community-edition route if possible.

asked by user396404

1 Answer

Infobright delivers fast query performance with no tuning, no projections, and no indexes on large volumes of data. On data loading, I have seen instances where 80 TB of data per hour can be loaded, at over 12,000 inserts per second.
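As a rough sketch of what that loading path looks like (Infobright speaks the MySQL dialect, and the community edition leans on bulk loads rather than row-by-row inserts), assuming a hypothetical trades table and CSV file:

    -- Hypothetical bulk load via the MySQL-style loader; the file path,
    -- delimiter, and table layout are assumptions for illustration only.
    LOAD DATA INFILE '/data/trades_2011_01.csv'
    INTO TABLE trades
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\n';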

How does it work?

  1. Column orientation vs. row orientation (see the table sketch after this list)
  2. Data Packs plus compression averaging 20:1
  3. Knowledge Grid - sub-second response on queries
  4. Granular engine, built on top of the MySQL architecture

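As a minimal sketch of point 1: on Infobright's MySQL-based builds, a column-store table is declared like any MySQL table, just with the column-oriented engine. The engine name and column layout below are assumptions for illustration.

    -- Sketch of a trades table on Infobright's column-oriented engine;
    -- the engine name and columns are illustrative assumptions.
    CREATE TABLE trades (
        symbol     VARCHAR(12),
        trade_time DATETIME,
        price      DECIMAL(12,4),
        volume     INT
    ) ENGINE=BRIGHTHOUSE;

Queries against such a table need no index or partitioning DDL; the Knowledge Grid metadata is built automatically as data is loaded.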
I would still suggest you consider looking into the enterprise licensing, but you can certainly evaluate the community edition and test your performance and data loading needs against it.

Disclaimer: author is affiliated with Infobright.

answered by Craig Trombly