I'm currently facing a head-scratching problem, I am working with a large data set (when I say large, I mean billions of rows of data) and I am caught between speed and scalability.
I can store the billions of rows of data in the database, but my application needs to constantly check whether a new row of data exists in the dataset, if not, insert it, otherwise, retrieve it.
If I were to use a database solution, I estimate each call to the database to retrieve a row of data to be 10ms (optimistic estimate), I need to retrieve about 800k records for each file that I process in my application, that means (10ms x 800k = 2.22 hours)
per file to process. That timespan is too long to analyse and process 1 file, considering the amount of time required to retrieve a row of data from the database will increase when the database grow to billions and billions of rows.
I have also thought of storing a List
or HashSet
in the local memory to compare and retrieve, but it is not going to work out as I will not be able to store billions of records (objects) in the memory.
Pls advice on what I should do for my situation.
Edit: Oh ya, I forgotten to state that I have already implemented a semi-cache, once a record is retrieved, it will be cached in the memory, so if the same record needs to be retrieved again, it will be retrieved from the memory instead, but I face the same problem, I will reach a point in time where the memory can no longer fit any more cached data.
Ideally if you are playing with a large number of data then you have to make sure that you do not run out of resources while processing your data. However you just need to find a reasonable way to increase the utilization of your resources.
I would definitely go with database because that is the best known way to query and storing of the data in most optimal way. You didn't mention what exactly your application does, so I can just give you general opinions about how I would do in such scenario;
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With