I have a table with around 100,000 rows at the moment. I want to index the data from this table in a Solr index.
So the naive method would be to:

- Query the table for all of its rows
- Convert each row into a Solr document
- POST all of the documents to Solr in a single request

Some problems with this approach that I can think of are:

- Holding all of the rows and documents in memory at once
- The size of the single POST request

However, some advantages:

- Only one query against the database and one POST to Solr, so very little request overhead
The approach is not scalable: as the table grows, so will the memory requirements and the size of the POST request. Perhaps I need to take `n` rows, process them, then take the next `n`?
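Roughly, the kind of loop I have in mind is sketched below (keyset pagination over the primary key; the connection details, table, and column names are just placeholders):

```java
import java.sql.*;

public class PagedFetch {
    public static void main(String[] args) throws SQLException {
        int n = 1000;     // rows to process per batch
        long lastId = 0;  // resume point: largest id seen so far
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:mysql://localhost/mydb", "user", "pass");
             PreparedStatement ps = conn.prepareStatement(
                 "SELECT id, title, body FROM my_table WHERE id > ? ORDER BY id LIMIT ?")) {
            while (true) {
                ps.setLong(1, lastId);
                ps.setInt(2, n);
                int fetched = 0;
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        lastId = rs.getLong("id");
                        // ... build a Solr document from this row and send/queue it ...
                        fetched++;
                    }
                }
                if (fetched < n) break; // fewer rows than requested: last page reached
            }
        }
    }
}
```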
I'm wondering if anyone has any advice about how best to implement this?
(P.S. I did search the site but didn't find any questions similar to this.)
Thanks.
If you want to balance between POSTing all documents at once and doing one POST per document, you can use a queue to collect documents and run a separate thread that sends them once enough have accumulated. This way you can manage the memory vs. request-time trade-off.
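Here is a minimal sketch of that idea using SolrJ. The class name, queue capacity, batch size, and Solr URL are all placeholders to adjust:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class QueuedIndexer implements Runnable {
    // Sentinel the producer puts on the queue to signal "no more documents".
    static final SolrInputDocument END = new SolrInputDocument();

    private final BlockingQueue<SolrInputDocument> queue = new LinkedBlockingQueue<>(10_000);
    private final SolrClient solr =
        new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build();
    private final int batchSize = 500; // tune: memory use vs. number of POSTs

    public BlockingQueue<SolrInputDocument> queue() { return queue; }

    @Override
    public void run() {
        List<SolrInputDocument> batch = new ArrayList<>(batchSize);
        try {
            while (true) {
                SolrInputDocument doc = queue.take(); // blocks until a document arrives
                if (doc == END) break;
                batch.add(doc);
                if (batch.size() >= batchSize) flush(batch);
            }
            flush(batch); // send whatever is left after the producer finished
            solr.commit();
            solr.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // One POST per batch instead of one per document or one for everything.
    private void flush(List<SolrInputDocument> batch) throws Exception {
        if (!batch.isEmpty()) {
            solr.add(batch);
            batch.clear();
        }
    }
}
```

The producer thread reads rows, builds `SolrInputDocument`s, puts them on `queue()`, and finally puts `END`; because the queue is bounded, it also provides back-pressure so document production cannot outrun indexing.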
I used the suggestion from nikhil500:
> DIH does support many transformers. You can also write custom transformers. I will recommend using DIH if possible - I think it will need the least amount of coding and will be faster than POSTing the documents. – nikhil500 Feb 6 at 17:42
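For anyone finding this later: DIH is driven by a `data-config.xml` referenced from a `/dataimport` request handler in `solrconfig.xml`. A minimal sketch of such a config (the connection details, table, and column names below are placeholders):

```xml
<dataConfig>
  <!-- JDBC connection; driver/url/credentials are placeholders -->
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb"
              user="user"
              password="pass"/>
  <document>
    <!-- One Solr document per row; the transformer attribute is where
         built-in or custom transformers are plugged in -->
    <entity name="row"
            query="SELECT id, title, body FROM my_table"
            transformer="TemplateTransformer">
      <field column="id"    name="id"/>
      <field column="title" name="title"/>
      <field column="body"  name="body"/>
    </entity>
  </document>
</dataConfig>
```

A full import is then triggered with `/dataimport?command=full-import`. Since DIH reads rows from the JDBC result set as it goes, it avoids holding the whole table in memory.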