I have a table with around 100,000 rows at the moment. I want to index the data from this table in a Solr index.
So the naive method would be to:

- Query the table for all of its rows
- Convert each row into a Solr document
- POST all of the documents to Solr in a single request

Some problems with this approach that I can think of are:

- Holding all of the rows and documents in memory at once
- The size of the single POST request

However, some advantages:

- Only one query against the database and one POST to Solr, so very little request overhead
The approach is not scalable: as the table grows, so will the memory requirements and the size of the POST request. Perhaps I need to take `n` rows, process them, then take the next `n`?
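Roughly, the kind of loop I have in mind is sketched below (keyset pagination over the primary key; the connection details, table, and column names are just placeholders):

```java
import java.sql.*;

public class PagedFetch {
    public static void main(String[] args) throws SQLException {
        int n = 1000;     // rows to process per batch
        long lastId = 0;  // resume point: largest id seen so far
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:mysql://localhost/mydb", "user", "pass");
             PreparedStatement ps = conn.prepareStatement(
                 "SELECT id, title, body FROM my_table WHERE id > ? ORDER BY id LIMIT ?")) {
            while (true) {
                ps.setLong(1, lastId);
                ps.setInt(2, n);
                int fetched = 0;
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        lastId = rs.getLong("id");
                        // ... build a Solr document from this row and send/queue it ...
                        fetched++;
                    }
                }
                if (fetched < n) break; // fewer rows than requested: last page reached
            }
        }
    }
}
```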
I'm wondering if anyone has any advice about how best to implement this?
(P.S. I did search the site but didn't find any questions similar to this.)
Thanks.
If you want to balance between POSTing all documents at once and doing one POST per document, you can use a queue to collect documents and run a separate thread that sends them once enough have accumulated. This way you can manage the memory vs. request-time trade-off.
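Here is a minimal sketch of that idea using SolrJ. The class name, queue capacity, batch size, and Solr URL are all placeholders to adjust:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class QueuedIndexer implements Runnable {
    // Sentinel the producer puts on the queue to signal "no more documents".
    static final SolrInputDocument END = new SolrInputDocument();

    private final BlockingQueue<SolrInputDocument> queue = new LinkedBlockingQueue<>(10_000);
    private final SolrClient solr =
        new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build();
    private final int batchSize = 500; // tune: memory use vs. number of POSTs

    public BlockingQueue<SolrInputDocument> queue() { return queue; }

    @Override
    public void run() {
        List<SolrInputDocument> batch = new ArrayList<>(batchSize);
        try {
            while (true) {
                SolrInputDocument doc = queue.take(); // blocks until a document arrives
                if (doc == END) break;
                batch.add(doc);
                if (batch.size() >= batchSize) flush(batch);
            }
            flush(batch); // send whatever is left after the producer finished
            solr.commit();
            solr.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // One POST per batch instead of one per document or one for everything.
    private void flush(List<SolrInputDocument> batch) throws Exception {
        if (!batch.isEmpty()) {
            solr.add(batch);
            batch.clear();
        }
    }
}
```

The producer thread reads rows, builds `SolrInputDocument`s, puts them on `queue()`, and finally puts `END`; because the queue is bounded, it also provides back-pressure so document production cannot outrun indexing.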
I used the suggestion from nikhil500:
> DIH does support many transformers. You can also write custom transformers. I will recommend using DIH if possible - I think it will need the least amount of coding and will be faster than POSTing the documents. – nikhil500 Feb 6 at 17:42
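For anyone finding this later: DIH is driven by a `data-config.xml` referenced from a `/dataimport` request handler in `solrconfig.xml`. A minimal sketch of such a config (the connection details, table, and column names below are placeholders):

```xml
<dataConfig>
  <!-- JDBC connection; driver/url/credentials are placeholders -->
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb"
              user="user"
              password="pass"/>
  <document>
    <!-- One Solr document per row; the transformer attribute is where
         built-in or custom transformers are plugged in -->
    <entity name="row"
            query="SELECT id, title, body FROM my_table"
            transformer="TemplateTransformer">
      <field column="id"    name="id"/>
      <field column="title" name="title"/>
      <field column="body"  name="body"/>
    </entity>
  </document>
</dataConfig>
```

A full import is then triggered with `/dataimport?command=full-import`. Since DIH reads rows from the JDBC result set as it goes, it avoids holding the whole table in memory.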