Can Apache Solr Handle Terabyte-Scale Data?

I have been an Apache Solr user for about a year. I have used Solr for simple search tools, but now I want to use it with 5TB of data. I estimate that the 5TB of data will grow to 7TB once Solr indexes it, given the filters I use. I will then add nearly 50MB of data per hour to the same index.

1- Are there any problems with using a single Solr server with 5TB of data (without shards)?

  • a- Can the Solr server answer queries in an acceptable time?

  • b- What is the expected time for committing 50MB of data to a 7TB index?

  • c- Is there an upper limit on index size?

2- What suggestions can you offer?

  • a- How many shards should I use?

  • b- Should I use Solr cores?

  • c- What commit frequency would you recommend? (Is 1 hour OK?)

3- Are there any test results for this kind of large data?


I don't have the 5TB of data available yet; I just want to estimate what the result would be.

Note: You can assume that hardware resources are not a problem.

Mustafa asked Jan 12 '12 at 14:01


1 Answer

If your sizes refer to plain text, rather than binary files (whose extractable text is usually much smaller), then I don't think you can expect to do this on a single machine.
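One way to sanity-check how much of the raw data is actually indexable text is to run a sample file through a text extractor and compare sizes. Here is a minimal sketch using Apache Tika; the sample path is an illustrative assumption:

```java
import java.io.File;
import java.nio.charset.StandardCharsets;

import org.apache.tika.Tika;

public class TextRatio {
    public static void main(String[] args) throws Exception {
        // The sample path is an illustrative assumption.
        File sample = new File("/data/sample.pdf");

        Tika tika = new Tika();
        tika.setMaxStringLength(-1); // don't truncate large documents

        // Compare extracted plain text to the raw file size to estimate how
        // much of the 5TB would actually end up in the index.
        String text = tika.parseToString(sample);
        long textBytes = text.getBytes(StandardCharsets.UTF_8).length;

        System.out.printf("raw=%d bytes, text=%d bytes (%.1f%%)%n",
                sample.length(), textBytes,
                100.0 * textBytes / sample.length());
    }
}
```

Running this over a representative sample of your corpus gives a much better basis for capacity planning than the raw 5TB figure.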

This sounds a lot like Loggly, and they use SolrCloud to handle that amount of data.
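For reference, a minimal SolrJ sketch of creating a sharded SolrCloud collection; the ZooKeeper address, collection name, and shard/replica counts below are illustrative assumptions, not recommendations:

```java
import java.util.Collections;
import java.util.Optional;

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;

public class CreateShardedCollection {
    public static void main(String[] args) throws Exception {
        // ZooKeeper host and sizing are illustrative assumptions.
        try (CloudSolrClient client = new CloudSolrClient.Builder(
                Collections.singletonList("zk1:2181"), Optional.empty()).build()) {

            // Split the index across 8 shards with 2 replicas each, so no
            // single node has to hold the whole multi-terabyte index.
            CollectionAdminRequest
                    .createCollection("bigindex", "_default", 8, 2)
                    .process(client);
        }
    }
}
```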

OK, if these are all rich documents, then the total text size to index will be much smaller (for me it is about 7% of the starting size). Even with that reduced amount, though, I think you still have too much data for a single instance.
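On the commit-frequency question (2c): a common approach at this scale is to avoid explicit hard commits per batch and let Solr fold updates in via commitWithin. A minimal SolrJ sketch, where the core URL, field names, and one-hour window are illustrative assumptions:

```java
import java.util.UUID;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class HourlyFeed {
    public static void main(String[] args) throws Exception {
        // Core URL and field names are illustrative assumptions.
        try (SolrClient client = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/bigindex").build()) {

            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", UUID.randomUUID().toString());
            doc.addField("text", "sample document body");

            // commitWithin asks Solr to make the update searchable within the
            // given window instead of forcing a hard commit on every add;
            // frequent hard commits are what hurt on a very large index.
            client.add(doc, 60 * 60 * 1000); // visible within one hour
        }
    }
}
```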

Persimmonium answered Sep 23 '22 at 02:09