
Index replication and Load balancing

I'm using the Lucene API in my web portal, which will have thousands of concurrent users. Our web server will call the Lucene API, which will sit on an app server. We plan to use two app servers for load balancing. Given this, what should our strategy be for replicating the Lucene indexes to the second app server? Any tips, please?

user40907 asked Mar 21 '09 23:03



2 Answers

You could use Solr, which has built-in replication. This is probably the best and easiest solution, since implementing your own replication scheme would likely take quite a lot of work.
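To give a sense of what a hand-rolled scheme involves, here is a minimal Python sketch of pull-based replication: the replica compares a version marker on the primary with its own copy and, when they differ, copies the index files into a staging directory and swaps it into place. The directory layout, the `version.txt` marker, and the names `read_version` and `sync_index` are assumptions made for illustration. This is not how Solr implements replication, and it ignores coordination with a writer that is in the middle of a commit.

```python
import os
import shutil
from pathlib import Path


def read_version(index_dir: Path) -> str:
    """Return the contents of the (hypothetical) version marker, or '' if absent."""
    marker = index_dir / "version.txt"
    return marker.read_text().strip() if marker.exists() else ""


def sync_index(primary: Path, replica: Path) -> bool:
    """Copy the primary index to the replica if their versions differ.

    Returns True when a copy happened, False when the replica was current.
    """
    if replica.exists() and read_version(primary) == read_version(replica):
        return False

    # Copy into a staging directory first so that readers never see a
    # half-copied index, then swap it into place with renames.
    staging = replica.with_name(replica.name + ".staging")
    old = replica.with_name(replica.name + ".old")
    shutil.rmtree(staging, ignore_errors=True)
    shutil.rmtree(old, ignore_errors=True)
    shutil.copytree(primary, staging)

    if replica.exists():
        os.rename(replica, old)
    os.rename(staging, replica)
    shutil.rmtree(old, ignore_errors=True)
    return True
```

A replica would call `sync_index` on a timer. Even this toy version has to deal with staging and swapping directories, which hints at why a ready-made solution is attractive.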

That said, I'm about to do exactly that myself for a project I'm working on. The difference is that, since we're using PHP for the frontend, we've implemented Lucene in a socket server that accepts queries and returns a list of database primary keys. My plan is to push changes to the server and store them in a queue. From there they go into an in-memory index first, and the in-memory index is flushed to disk when the load is low enough.

Still, it's a complex thing to do, and I expect to put in quite a lot of work before we have a stable solution that's reliable enough.
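The queue-and-flush plan described above could look roughly like the following Python sketch. It is not Lucene code: `BufferedIndex`, its methods, and the `load_is_low` callback are hypothetical names, the "index" is a plain dictionary mapping terms to primary keys, and the disk flush is a JSON file write standing in for merging an in-memory index into the on-disk one.

```python
import json
from collections import deque
from pathlib import Path
from typing import Callable


class BufferedIndex:
    """Toy term -> primary-key index with a change queue and a memory buffer."""

    def __init__(self, path: Path, load_is_low: Callable[[], bool]):
        self.path = path
        self.load_is_low = load_is_low  # hypothetical load probe
        self.queue = deque()            # pending (primary_key, text) changes
        self.memory = {}                # term -> set of keys, not yet on disk
        self.disk = {}                  # term -> set of keys, mirrors the file

    def push(self, key: int, text: str) -> None:
        """Accept a change without touching either index yet."""
        self.queue.append((key, text))

    def drain(self) -> None:
        """Apply queued changes to the memory index, then flush if load allows."""
        while self.queue:
            key, text = self.queue.popleft()
            for term in text.lower().split():
                self.memory.setdefault(term, set()).add(key)
        if self.memory and self.load_is_low():
            self.flush()

    def flush(self) -> None:
        """Merge the memory index into the disk index and persist it."""
        for term, keys in self.memory.items():
            self.disk.setdefault(term, set()).update(keys)
        self.memory.clear()
        data = {term: sorted(keys) for term, keys in self.disk.items()}
        self.path.write_text(json.dumps(data))

    def search(self, term: str) -> list:
        """Return matching primary keys from both memory and disk."""
        term = term.lower()
        return sorted(self.memory.get(term, set()) | self.disk.get(term, set()))
```

Searches consult both the memory buffer and the flushed data, so a change becomes visible as soon as the queue is drained, even if the flush to disk is deferred until the load drops.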

Emil H answered Oct 16 '22 16:10


From experience, Lucene should have no problem scaling to thousands of users. That said, if you're using your second app server only for load balancing and not for failover, you should be fine hosting the Lucene index on just one of those servers and accessing it from the second server via NFS (in a Unix environment) or a shared directory (in a Windows environment).

Again, this depends on your specific situation. If you're talking about millions of documents in your index (5 million or more) and you need the Lucene index to survive a server failure, you may want to look into Solr or Katta.

dustyburwell answered Oct 16 '22 14:10