I am using the Lucene API in a web portal that will have thousands of concurrent users. Our web server will call the Lucene API, which will sit on an app server. We plan to use two app servers for load balancing. Given this, what should our strategy be for replicating the Lucene indexes to the second app server? Any tips, please?
Replication can balance the load on Directory Server in the following ways: By spreading search activities across several servers. By dedicating specific servers to specific tasks or applications.
Some of the common load balancing methods are as follows: Round robin -- In this method, an incoming request is routed to each available server in a sequential manner. Weighted round robin -- Here, a static weight is preassigned to each server and is used with the round robin method to route an incoming request.
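As a rough illustration of the two methods just described, here is a minimal Java sketch. The class names and the server names used with it ("app1", "app2") are made up for the example, and the weighted variant uses the simplest possible approach (repeating each server in the rotation according to its weight), which is only one of several ways to do it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Plain round robin: hand out servers in a fixed, repeating order.
// (Hypothetical class, not part of any library.)
class RoundRobinBalancer {
    private final List<String> servers;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinBalancer(List<String> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("at least one server is required");
        }
        this.servers = new ArrayList<>(servers);
    }

    String next() {
        // floorMod keeps the index non-negative even if the counter overflows.
        int i = Math.floorMod(counter.getAndIncrement(), servers.size());
        return servers.get(i);
    }
}

// Weighted round robin, expansion variant: a server with static weight w
// appears w times in the rotation, so it receives w requests per cycle.
class WeightedRoundRobinBalancer {
    private final RoundRobinBalancer delegate;

    WeightedRoundRobinBalancer(List<String> servers, List<Integer> weights) {
        if (servers.size() != weights.size()) {
            throw new IllegalArgumentException("one weight per server is required");
        }
        List<String> expanded = new ArrayList<>();
        for (int i = 0; i < servers.size(); i++) {
            for (int w = 0; w < weights.get(i); w++) {
                expanded.add(servers.get(i));
            }
        }
        this.delegate = new RoundRobinBalancer(expanded);
    }

    String next() {
        return delegate.next();
    }
}
```

With two servers and weights 2 and 1, the weighted balancer sends two requests to the first server for every one it sends to the second. Real load balancers usually interleave the weighted picks more smoothly than this.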
There are two primary approaches to load balancing. Dynamic load balancing uses algorithms that take into account the current state of each server and distribute traffic accordingly. Static load balancing distributes traffic without making these adjustments.
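To make the dynamic approach concrete, here is a small sketch of a "least connections" balancer, which is one common dynamic strategy (picked here purely as an example). The class and method names are hypothetical, and the only "current state" it tracks is the number of requests in flight on each server.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Dynamic balancing sketch: route each request to the server that currently
// has the fewest active requests. (Hypothetical class, not a library API.)
class LeastConnectionsBalancer {
    // Insertion-ordered so that ties go to the server listed first.
    private final Map<String, Integer> active = new LinkedHashMap<>();

    LeastConnectionsBalancer(Iterable<String> servers) {
        for (String s : servers) {
            active.put(s, 0);
        }
        if (active.isEmpty()) {
            throw new IllegalArgumentException("at least one server is required");
        }
    }

    // Pick the least-loaded server and count the new request against it.
    synchronized String acquire() {
        String best = null;
        int bestCount = Integer.MAX_VALUE;
        for (Map.Entry<String, Integer> e : active.entrySet()) {
            if (e.getValue() < bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        active.put(best, bestCount + 1);
        return best;
    }

    // Call when a request finishes so the server's count drops again.
    synchronized void release(String server) {
        Integer n = active.get(server);
        if (n != null && n > 0) {
            active.put(server, n - 1);
        }
    }
}
```

A static method such as round robin would ignore these counts entirely and keep rotating even if one server were overloaded.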
ZooKeeper is used for high availability, not exactly as a load balancer. High availability means you don't want to lose your single point of contact, i.e. your master node. If one master goes down, there should be someone else who can take over and maintain the same state.
You could use Solr, which has built-in replication. This is possibly the best and easiest solution, since implementing your own replication scheme would probably take quite a lot of work.
That said, I'm about to do exactly that myself for a project I'm working on. The difference is that since we're using PHP for the frontend, we've implemented Lucene in a socket server that accepts queries and returns a list of database primary keys. My plan is to push changes to the server and store them in a queue, where I'll first write them into the in-memory index and then flush the in-memory index to disk when the load is low enough.
Still, it's a complex thing to do, and I expect to do quite a lot of work before we have a final solution that's stable and reliable enough.
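For what it's worth, the queue-then-flush plan described in this answer could be sketched roughly as follows. This is not the actual implementation: it uses plain Java collections as stand-ins for Lucene's in-memory and on-disk indexes, and every name in it (DiskIndex, BufferedIndexUpdater, tick, the load threshold) is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical stand-in for the on-disk index; this is not a Lucene type.
interface DiskIndex {
    void writeAll(List<String> docs);
}

class BufferedIndexUpdater {
    // Changes pushed by the application wait here until the next tick.
    private final Queue<String> pending = new ConcurrentLinkedQueue<>();
    // Stand-in for the in-memory index.
    private final List<String> memoryIndex = new ArrayList<>();
    private final DiskIndex disk;
    private final double loadThreshold;

    BufferedIndexUpdater(DiskIndex disk, double loadThreshold) {
        this.disk = disk;
        this.loadThreshold = loadThreshold;
    }

    // Called whenever a record changes; cheap and non-blocking.
    void push(String doc) {
        pending.add(doc);
    }

    // Drain the queue into the in-memory index, then flush to disk only if
    // the current load is below the threshold. Returns the number of
    // documents flushed (0 if the flush was skipped).
    synchronized int tick(double currentLoad) {
        String doc;
        while ((doc = pending.poll()) != null) {
            memoryIndex.add(doc);
        }
        if (currentLoad >= loadThreshold || memoryIndex.isEmpty()) {
            return 0;
        }
        int flushed = memoryIndex.size();
        disk.writeAll(new ArrayList<>(memoryIndex));
        memoryIndex.clear();
        return flushed;
    }

    synchronized int bufferedCount() {
        return memoryIndex.size();
    }
}
```

The idea is that tick would be called periodically with whatever load measure is available. A real version would also have to make the buffered documents searchable before they are flushed and deal with crashes between flushes, which this sketch ignores.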
From experience, Lucene should have no problem scaling to thousands of users. That said, if you're only using your second app server for load balancing and not for failover, you should be fine hosting Lucene on only one of those servers and accessing it from the second server via NFS (in a Unix environment) or a shared directory (in a Windows environment).
Again, this depends on your specific situation. If you're talking about having millions of documents in your index (5 million or more) and you need your Lucene index to support failover, you may want to look into Solr or Katta.