
Mocking and Unit Testing Solr and Lucene Index

We need control of the data in the production Solr index, and we need it to be compatible with new development. Ideally, we'd like to mock the index on local machines, query it with Solr, and write unit tests against it for quicker iterations.

RAMDirectory is used in another question to do something similar, but that question is from 2 years back. This example appears to do just that (using FSDirectory instead of RAMDirectory). Are these the right approaches to this problem? Are there better ways to do this?

We'd like to write tests like:

setup mock index;
query mock index;
assert(stuff that should be true);
teardown mock index;
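
The four steps above can be sketched in plain Java. This is only a minimal sketch under stated assumptions: `MockIndex` and `MockIndexLifecycle` are hypothetical names, and the class is a tiny in-memory stand-in written for illustration, not a Lucene or Solr API. In a real test, a Lucene Directory or a local Solr instance would presumably take its place.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical in-memory stand-in for the index under test (not Lucene or Solr).
class MockIndex implements AutoCloseable {
    // term -> ids of the documents that contain that term
    private final Map<String, List<String>> postings = new HashMap<>();
    private boolean open = true;

    // Setup step: add one document by splitting its text into lowercase terms.
    void add(String id, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (term.isEmpty()) continue;
            List<String> ids = postings.computeIfAbsent(term, k -> new ArrayList<String>());
            if (!ids.contains(id)) ids.add(id);
        }
    }

    // Query step: ids of the documents containing the term, in insertion order.
    List<String> search(String term) {
        if (!open) throw new IllegalStateException("index is closed");
        List<String> ids = postings.get(term.toLowerCase(Locale.ROOT));
        return ids == null ? new ArrayList<String>() : new ArrayList<String>(ids);
    }

    // Teardown step: drop all data so nothing leaks into the next test.
    @Override
    public void close() {
        postings.clear();
        open = false;
    }
}

class MockIndexLifecycle {
    // Runs setup, query, assert and teardown once, then returns the hits.
    static List<String> run() {
        try (MockIndex index = new MockIndex()) {        // setup mock index
            index.add("doc-1", "Solr indexes quickly");
            index.add("doc-2", "Lucene powers Solr");
            List<String> hits = index.search("solr");    // query mock index
            if (hits.size() != 2) {                       // assert
                throw new AssertionError("expected 2 hits, got " + hits.size());
            }
            return hits;
        }                                                 // teardown happens in close()
    }
}
```

Tying teardown to `close()` means the cleanup runs even when the assertion fails, which is the same guarantee a test framework's teardown hook would give.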

EDIT: Additional details:

Our thought was to build an index and have a simple way of adding documents without needing the indexer and the rest of the system, except perhaps a local database that we could keep in version control. In the past we generated an index and, when incompatibilities arose, regenerated it.

If we re-index, we're adding a lot of overhead, and mocking the indexer doesn't seem like a good option given that our indexer contains a lot of data processing logic (like adding data to searchable fields from a db). Our indexer connects to an external db, so we'd need to support that too. We could have a local test database as stated above, which has little to no overhead.

Once we have a test db, we need to build an index, and then we could go off the second link above. The question becomes: how do we build an index really quickly for testing, say one of about 1000 documents?

The problem with this is we then need to keep our local db schema in sync with the production schema. The production schema changes often enough that this is a problem. We'd like to have a test infrastructure that's flexible enough to handle this. The approach as of now is to just rebuild the database each time, which is slow and pisses off other people!

nflacco asked Jul 27 '11 19:07


1 Answer

If you are using Solr, I wouldn't even bother with mocking or emulating (i.e., don't change its config).

Instead, write an integration test that sets up your Solr index. Setting up just means indexing the data like you normally would. You will probably want your developers to run their own Solr.

I wouldn't worry that much about speed because Solr indexes incredibly fast (100,000 documents in less than 30 seconds in our environment... in fact, the bottleneck is pulling the data from the database).

So really your mock index should just be a small subset of production data that you index into Solr (you can do this once for each TestCase class with @BeforeClass).
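
One way to pick that small subset repeatably is to shuffle with a fixed seed and take the first N records. The following is a minimal sketch, assuming the production records (or their ids) have already been loaded into a list; `FixtureSampler` and its `sample` method are hypothetical names, not part of Solr or JUnit.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper; the name and signature are illustrative only.
class FixtureSampler {
    // Returns up to `size` items chosen pseudo-randomly but repeatably:
    // the same input list and seed give the same subset in the same order.
    static <T> List<T> sample(List<T> source, int size, long seed) {
        if (size < 0) {
            throw new IllegalArgumentException("size must be >= 0");
        }
        List<T> copy = new ArrayList<T>(source);       // leave the caller's list untouched
        Collections.shuffle(copy, new Random(seed));   // fixed seed gives a stable order
        int end = Math.min(size, copy.size());         // never ask for more than exists
        return new ArrayList<T>(copy.subList(0, end));
    }
}
```

A call such as `FixtureSampler.sample(allIds, 1000, 42L)` could then run inside the @BeforeClass method, with the resulting records indexed into the developer's local Solr before the tests in that class execute.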

EDIT (based on your Edits):

I'll tell you how we do it (and how I have seen others do it):

We have a development schema/db and a production schema/db. When developers are working on stuff, they just make a copy of the build machine's development database and restore it locally. This database is much smaller than the production db and is ideal for testing. Your production db should not be that much different from your development db schema-wise (make smaller changes and release more often if that is the case).

Adam Gent answered Sep 28 '22 13:09