I am just wondering if we could achieve some RDBMS capabilities in lucene.
Example: 1) I have 10,000 project documents (pdf files) which have to be indexed with their content to make them available for search. 2) Every document is related to a SINGLE PROJECT. The project can contain details like project name, number, start date, end date, location, type etc.
I have to search in the contents of the pdf files for a given keyword, but while displaying the results I want to display the project meta data as mentioned in point (2).
My idea is to associate a field called projectId with each pdf file while indexing. Once we get that, we will fire search again for getting project meta data.
This way we could avoid duplicated data. Also, if we want to update the project meta data we will end up updating at a SINGLE PLACE only. Otherwise if we store this meta data with all the pdf doument indexes, we will end up updating all of the documents, which is not the way I am looking for.
please advise.
Lucene is not a database — as I mentioned earlier, it's just a Java library.
Apache Solr is a subproject of Apache Lucene, which is the indexing technology behind most recently created search and index technology. Solr is a search engine at heart, but it is much more than that. It is a NoSQL database with transactional support.
Amazon and MongoDB both use Lucene every day, and the most important use case is no doubt application search, in which the engine is primarily used by humans.
Both are better suited for developing a search engine and both are based on Lucene.
If I understand you correctly, you have two questions:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With