Index MongoDB with Elasticsearch

I already have MongoDB, and I installed Elasticsearch with Mongoriver, so I set up my river:

$ curl -X PUT localhost:9200/_river/database_test/_meta -d '{
  "type": "mongodb",
  "mongodb": {
    "servers": [
      {
        "host": "127.0.0.1",
        "port": 27017
      }
    ],
    "options": {
      "secondary_read_preference": true
    },
    "db": "database_test",
    "collection": "event"
  },
  "index": {
    "name": "database_test",
    "type": "event"
  }
}'

I simply want to get the events that have country:Canada, so I try:

$ curl -XGET 'http://localhost:9200/database_test/_search?q=country:Canada'

And I get:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}
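An empty hits list can mean either that the documents never reached Elasticsearch or that the query does not match them. The sketch below checks both sides. It assumes the legacy mongo shell and curl are installed and that Elasticsearch listens on localhost:9200; the function names run and check_sync and the DRY_RUN switch are hypothetical helpers, not part of any tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare what MongoDB holds with what Elasticsearch indexed.
# Assumes the legacy `mongo` shell, curl, and Elasticsearch on localhost:9200.
# As in the question, the MongoDB database, the index and the river all share
# the name database_test.

ES_URL="${ES_URL:-http://localhost:9200}"

# Print the command instead of executing it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

check_sync() {
  local db="${1:-database_test}"
  # 1. Source side: how many events in MongoDB have country "Canada"?
  run mongo --quiet "$db" --eval 'print(db.event.count({country: "Canada"}))'
  # 2. Target side: how many documents did the river put into the index?
  run curl -s "$ES_URL/$db/_count?pretty"
  # 3. Is the river definition stored in the _river index?
  run curl -s "$ES_URL/_river/$db/_meta?pretty"
}
```

Source the file and call check_sync, or DRY_RUN=1 check_sync to print the commands first. If MongoDB reports matching events but _count returns 0, nothing was indexed at all, which is the situation the answer addresses.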

I have been searching the web and read that I should first index my collection with Elasticsearch (I lost the link). Should I index my MongoDB? What should I do to get results from an existing MongoDB collection?

asked Nov 19 '13 19:11 by Diolor


People also ask

Can you use MongoDB with Elasticsearch?

Yes. The combination is particularly useful for time-series data: it is then easy to store the data in multiple indices (such as daily or monthly indices) and search those indices through aliases. Elasticsearch also supports incremental backups through the _snapshot API.

How do I sync data between MongoDB and Elasticsearch?

mongo-connector requires MongoDB to run in replica-set mode. It first copies the existing documents from MongoDB to the target system, then tails the MongoDB oplog to keep up with operations in real time. To write to Elasticsearch it needs a doc manager named elastic2_doc_manager.
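For reference, a minimal sketch of starting mongo-connector against a local setup. It assumes mongo-connector was installed with its Elasticsearch 2.x extra (pip install 'mongo-connector[elastic2]'), that mongod already runs as a replica-set member on localhost:27017, and that Elasticsearch listens on localhost:9200; run and start_connector are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around mongo-connector with the Elasticsearch 2.x doc manager.
# Assumes mongod is a replica-set member and Elasticsearch is reachable.

# Print the command instead of executing it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

start_connector() {
  local mongo="${1:-localhost:27017}" es="${2:-localhost:9200}"
  # -m: MongoDB address, -t: target URL, -d: doc manager that writes to the target
  run mongo-connector -m "$mongo" -t "$es" -d elastic2_doc_manager
}
```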

Why use Elasticsearch instead of MongoDB?

Elasticsearch is built for search and provides advanced data indexing capabilities. For data analysis, it works alongside Logstash and Kibana, forming the ELK stack. MongoDB is an open-source NoSQL database management program that can manage large amounts of data in a distributed architecture.


1 Answer

The MongoDB river relies on MongoDB's operations log (oplog) to index documents, so your Mongo database must run as a replica set. I assume yours doesn't, so when you create the river, the initial import finds nothing to index. I am also assuming that you're on Linux and comfortable with the shell CLI tools.

Follow these steps:

  • Make sure that the mapper-attachments Elasticsearch plugin is also installed.
  • Make a backup of your database with mongodump.
  • Edit mongodb.conf (usually /etc/mongodb.conf, but the location depends on how you installed MongoDB) and add the line:

    replSet = rs0

    "rs0" is the name of the replica set; it can be whatever you like.

  • Restart mongod, then log in to its console (the mongo shell) and type:

    rs.initiate()
    rs.slaveOk()

    The prompt will change to rs0:PRIMARY>

  • Now create your river just as you did in the question, then restore your database with mongorestore. The restored documents are written to the oplog, which the river tails, so Elasticsearch should index them.
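The steps above can be collected into one function, sketched below. Beyond what the answer states, it assumes a systemd service named mongod (older Ubuntu packages call it mongodb), a config file in the old key = value format, the legacy mongo shell, root privileges, and a file river.json holding the river definition from the question; run and migrate_to_replica_set are hypothetical names. One deliberate addition is mongorestore --drop: a plain mongorestore does not overwrite documents whose _id already exists, so without --drop nothing new would reach the oplog.

```shell
#!/usr/bin/env bash
# Sketch of the answer's steps as one function; run it as root.
# Assumed (not from the answer): systemd service "mongod", key = value config,
# legacy `mongo` shell, and river.json containing the river definition.

# Print the command instead of executing it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

migrate_to_replica_set() {
  local conf="${MONGO_CONF:-/etc/mongodb.conf}"
  local service="${MONGO_SERVICE:-mongod}"
  local backup="${BACKUP_DIR:-./dump}"
  local es="${ES_URL:-http://localhost:9200}"

  # 1. Back up all databases.
  run mongodump --out "$backup" || return 1

  # 2. Enable replication once; "rs0" can be any name.
  run sh -c "grep -q '^replSet' '$conf' || echo 'replSet = rs0' >> '$conf'" || return 1

  # 3. Restart mongod, wait (no timeout) until it accepts connections, then
  #    initiate the one-member replica set. rs.slaveOk() only affects an
  #    interactive session, so it is left out here.
  run systemctl restart "$service" || return 1
  run sh -c "until mongo --quiet --eval 'db.runCommand({ping: 1})' >/dev/null 2>&1; do sleep 1; done"
  run mongo --quiet --eval 'printjson(rs.initiate())' || return 1

  # 4. Wait (no timeout) until this node has been elected primary.
  run sh -c "until mongo --quiet --eval 'print(db.isMaster().ismaster)' | grep -q true; do sleep 1; done"

  # 5. Create the river, then restore; --drop re-inserts the documents so
  #    they are logged in the oplog instead of being skipped as duplicates.
  run curl -s -XPUT "$es/_river/database_test/_meta" -d @river.json || return 1
  run mongorestore --drop "$backup"
}
```

Calling DRY_RUN=1 migrate_to_replica_set prints the planned commands in order without touching anything.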

I recommend using the elasticsearch-head plugin (http://mobz.github.io/elasticsearch-head/) to browse your indexes and rivers and make sure your data was indexed.
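If you would rather check from the command line than through a plugin, a small polling sketch against the _count API could look like this. It assumes curl and Elasticsearch on localhost:9200; count_from_json and wait_for_docs are hypothetical names, and the sed-based extraction only handles the flat _count reply, it is not a general JSON parser.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll the _count API until the river has indexed something.

ES_URL="${ES_URL:-http://localhost:9200}"

# Extract N from a _count reply such as {"count":42,"_shards":{...}}.
count_from_json() {
  sed -n 's/.*"count"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Wait up to $2 seconds (default 60) for index $1 to hold at least one document.
wait_for_docs() {
  local index="${1:-database_test}" timeout="${2:-60}" n i
  for ((i = 0; i < timeout; i++)); do
    n=$(curl -s "$ES_URL/$index/_count" | count_from_json)
    if [ "${n:-0}" -gt 0 ]; then
      echo "$index: $n documents indexed"
      return 0
    fi
    sleep 1
  done
  echo "$index: still empty after ${timeout}s" >&2
  return 1
}
```

For example, run wait_for_docs database_test 120 right after mongorestore finishes.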

If that doesn't work, please post the versions of mongodb-river-plugin, Elasticsearch, and MongoDB that you are using.

answered Sep 27 '22 20:09 by Rodrigo Del C. Andrade