 

Indexing data from Postgres to Solr/Elasticsearch

What is the best way to index constantly changing data in a PostgreSQL database to a Solr/Elasticsearch database?

I have a Postgres database on AWS RDS and I want to perform complex search on it. However, the data I will query against is constantly changing, with a very high rate of writes and updates, so I am not sure how I should transfer the data to Solr/Elasticsearch efficiently and reliably.

Thanks for the help

Al Hennessey asked Dec 24 '15 17:12


People also ask

Does Elasticsearch work with Postgres?

ZomboDB allows you to use the power and scalability of Elasticsearch directly from Postgres. You don't have to manage transactions between Postgres and Elasticsearch, asynchronous indexing pipelines, complex reindexing processes, or multiple data-access code paths -- ZomboDB does it all for you.

Does PostgreSQL automatically index?

PostgreSQL automatically creates a unique index when a unique constraint or primary key is defined for a table. The index covers the columns that make up the primary key or unique constraint (a multicolumn index, if appropriate), and is the mechanism that enforces the constraint.

Is Elasticsearch faster than Postgres?

No matter how well PostgreSQL does on its full-text searches, Elasticsearch is designed to search enormous volumes of text and documents (or records), and the larger the corpus you want to search, the more Elasticsearch outperforms PostgreSQL.


2 Answers

At the risk of someone marking this question as a duplicate, here's the link to setting up postgres-to-elasticsearch in another StackOverflow thread. There's also a blog post on the Atlassian developer blog about how to get real-time updates from PostgreSQL into ElasticSearch.

The Atlassian post, for the tl;dr crowd, uses stored PostgreSQL procedures to copy updated/inserted data to a staging table, then processes the staging table separately. It's a nice approach that would work for either ES or Solr. Unfortunately, it's a roll-your-own solution, unless you are familiar with Clojure.
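The staging-table pattern can be sketched in plain PostgreSQL. All table, column, and trigger names below are illustrative assumptions, not taken from the Atlassian post:

```sql
-- Hypothetical staging table recording which rows changed and when.
CREATE TABLE documents_staging (
    doc_id     bigint      NOT NULL,
    changed_at timestamptz NOT NULL DEFAULT now()
);

-- Trigger function: record the primary key of every inserted/updated row.
CREATE OR REPLACE FUNCTION stage_document_change() RETURNS trigger AS $$
BEGIN
    INSERT INTO documents_staging (doc_id) VALUES (NEW.id);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER documents_stage_trg
AFTER INSERT OR UPDATE ON documents
FOR EACH ROW EXECUTE PROCEDURE stage_document_change();
```

A separate worker process then drains `documents_staging` in batches, pushes the corresponding rows to ES or Solr, and deletes the entries it has processed, so a crash mid-batch only means re-indexing a few rows rather than losing updates.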

scooter me fecit answered Sep 18 '22 05:09


In the case of Solr, a common approach is to use the Data Import Handler (DIH for short). Configure the full-import and delta-import SQL properly, where the delta import picks up rows from the database that have changed since the last import, judging by timestamps (so you need to design the schema with a proper last-modified timestamp column).

The delta-import can be triggered in two ways, which can be used separately or combined:

  • Run delta-import on a timer (e.g. every 5 minutes).
  • After each update in the database, make a call to delta-import.
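The second style, calling delta-import after each write, is just an HTTP request against the DIH endpoint of the core. A minimal sketch in Python; the host, port, and core name `items` are assumptions for illustration:

```python
from urllib.parse import urlencode

def delta_import_url(solr_base: str, core: str) -> str:
    """Build the DIH delta-import URL for a given Solr core."""
    params = {
        "command": "delta-import",
        "clean": "false",   # don't wipe the index, only apply deltas
        "commit": "true",   # commit when the import finishes
    }
    return f"{solr_base}/{core}/dataimport?{urlencode(params)}"

# After a write, fire the request (hypothetical host/core):
# import urllib.request
# urllib.request.urlopen(delta_import_url("http://localhost:8983/solr", "items"))
```

For a high write rate you would normally debounce these calls (e.g. at most one delta-import in flight at a time) rather than issuing one per row.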

Refer to https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler for DIH details.
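A minimal `data-config.xml` sketch of that setup. The table name (`item`), its columns, and the `last_modified` timestamp column are illustrative assumptions; `${dataimporter.last_index_time}` and `${dih.delta.id}` are the built-in DIH variables:

```xml
<dataConfig>
  <dataSource driver="org.postgresql.Driver"
              url="jdbc:postgresql://localhost:5432/mydb"
              user="solr" password="..."/>
  <document>
    <entity name="item"
            query="SELECT id, title, body FROM item"
            deltaQuery="SELECT id FROM item
                        WHERE last_modified &gt; '${dataimporter.last_index_time}'"
            deltaImportQuery="SELECT id, title, body FROM item
                              WHERE id = '${dih.delta.id}'">
      <field column="id"    name="id"/>
      <field column="title" name="title"/>
      <field column="body"  name="body"/>
    </entity>
  </document>
</dataConfig>
```

`deltaQuery` finds the ids of rows changed since the last run, and `deltaImportQuery` fetches each of those rows for indexing, which is why the schema needs a reliable last-modified timestamp.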

user218867 answered Sep 21 '22 05:09