Preferred method of indexing bulk data into ElasticSearch?

I've been looking at ElasticSearch as a solution to get better search and analytics functionality at my company. All of our data is in SQL Server at the moment, and I've successfully installed the JDBC River and gotten some test data into ES.

Rivers seem like they may be deprecated in future releases, and the JDBC river is maintained by a third party. Logstash doesn't seem to support indexing from SQL Server yet (I don't know if it's a planned feature).

So for my situation, where I want to move data from SQL Server to ElasticSearch, what's the preferred method of indexing the data and maintaining the index as SQL Server gets updated with new data?

From the linked thread:

We recommend that you own your indexing process out-of-band from ES and make sure it scales with your needs.

I'm not quite sure where to start with this. Is it on me to use one of the APIs ES provides?

— Cuthbert, asked Mar 06 '14



2 Answers

We use RabbitMQ to pipe data from SQL Server to ES. That way Rabbit takes care of the queuing and processing.

As a note, we can push over 4,000 records per second from SQL Server into Rabbit. We do a bit more processing before putting the data into ES, but we still insert into ES at over 1,000 records per second. Pretty damn impressive on both ends. Rabbit and ES are both awesome!
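For illustration, here's a minimal sketch of what the consuming side of such a pipe could look like in Python, assuming the pika and elasticsearch client libraries; the queue name, index name, and message format are all made up:

    import json

    import pika
    from elasticsearch import Elasticsearch, helpers

    es = Elasticsearch("http://localhost:9200")

    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()
    channel.queue_declare(queue="sql-to-es", durable=True)

    BATCH_SIZE = 500
    batch = []

    def on_message(ch, method, properties, body):
        # Each message is assumed to be one row from SQL Server, serialized as JSON.
        doc = json.loads(body)
        batch.append({"_index": "products", "_id": doc["Id"], "_source": doc})
        if len(batch) >= BATCH_SIZE:
            # One bulk request instead of BATCH_SIZE single-document index calls.
            helpers.bulk(es, batch)
            batch.clear()
        ch.basic_ack(delivery_tag=method.delivery_tag)

    channel.basic_consume(queue="sql-to-es", on_message_callback=on_message)
    channel.start_consuming()

A production consumer would also flush partial batches on a timer and ack only after a successful bulk call, but the shape is the same: Rabbit absorbs the bursts, and ES receives batched writes.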

— jhilden, answered Oct 19 '22


There are a lot of things you can do. You could put your data in RabbitMQ or Redis, but your main problem is staying up to date. I suggest you look into an event-based application. But if you really only have SQL Server as a data source, you could work with timestamps and a query that checks for updates (a sketch follows below). Depending on the size of your database, you could also just reindex the complete dataset.
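As a rough sketch of the timestamp approach, assuming pyodbc and a hypothetical Products table with a LastModified column:

    import pyodbc

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=localhost;DATABASE=Shop;Trusted_Connection=yes;"
    )

    def fetch_changes(since):
        # Pull every row touched after the last checkpoint. In practice you
        # would persist `since` (e.g. in a small state table) between runs.
        cursor = conn.cursor()
        cursor.execute(
            "SELECT Id, Name, Price, LastModified "
            "FROM Products "
            "WHERE LastModified > ? "
            "ORDER BY LastModified",
            since,
        )
        columns = [col[0] for col in cursor.description]
        return [dict(zip(columns, row)) for row in cursor.fetchall()]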

Using events or the query-based solution, you can push these updates to Elasticsearch, probably using the bulk API.
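Continuing the hypothetical example above, the changed rows could then go to ES in a single request via the bulk helper of the elasticsearch Python client:

    from elasticsearch import Elasticsearch, helpers

    es = Elasticsearch("http://localhost:9200")

    def push_changes(rows):
        # One bulk action per changed row, keyed on the SQL primary key so a
        # re-indexed row overwrites the old document instead of duplicating it.
        actions = (
            {"_index": "products", "_id": row["Id"], "_source": row}
            for row in rows
        )
        # Returns (number of successes, list of errors).
        return helpers.bulk(es, actions)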

The good part about a custom solution like this is that you can think about your mapping. This is important if you really want to do something smart with your data.
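For instance, you could create the index with an explicit mapping up front instead of relying on dynamic mapping. A sketch using the keyword-argument style of the 8.x Python client, with the same hypothetical field names:

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    es.indices.create(
        index="products",
        mappings={
            "properties": {
                "Name": {"type": "text"},  # analyzed, full-text searchable
                "Price": {"type": "scaled_float", "scaling_factor": 100},
                "LastModified": {"type": "date"},
            }
        },
    )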

— Jettro Coenradie, answered Oct 19 '22