Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Rank over partition from postgresql in elasticsearch

We are facing a problem with migration a large data set into elasticsearch from postgres (backup or whatever).

We have schema similar like this

+---------------+--------------+------------+-----------+
|    user_id    |  created_at  |  latitude  | longitude |
+---------------+--------------+------------+-----------+
|       5       |  23.1.2015   |    12.49   |    20.39  |
+---------------+--------------+------------+-----------+
|       2       |  23.1.2015   |    12.42   |    20.32  |
+---------------+--------------+------------+-----------+
|       2       |  24.1.2015   |    12.41   |    20.31  |
+---------------+--------------+------------+-----------+
|       5       |  25.1.2015   |    12.45   |    20.32  |
+---------------+--------------+------------+-----------+
|       1       |  23.1.2015   |    12.43   |    20.34  |
+---------------+--------------+------------+-----------+
|       1       |  24.1.2015   |    12.42   |    20.31  |
+---------------+--------------+------------+-----------+

And we are able to find a latest position by created_at thanks to rank function in SQL

... WITH locations AS ( 
select user_id, lat, lon, rank() over (partition by user_id order by created_at) as r
FROM locations)
SELECT user_id, lat, lon FROM locations WHERE r = 1

and the result is only newest created locations for each user:

+---------------+--------------+------------+-----------+
|    user_id    |  created_at  |  latitude  | longitude |
+---------------+--------------+------------+-----------+
|       2       |  24.1.2015   |    12.41   |    20.31  |
+---------------+--------------+------------+-----------+
|       5       |  25.1.2015   |    12.45   |    20.32  |
+---------------+--------------+------------+-----------+
|       1       |  24.1.2015   |    12.42   |    20.31  |
+---------------+--------------+------------+-----------+

After we import the data into elasticsearch, our document model looks like:

{
   "location" : { "lat" : 12.45, "lon" : 46.84 },
   "user_id"  : 5,
   "created_at" : "2015-01-24T07:55:20.606+00:00"
}
etc...

I am looking for alternatives for this SQL query in elasticsearch query, I think it must be possible, but i did not find how yet.

like image 442
Jan Strnádek Avatar asked Sep 26 '22 17:09

Jan Strnádek


1 Answers

You can achieve this using field collapsing clubbed with inner_hits.

{
    "collapse": {
        "field": "user_id",
        "inner_hits": {
            "name": "order by created_at",
            "size": 1,
            "sort": [
                {
                    "created_at": "desc"
                }
            ]
        }
    },
}

Detailed Article: https://blog.francium.tech/sql-window-function-partition-by-in-elasticsearch-c2e3941495b6

like image 123
bragboy Avatar answered Oct 03 '22 15:10

bragboy