 

How to sort horizontal partitioned data

I have a telco billing system that keeps daily logs of users' calls. The logs are horizontally partitioned by month. Each partition is stored in a separate database, and these databases may be spread across multiple instances.

In the UI, the user specifies a date range. The returned data can be sorted on any field, and the date range may span multiple partitions. The application must also support paging through the data in that range.
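Because the partitions are monthly, the first step for any request is working out which partitions a date range touches. A minimal sketch using `java.time.YearMonth`; the `calls_YYYY_MM` database naming is a hypothetical convention, not something stated in the question.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.util.ArrayList;
import java.util.List;

class PartitionRange {
    // Returns every monthly partition touched by [from, to], both ends inclusive.
    static List<YearMonth> partitionsFor(LocalDate from, LocalDate to) {
        if (to.isBefore(from)) {
            throw new IllegalArgumentException("'to' must not be before 'from'");
        }
        List<YearMonth> months = new ArrayList<>();
        YearMonth last = YearMonth.from(to);
        for (YearMonth m = YearMonth.from(from); !m.isAfter(last); m = m.plusMonths(1)) {
            months.add(m);
        }
        return months;
    }

    // Hypothetical naming scheme: one database per month, e.g. "calls_2016_06".
    static String databaseName(YearMonth m) {
        return String.format("calls_%d_%02d", m.getYear(), m.getMonthValue());
    }

    public static void main(String[] args) {
        List<YearMonth> months = partitionsFor(LocalDate.of(2016, 5, 20), LocalDate.of(2016, 7, 3));
        // prints calls_2016_05, calls_2016_06, calls_2016_07 (one per line)
        for (YearMonth m : months) {
            System.out.println(databaseName(m));
        }
    }
}
```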

I cannot load too many records into memory for sorting, and putting the sort (ORDER BY) inside each query only sorts the data within a single result set.

So I need to sort data from multiple partitions that are each individually sorted. How can I return globally sorted records to the UI from multiple sorted result sets?

EDIT: After further analysis, we have more input: pagination is also required. Because of this, we need a way to sort across multiple result sets in real time.
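Once rows can be read in one global order, offset-based paging can be layered on top without buffering whole partitions. A minimal sketch, assuming some `Iterator` that already yields records in the final sort order (for example, the merged stream described in the answer); the `Paging.page` helper is hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class Paging {
    // Returns page pageNo (0-based) of size pageSize from an iterator that
    // already yields records in the final, globally sorted order.
    static <T> List<T> page(Iterator<T> sorted, int pageNo, int pageSize) {
        if (pageNo < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("pageNo must be >= 0 and pageSize > 0");
        }
        long toSkip = (long) pageNo * pageSize;
        // Offset paging: earlier rows are read and discarded, never buffered.
        while (toSkip > 0 && sorted.hasNext()) {
            sorted.next();
            toSkip--;
        }
        List<T> page = new ArrayList<>();
        while (page.size() < pageSize && sorted.hasNext()) {
            page.add(sorted.next());
        }
        return page;
    }

    public static void main(String[] args) {
        Iterator<Integer> sorted = List.of(1, 2, 3, 4, 5, 6, 7).iterator();
        System.out.println(page(sorted, 1, 3)); // prints [4, 5, 6]
    }
}
```

Memory stays bounded by the page size, but the trade-off is that rows before the requested page are still fetched and discarded, so later pages cost more than earlier ones.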

asked Jun 30 '16 at 08:06 by Gaurava Agarwal




1 Answer

By relying on the ability of a ResultSet to hold only a limited amount of data in memory, we came up with a solution in Java using a dynamic Comparator. The idea is to take the first record from each ResultSet, sort these records in Java, and return the first element of the sorted data.

Detailed Solution:

First, we built a component that creates a dynamic Comparator based on the sort criteria chosen on the screen.
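A dynamic comparator can be as simple as a lookup from the UI's sort field to a `Comparator`, reversed for descending order. In this sketch, the `CallRecord` type and the field names are hypothetical, since the question does not give the schema. Whatever comparator is chosen must order rows exactly as each partition's own sort does, or the merge will return rows out of order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

class DynamicComparators {
    // Hypothetical call-record shape; the real schema is not given in the question.
    record CallRecord(String caller, long durationSeconds, double charge) {}

    // One comparator per sortable column name shown in the UI (names are hypothetical).
    private static final Map<String, Comparator<CallRecord>> BY_FIELD = Map.of(
            "caller", Comparator.comparing(CallRecord::caller),
            "duration", Comparator.comparingLong(CallRecord::durationSeconds),
            "charge", Comparator.comparingDouble(CallRecord::charge));

    // Builds the comparator for the field and direction chosen on screen.
    static Comparator<CallRecord> forCriteria(String field, boolean ascending) {
        Comparator<CallRecord> base = BY_FIELD.get(field);
        if (base == null) {
            throw new IllegalArgumentException("Unsortable field: " + field);
        }
        return ascending ? base : base.reversed();
    }

    public static void main(String[] args) {
        List<CallRecord> calls = new ArrayList<>(List.of(
                new CallRecord("alice", 120, 1.5),
                new CallRecord("bob", 30, 0.4),
                new CallRecord("carol", 300, 3.0)));
        calls.sort(forCriteria("duration", false));
        // prints carol, alice, bob (one per line)
        calls.forEach(c -> System.out.println(c.caller()));
    }
}
```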

Second, we wrote an AggregateResultSet wrapper over the DAO that wraps the ResultSets from the different partitions. Note that each of these ResultSets is already sorted by the same criteria. The AggregateResultSet is then given the dynamic comparator.

The AggregateResultSet keeps a small data structure that initially holds the first element of each result set. Each call to next() returns the element that comes first according to the dynamic comparator. During that call, we remove this element from the temporary data structure and insert the next element from the same result set. This way, the AggregateResultSet returns data in the expected order while merging, storing, and sorting only a very limited amount of data in Java (at most one element per result set).
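The behaviour described above is a k-way merge. A minimal sketch, assuming each partition is exposed as an already sorted `Iterator` (a JDBC adapter over a `ResultSet` is not shown) and using a `java.util.PriorityQueue` as the temporary data structure; the answer does not say which structure it used, and `MergingIterator` is a hypothetical name, not the author's `AggregateResultSet`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;

// Merges several individually sorted sources into one sorted stream (k-way merge).
class MergingIterator<T> implements Iterator<T> {
    // Heap entry: the current head element of a source plus the source it came from.
    private record Head<E>(E value, Iterator<E> source) {}

    private final PriorityQueue<Head<T>> heads;

    MergingIterator(List<? extends Iterator<T>> sources, Comparator<? super T> cmp) {
        heads = new PriorityQueue<>(Math.max(1, sources.size()),
                (a, b) -> cmp.compare(a.value(), b.value()));
        // Seed the heap with the first element of every non-empty source.
        for (Iterator<T> s : sources) {
            if (s.hasNext()) {
                heads.add(new Head<>(s.next(), s));
            }
        }
    }

    @Override
    public boolean hasNext() {
        return !heads.isEmpty();
    }

    @Override
    public T next() {
        Head<T> smallest = heads.poll();
        if (smallest == null) {
            throw new NoSuchElementException();
        }
        // Refill from the same source so the heap holds at most one element per source.
        if (smallest.source().hasNext()) {
            heads.add(new Head<>(smallest.source().next(), smallest.source()));
        }
        return smallest.value();
    }

    public static void main(String[] args) {
        List<Iterator<Integer>> partitions = List.of(
                List.of(1, 4, 9).iterator(),
                List.of(2, 3, 10).iterator(),
                List.<Integer>of().iterator());
        MergingIterator<Integer> merged = new MergingIterator<>(partitions, Comparator.naturalOrder());
        List<Integer> out = new ArrayList<>();
        merged.forEachRemaining(out::add);
        System.out.println(out); // prints [1, 2, 3, 4, 9, 10]
    }
}
```

With k partitions, each `next()` costs O(log k) and memory holds at most k elements. Rows that compare as equal may come out in either order, so adding a unique tie-breaker (such as a unique call ID column, if one exists) to both the comparator and each partition's ORDER BY keeps pages stable.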

We expect no comparison issues, since the fields we sort on are mostly numeric or string data.

answered Oct 16 '22 at 18:10 by Gaurava Agarwal
