
Azure Service Fabric Reliable Dictionary LINQ query very slow

I have a reliable dictionary in a Service Fabric stateful service and a simple LINQ expression over it.
I am using the Ix-Async package to build an async enumerable.


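using (ITransaction tx = this.StateManager.CreateTransaction())
{
    var result = (await customers.CreateLinqAsyncEnumerable(tx))
        // The predicate reads x.Value, so every value has to be loaded (from disk if it is not cached)
        .Where(x => x.Value.NameFirst != null && x.Value.NameFirst.EndsWith(n, StringComparison.InvariantCultureIgnoreCase))
        .Select(y => y.Value);

    // Ix-Async ToList() enumerates the whole sequence and materializes the matches
    return await result.ToList();
}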

The data is organized into 2 partitions with around 75,000 records in each partition, using an Int64 range partition key. In the above code, result.ToList() takes around 1 minute to execute for each partition. Another weird thing is that the actual result is empty! The same query run in SQL Server returns rows with customer first names ending with "c". But this is beside the point. My biggest concern is the performance of the ReliableDictionary LINQ query.
Regards

teeboy asked Mar 10 '23 17:03



1 Answer

Reliable Dictionary periodically removes least recently used values from memory. This enables:

  • Large Reliable Dictionaries
  • Higher Density: Higher density of Reliable Collections per replica and higher density of replicas per node.

The trade-off is that this can increase read latency: disk I/O is required to retrieve values that are not cached in memory.

There are a couple of options for getting lower latency on enumerations.

1) Key Filtered Enumeration: You can move the fields that you would like to use in your query into the TKey of the ReliableDictionary (NameFirst in the above example). This allows you to use the CreateEnumerableAsync overload that takes a key filter. The key filter allows Reliable Dictionary to avoid retrieving values from disk for keys that do not match your query. One limitation of this approach is that TKey (and hence the fields inside it) cannot be updated.
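A minimal sketch of the key-filter option, assuming each customer has a unique `long` id. `CustomerKey` and `CustomerKeyFilters.NameFirstEndsWith` are hypothetical names; the code uses only the BCL so it stands alone. It shows a key type that satisfies Reliable Dictionary's `TKey : IComparable<TKey>, IEquatable<TKey>` constraint and a key-only predicate with the same condition as the question's `Where` clause.

```csharp
using System;
using System.Runtime.Serialization;

// Hypothetical composite key: NameFirst moves out of the value and into TKey,
// so a key filter can reject entries without reading their values.
[DataContract]
public sealed class CustomerKey : IComparable<CustomerKey>, IEquatable<CustomerKey>
{
    [DataMember] public long Id { get; private set; }          // assumed unique customer id
    [DataMember] public string NameFirst { get; private set; } // part of the key, so never updated in place

    public CustomerKey(long id, string nameFirst)
    {
        Id = id;
        NameFirst = nameFirst;
    }

    // Order by Id, then by NameFirst (ordinal), so ordering is deterministic.
    public int CompareTo(CustomerKey other)
    {
        if (ReferenceEquals(other, null)) return 1;
        int byId = Id.CompareTo(other.Id);
        return byId != 0 ? byId : string.CompareOrdinal(NameFirst, other.NameFirst);
    }

    public bool Equals(CustomerKey other) =>
        !ReferenceEquals(other, null) && Id == other.Id &&
        string.Equals(NameFirst, other.NameFirst, StringComparison.Ordinal);

    public override bool Equals(object obj) => Equals(obj as CustomerKey);

    public override int GetHashCode()
    {
        unchecked
        {
            int nameHash = NameFirst == null ? 0 : StringComparer.Ordinal.GetHashCode(NameFirst);
            return (Id.GetHashCode() * 397) ^ nameHash;
        }
    }
}

public static class CustomerKeyFilters
{
    // Key-only predicate: it never touches the value, only fields stored in TKey.
    public static Func<CustomerKey, bool> NameFirstEndsWith(string suffix) =>
        key => key.NameFirst != null &&
               key.NameFirst.EndsWith(suffix, StringComparison.InvariantCultureIgnoreCase);
}
```

Inside a transaction, the predicate would be passed as the filter argument, e.g. `await dictionary.CreateEnumerableAsync(tx, CustomerKeyFilters.NameFirstEndsWith(n), EnumerationMode.Unordered)` on an `IReliableDictionary<CustomerKey, TValue>`. The cost of this layout is that a lookup by id alone now also needs the first name, and renaming a customer means removing the old key and adding a new one.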

2) In-memory Secondary Index using Notifications: Reliable Dictionary Notifications can be used to build any number of secondary indices. You could build a secondary index that keeps all of the values in memory, trading memory resources for lower read latency. Furthermore, since you have full control over the secondary index, you can keep it ordered (e.g. by the reverse of NameFirst in your example, which turns an "ends with" query into a prefix range scan).
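A minimal sketch of such an index, written against the BCL only so it stands alone. `ReversedNameIndex<TValue>` is a hypothetical name, and it assumes the dictionary is keyed by a `long` customer id. Values are held in memory and ordered by the reversed, upper-cased first name, so "ends with X" becomes a scan over the contiguous run of entries starting with reverse(X); `ToUpperInvariant` only approximates the question's `InvariantCultureIgnoreCase` comparison.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory secondary index, kept in sync from dictionary notifications.
public sealed class ReversedNameIndex<TValue>
{
    // Ordinal order on the reversed name, with the id as a tie-breaker.
    private static readonly Comparer<(string Reversed, long Id)> EntryComparer =
        Comparer<(string Reversed, long Id)>.Create((a, b) =>
        {
            int c = string.CompareOrdinal(a.Reversed, b.Reversed);
            return c != 0 ? c : a.Id.CompareTo(b.Id);
        });

    private readonly object gate = new object();
    private readonly Func<TValue, string> nameSelector;
    private readonly SortedSet<(string Reversed, long Id)> ordered =
        new SortedSet<(string Reversed, long Id)>(EntryComparer);
    private readonly Dictionary<long, (string Reversed, TValue Value)> byId =
        new Dictionary<long, (string Reversed, TValue Value)>();

    public ReversedNameIndex(Func<TValue, string> nameSelector) =>
        this.nameSelector = nameSelector;

    private static string Reverse(string s)
    {
        char[] chars = s.ToUpperInvariant().ToCharArray();
        Array.Reverse(chars);
        return new string(chars);
    }

    // Apply for item-added and item-updated notifications.
    public void Upsert(long id, TValue value)
    {
        lock (gate)
        {
            RemoveCore(id);
            string name = nameSelector(value);
            string reversed = name == null ? null : Reverse(name);
            byId[id] = (reversed, value);
            if (reversed != null) ordered.Add((reversed, id)); // null names are kept but not ordered
        }
    }

    // Apply for item-removed notifications.
    public void Remove(long id)
    {
        lock (gate) RemoveCore(id);
    }

    // Apply for clear notifications, and before re-populating on a rebuild.
    public void Clear()
    {
        lock (gate) { ordered.Clear(); byId.Clear(); }
    }

    private void RemoveCore(long id)
    {
        if (byId.TryGetValue(id, out var old))
        {
            if (old.Reversed != null) ordered.Remove((old.Reversed, id));
            byId.Remove(id);
        }
    }

    // Values whose name ends with the suffix (case-insensitive), served from memory.
    public List<TValue> EndsWith(string suffix)
    {
        var result = new List<TValue>();
        string prefix = Reverse(suffix);
        lock (gate)
        {
            if (ordered.Count == 0) return result;
            (string Reversed, long Id) lower = (prefix, long.MinValue);
            var upper = ordered.Max;
            if (EntryComparer.Compare(lower, upper) > 0) return result;
            foreach (var entry in ordered.GetViewBetween(lower, upper))
            {
                // Entries sharing the prefix are contiguous, so stop at the first miss.
                if (!entry.Reversed.StartsWith(prefix, StringComparison.Ordinal)) break;
                result.Add(byId[entry.Id].Value);
            }
        }
        return result;
    }
}
```

In the service, a handler for the dictionary's `DictionaryChanged` event would call `Upsert` for `NotifyDictionaryItemAddedEventArgs` and `NotifyDictionaryItemUpdatedEventArgs`, `Remove` for `NotifyDictionaryItemRemovedEventArgs`, and `Clear` followed by re-adding every pair from `NotifyDictionaryRebuildEventArgs.State` on a rebuild (types from `Microsoft.ServiceFabric.Data.Notifications`; check the exact shapes against your SDK version). The lock keeps queries consistent with notification callbacks at the cost of some contention.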

We are also considering making Reliable Dictionary's in-memory TValue sweep policy configurable. With this, you would be able to configure Reliable Dictionary to keep all values in memory if read latency is a priority for you.

Since in your scenario most of the enumeration time is spent on disk I/O, you can also benefit from using a custom serializer, which can reduce the disk and network footprint.
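A minimal sketch of a compact encoding such a serializer could use, assuming a hypothetical `Customer` value with `Id`, `NameFirst` and `NameLast` (only `NameFirst` appears in the question). It uses only `BinaryWriter`/`BinaryReader`, so it runs without the SDK. In SDKs that support custom serializers, these bodies would back the `Read`/`Write` members of an `IStateSerializer<Customer>` registered via `this.StateManager.TryAddStateSerializer(...)` in the service constructor; treat that wiring as an assumption to verify against your SDK version.

```csharp
using System;
using System.IO;

// Hypothetical value type; only NameFirst is known from the question.
public sealed class Customer
{
    public long Id { get; set; }
    public string NameFirst { get; set; }
    public string NameLast { get; set; }
}

// Compact binary layout: fixed-size id, then length-prefixed UTF-8 strings.
public static class CustomerBinaryFormat
{
    public static void Write(Customer value, BinaryWriter writer)
    {
        writer.Write(value.Id);
        WriteNullableString(writer, value.NameFirst);
        WriteNullableString(writer, value.NameLast);
    }

    // Object initializers run in textual order, matching the write order above.
    public static Customer Read(BinaryReader reader) => new Customer
    {
        Id = reader.ReadInt64(),
        NameFirst = ReadNullableString(reader),
        NameLast = ReadNullableString(reader),
    };

    // One presence byte, because BinaryWriter.Write(string) rejects null.
    private static void WriteNullableString(BinaryWriter writer, string s)
    {
        writer.Write(s != null);
        if (s != null) writer.Write(s);
    }

    private static string ReadNullableString(BinaryReader reader) =>
        reader.ReadBoolean() ? reader.ReadString() : null;
}
```

For `Id = 1`, `NameFirst = "Marc"`, `NameLast = "Smith"` this writes 21 bytes: 8 for the id and, per string, a presence byte, a length byte and the UTF-8 characters. Schema versioning is not handled; adding a field later would need a version byte or similar.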

Thank you for your question.

Mert Coskun - MSFT answered Mar 13 '23 14:03
