
Large static arrays are slowing down class load, need a better/faster lookup method

Tags: c#, unicode

I have a class with a couple static arrays:

an int[] with 17,720 elements
a string[] with 17,720 elements

I noticed when I first access this class it takes almost 2 seconds to initialize, which causes a pause in the GUI that's accessing it.

Specifically, it's a lookup for Unicode character names. The first array is an index into the second array.

static readonly int[] NAME_INDEX = {
0x0000, 0x0001, 0x0005, 0x002C, 0x003B, ...

static readonly string[] NAMES = {
"Exclamation Mark", "Digit Three", "Semicolon", "Question Mark", ...

The following code is how the arrays are used (given a character code). [Note: This code isn't a performance problem]

int nameIndex = Array.BinarySearch<int>(NAME_INDEX, code);
// BinarySearch returns a negative value when the code is not found; index 0 is a valid match.
if (nameIndex >= 0) { return NAMES[nameIndex]; }
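
For reference, a minimal self-contained sketch of the lookup described above. The four sample entries and the class and method names (`UnicodeNameLookup`, `GetName`) are illustrative assumptions, not the real 17,720-element data; the point is how the return value of `Array.BinarySearch` is interpreted.

```csharp
using System;

static class UnicodeNameLookup
{
    // Tiny illustrative sample (assumed values); the real arrays hold 17,720 entries each.
    // NAME_INDEX must be sorted ascending for Array.BinarySearch to give correct results.
    static readonly int[] NAME_INDEX = { 0x0021, 0x0033, 0x003B, 0x003F };
    static readonly string[] NAMES = { "Exclamation Mark", "Digit Three", "Semicolon", "Question Mark" };

    public static string GetName(int code)
    {
        int nameIndex = Array.BinarySearch(NAME_INDEX, code);
        // A non-negative result is the index of the match (0 is a valid index);
        // a negative result means the code is not present in NAME_INDEX.
        return nameIndex >= 0 ? NAMES[nameIndex] : null;
    }
}
```

With this sample data, `UnicodeNameLookup.GetName(0x0021)` returns the first entry, which a `> 0` check would have skipped.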

I guess I'm looking at other options on how to structure the data so that 1) The class is quickly loaded, and 2) I can quickly get the "name" for a given character code.

Should I not be storing all these thousands of elements in static arrays?

Update
Thanks for all the suggestions. I've tested out a Dictionary approach and the performance of adding all the entries seems to be really poor.

Here is some code with the Unicode data to test out Arrays vs Dictionaries http://drop.io/fontspace/asset/fontspace-unicodesupport-zip

Solution Update
I tested out my original dual arrays (which are faster than both dictionary options) with a background thread to initialize and that helped performance a bit.

However, the real surprise is how well binary files in resource streams work. It is the fastest solution discussed in this thread. Thanks everyone for your answers!
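
A rough sketch of what the binary-file approach could look like. The file layout (an `Int32` count followed by code/name records) and the names `UnicodeNameTable`, `Write`, and `Read` are assumptions made for illustration; the original post does not show the actual format used.

```csharp
using System;
using System.IO;
using System.Text;

static class UnicodeNameTable
{
    // Assumed layout (not from the original post):
    // Int32 count, then `count` records of (Int32 code, length-prefixed UTF-8 name),
    // written in ascending code order so the codes can be binary searched after loading.
    public static void Write(Stream output, int[] codes, string[] names)
    {
        using (var writer = new BinaryWriter(output, Encoding.UTF8, leaveOpen: true))
        {
            writer.Write(codes.Length);
            for (int i = 0; i < codes.Length; i++)
            {
                writer.Write(codes[i]);
                writer.Write(names[i]);
            }
        }
    }

    // Reads the two parallel arrays back from a stream produced by Write.
    public static void Read(Stream input, out int[] codes, out string[] names)
    {
        using (var reader = new BinaryReader(input, Encoding.UTF8, leaveOpen: true))
        {
            int count = reader.ReadInt32();
            codes = new int[count];
            names = new string[count];
            for (int i = 0; i < count; i++)
            {
                codes[i] = reader.ReadInt32();
                names[i] = reader.ReadString();
            }
        }
    }
}
```

In an application, the file would be generated once at build time and embedded as a resource; the input stream for `Read` would then come from something like `typeof(UnicodeNameTable).Assembly.GetManifestResourceStream(...)`, with whatever resource name the project assigns.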

asked Dec 03 '22 05:12 by Visualize


1 Answer

A couple of observations. Binary search only works if the array being searched is sorted, and from your code snippet above it doesn't look sorted.

Since your primary goal is to find a specific name, your code is begging for a hash table. I would suggest using a Dictionary. It gives you O(1) lookup on average, without much more overhead than the arrays themselves.
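
Something along these lines, as a sketch only. The class and method names (`UnicodeNameDictionary`, `Build`) are made up for the example, and the two parameters stand in for the question's NAME_INDEX and NAMES arrays.

```csharp
using System.Collections.Generic;

static class UnicodeNameDictionary
{
    // Builds a code -> name map from two parallel arrays.
    public static Dictionary<int, string> Build(int[] codes, string[] names)
    {
        // Passing the capacity up front avoids repeated resizing while entries are added.
        var map = new Dictionary<int, string>(codes.Length);
        for (int i = 0; i < codes.Length; i++)
        {
            map[codes[i]] = names[i];
        }
        return map;
    }
}
```

A lookup is then `map.TryGetValue(code, out string name)`. Filling the dictionary still costs time at startup, so it is worth measuring against the plain arrays before committing to it.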

As for the load time, I agree with Andrey that the best approach is a separate thread. With this amount of data you are going to have some initialization overhead. Normal practice with GUIs is to run such activities on a separate thread so you don't lock up the UI.
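
One possible way to do that, assuming `Lazy<T>` and `Task.Run` are available (.NET 4.5 or later). `UnicodeNamesLoader`, `LoadNames`, and `WarmUpAsync` are hypothetical names, and `LoadNames` is a placeholder for whatever actually builds the data.

```csharp
using System;
using System.Threading.Tasks;

static class UnicodeNamesLoader
{
    // Hypothetical placeholder for the expensive step (e.g. building the large arrays).
    static string[] LoadNames()
    {
        return new[] { "Exclamation Mark", "Digit Three", "Semicolon", "Question Mark" };
    }

    // Lazy<T> is thread-safe by default, so LoadNames runs at most once.
    static readonly Lazy<string[]> names = new Lazy<string[]>(LoadNames);

    // Call early (e.g. at application startup) to start loading on a thread-pool thread.
    public static Task WarmUpAsync()
    {
        return Task.Run(() => { var _ = names.Value; });
    }

    // Returns immediately if loading has finished; otherwise blocks until it does.
    public static string[] Names { get { return names.Value; } }
}
```

The UI would call `WarmUpAsync()` once during startup and read `Names` later. If the user needs the data before loading finishes, that access still blocks, so this hides the delay without removing it.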

answered Jan 09 '23 04:01 by cmptrer