Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What type of collection should I use?

I have approximately 10,000 records. Each records has 2 fields: one field is a string up to 300 characters in length and the other field is a decimal value. This is like a product catalog with product names and the price of each product.

What I need to do is allow the user to type any word and display all products containing that word together with their prices in a listbox. That's all.

  1. What type of collection is best for this scenario?
  2. If I need to sort based on either product name or price, will the choice still be the same?

Right now I am using an XML file, but I thought using a collection so that I can embed all the values in the code is simpler. Thanks for your suggestions.

like image 391
user763554 Avatar asked Dec 24 '11 08:12

user763554


People also ask

Which Java collection should I use?

The best general purpose or 'primary' implementations are likely ArrayList , LinkedHashMap , and LinkedHashSet . Their overall performance is better, and you should use them unless you need a special feature provided by another implementation. That special feature is usually ordering or sorting.

What are the three types of collections?

We can use them to create different types of collections in the Java program. Some important collection classes are ArrayList, LinkedList, HashMap, TreeMap, HashSet, and TreeSet.

Which collection is best for insertion and deletion?

If you only insert and delete at the end of the list, then ArrayList is the way to go.

Which collection is best for searching?

HashSet and LinkedHashSet have the best performance on search operations. The idea of the test is simple. First, an array of objects of my own class, MySimpleDate, is created. Then, an object of the to-be-tested collection class is created, and filled with all the objects from the array.


2 Answers

A Dictionary will do the job. However, if you are doing rapid partial matches (e.g. search as the user types) you may get better performance by creating multiple keys which point to the same item. For example, the word "Apple" could be located with "Ap", "App", "Appl", and "Apple".

I have used this approach on a similar number of records with very good results. I have turned my 10K source items into about 50K unique keys. Each of these Dictionary entries points to a list containing references to all matches for that term. You can then search this much smaller list more efficiently. Despite the large number of lists this creates, the memory footprint is quite reasonable.

You can also make up your own keys if desired to redirect common misspellings or point to related items. This also eliminates most of the issues with unique keys because each key points to a list. A single item may be classified by each of the words in its name; this is extremely useful if you have long product names with multiple words in it. When classifying your items, each word in the name can be mapped to one or more keys.

I should also point out that building and classifying 10K items shouldn't take long if done correctly (couple hundred milliseconds is reasonable). The results can be cached for as long as you want using Application, Cache, or static members.

To summarize, the resulting structure is a Dictionary<string, List<T>> where the string is a short (2-6 characters works well) but unique key. Each key points to a List<T> (or other collection, if you are so inclined) of items which match that key. When a search is performed, you locate the key which matches the term provided by the user. Depending on the length of your keys, you may truncate the user's search to your maximum key length. After locating the correct child collection, you then search that collection for a complete or partial match using whatever methodology you wish.

Lastly, you may wish to create a lightweight structure for each item in the list so that you can store additional information about the item. For example, you might create a small Product class which stores the name, price, department, and popularity of the product. This can help you refine the results you show to the user.

All-in-all, you can perform intelligent, detailed, fuzzy searches in real-time.

The aforementioned structures should provide functionality roughly equivalent to a trie.

like image 101
Tim M. Avatar answered Oct 15 '22 20:10

Tim M.


10K records is not that much.

An Dictionary<string,decimal> would fit the bill. You can sort by key or by value using LINQ, as well as do searches.

This assumes that product names are unique.

like image 22
Oded Avatar answered Oct 15 '22 19:10

Oded