
Data.Set : does it always know best?

Tags:

haskell

I need to represent a set and I'm starting to work with Data.Set. I see that there's really nothing to do: singleton, union, intersection, etc. are all just there. I like it. I can express "what", not "how". But my inner C programmer is uncomfortable. There are many ways to implement a set (binary tree, hash table, boolean array, etc.). Can I really trust Data.Set to choose the best one? Can I guide it in some way, or do I just surrender to its (I admit, probably superior) judgement?
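For concreteness, here is a minimal sketch of the kind of code the question describes. It uses only functions exported by `Data.Set` from the `containers` package; the set names `evens` and `smallPrimes` are made up for illustration.

```haskell
import qualified Data.Set as Set

-- Two small example sets; fromList silently discards duplicates.
evens, smallPrimes :: Set.Set Int
evens       = Set.fromList [2, 4, 6, 8, 10]
smallPrimes = Set.fromList [2, 3, 5, 7]

main :: IO ()
main = do
  -- Say "what" (union, intersection, membership), never "how".
  print (Set.toList (Set.union evens smallPrimes))        -- [2,3,4,5,6,7,8,10]
  print (Set.toList (Set.intersection evens smallPrimes)) -- [2]
  print (Set.member 9 (Set.insert 9 (Set.singleton 1)))   -- True
```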

asked Mar 18 '12 at 03:03 by gcbenison





2 Answers

Data.Set has no inner intelligence (just see the source!). It is just a balanced tree of ordered elements. You can look around on Hackage for many other set and set-like structures with different performance characteristics. For example, see unordered-containers (HashSet), hashtables, and bloomfilter.
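A small sketch of what "a balanced tree of ordered elements" means in practice, assuming `containers` 0.5.9 or later (needed for `lookupMin`). The element type only needs an `Ord` instance, and that ordering is what makes queries like minimum, successor, and split cheap. A hash-based set such as `HashSet` instead needs `Eq` and `Hashable` and does not offer these ordered queries. The `fruits` set is hypothetical example data.

```haskell
import qualified Data.Set as Set

-- Data.Set only needs Ord: elements live in a size-balanced
-- binary search tree ordered by compare.
fruits :: Set.Set String
fruits = Set.fromList ["pear", "apple", "fig", "kiwi", "apple"]

main :: IO ()
main = do
  -- An in-order walk of the tree yields ascending order.
  print (Set.toAscList fruits)      -- ["apple","fig","kiwi","pear"]
  -- Order-based queries fall out of the tree structure.
  print (Set.lookupMin fruits)      -- Just "apple"
  print (Set.lookupGE "g" fruits)   -- Just "kiwi"
  print (Set.split "kiwi" fruits)   -- (fromList ["apple","fig"],fromList ["pear"])
  -- valid checks the internal balance and ordering invariants.
  print (Set.valid fruits)          -- True
```

If your code never uses ordered queries and your keys hash well, that is the case where a hash set is worth benchmarking against Data.Set.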

answered Oct 11 '22 at 12:10 by Thomas M. DuBuisson



The general Data.Set uses a balanced binary tree. If you have sets of integers or bit vectors, you'll want Data.IntSet, which uses Patricia tries.
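A minimal sketch of switching to `Data.IntSet`: the element type is fixed to `Int`, but the API mirrors `Data.Set`, so the change is mostly the import and the type signature. The values below are illustrative only.

```haskell
import qualified Data.IntSet as IntSet
import qualified Data.Set as Set

-- The same integers stored both ways.
asSet :: Set.Set Int
asSet = Set.fromList [5, 1, 42, 7, 1]

-- IntSet is not parameterised: its elements are always Int.
asIntSet :: IntSet.IntSet
asIntSet = IntSet.fromList [5, 1, 42, 7, 1]

main :: IO ()
main = do
  print (Set.toList asSet)                 -- [1,5,7,42]
  print (IntSet.toList asIntSet)           -- [1,5,7,42]
  print (IntSet.member 42 asIntSet)        -- True
  print (IntSet.toList (IntSet.union asIntSet (IntSet.fromList [2, 3])))  -- [1,2,3,5,7,42]
  -- Converting between the two representations via a sorted list.
  print (IntSet.fromList (Set.toList asSet) == asIntSet)  -- True
```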

Both implementations have been honed through years of competition to get the best performance possible with Haskell.

Surrender Dorothy!

answered Oct 11 '22 at 12:10 by Norman Ramsey
