I need to represent a set and I'm starting to work with Data.Set. I see that there's really nothing to do: singleton, union, intersection, etc. are all just there. I like it. I can express "what", not "how". But my inner C programmer is uncomfortable. There are many ways to implement a set (binary tree, hash, boolean array, etc.). Can I really trust Data.Set to choose the best one? Can I guide it in some way, or do I just surrender to its (I admit, probably superior) judgement?
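For readers new to the module, here is a minimal sketch of the operations mentioned in the question. It assumes only the containers package that ships with GHC. The sets a and b are arbitrary illustrative values, not from the original thread.

```haskell
import qualified Data.Set as Set

-- Two small example sets; the element type only needs an Ord instance.
a :: Set.Set Int
a = Set.fromList [1, 2, 3, 4]

b :: Set.Set Int
b = Set.union (Set.singleton 5) (Set.fromList [3, 4])  -- {3,4,5}

main :: IO ()
main = do
  print (Set.toList (Set.union a b))        -- [1,2,3,4,5]
  print (Set.toList (Set.intersection a b)) -- [3,4]
  print (Set.member 2 b)                    -- False
```

Nothing in this code says how the set is stored, which is exactly the "what, not how" style the question describes.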
Data.Set has no inner intelligence (just see the source!). It is just a balanced tree of ordered elements. You can look around on Hackage for many other set and set-like structures with different performance characteristics. For example, see unordered-containers (HashSet), hashtables and bloomfilter.
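Because Data.Set is an ordered balanced tree (the containers documentation describes it as a size-balanced binary tree), elements always come out sorted and order-based queries are available. The sketch below stays within containers to avoid extra dependencies. The set s is a hypothetical example value.

```haskell
import qualified Data.Set as Set

-- Insertion order is irrelevant: the tree keeps elements sorted by Ord.
s :: Set.Set Int
s = Set.fromList [42, 7, 19, 3, 7]  -- the duplicate 7 is stored once

main :: IO ()
main = do
  print (Set.toAscList s)   -- [3,7,19,42]
  print (Set.lookupMin s)   -- Just 3
  print (Set.lookupGT 7 s)  -- Just 19
  -- split partitions around a pivot: elements < 19 and elements > 19
  let (lo, hi) = Set.split 19 s
  print (Set.toList lo, Set.toList hi)  -- ([3,7],[42])
```

These ordered operations are what you give up if you switch to a hash-based structure, which only needs Hashable and Eq and offers no ordering guarantees.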
The general Data.Set uses a balanced binary tree. If you have sets of integers or bit vectors, you'll want Data.IntSet, which uses Patricia tries.
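Moving from Set Int to IntSet is usually just a change of import and type, since the API mirrors Data.Set. A minimal sketch, assuming the containers package. The names primes and odds are illustrative.

```haskell
import qualified Data.IntSet as IntSet

-- Data.IntSet is specialised to Int elements.
primes, odds :: IntSet.IntSet
primes = IntSet.fromList [2, 3, 5, 7, 11, 13]
odds   = IntSet.fromList [1, 3 .. 13]  -- [1,3,5,7,9,11,13]

main :: IO ()
main = do
  print (IntSet.toList (IntSet.intersection primes odds)) -- [3,5,7,11,13]
  print (IntSet.size (IntSet.union primes odds))          -- 8
  print (IntSet.member 9 primes)                          -- False
```

This is how you "guide" the choice of representation in Haskell: you pick the type, and the compiler holds you to it.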
Both implementations have been honed through years of competition to get the best performance possible with Haskell.
Surrender Dorothy!