Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Haskell data structure with optimized access methods

Tags:

haskell

I want to create data types for accessing and modifying a simple music library, consisting of albums consisting of tracks. For a basic idea, consider the following:

data MusicCollection = MC { albums :: Seq Album }
data Album = Album { albumTitle :: String, tracks :: Seq Track }
data Track = Track { trackTitle :: String, tempo  :: Tempo }

data Tempo = Unknown | BPM Int

In addition to tempo, there may be other attributes like style or rating.

The above solution gives me fast access to random albums. Additionally, I'd like to have fast access to all tracks faster than a specified tempo. And again, it would be nice to have fast random access to the returned tracks:

fasterThan :: Int -> MusicCollection -> SomeRandomAccessCollection Track

Updating a track in the collection shouldn't be too slow as well.

Question:

I don't know if adding a Map Tempo (Seq Track) to the MusicCollection is best or if it's possible to immitate relational data bases somehow. Perhaps there are completely different solutions?

I currently don't want to use a data base, but it would be interesting to know criteria when to use them in desktop applications.

like image 235
Duschvorhang Avatar asked Dec 20 '11 19:12

Duschvorhang


3 Answers

If you don't want to use databases, you pretty much have to make a Map for every relation that you want to have fast access to, as you described yourself.

Note that Data.Map.Map in the containers package uses ordered keys, so you can for example use Data.Map.split to get reasonably fast partitions of a map, which is useful to find "tracks faster than a specified tempo" (provided that you derive Ord for Tempo). Consequently, if you don't need ordered keys for a relation, you should use Data.HashMap.HashMap from unordered-containers instead, whose implementation is faster.

It is also possible to use in-memory relational databases with no additional dependencies. You can for example use persistent and Sqlite as the backend. How to do this is described in the persistent tutorial.

like image 186
dflemstr Avatar answered Nov 15 '22 06:11

dflemstr


Yes, Map Tempo (Seq Track) is a good choice here, especially since its Ord-based structure lets you make queries like "all tracks with tempo greater than n" efficiently; cf. Data.Map.split.

This basically is imitating relational databases by using an index.

You may also be interested in IxSet and HiggsSet (an intended successor to IxSet by a different author), which are intended to augment a set structure with various indexes, especially in tandem with a transactional serialisation system like acid-state. They support convenient comparison queries like greater than, less than, etc. If you only have one or two indices, though, it's probably best just to roll your own.

like image 43
ehird Avatar answered Nov 15 '22 06:11

ehird


It's good that you mention databases, because relational databases are designed precisely to solve problems like these. The classic example from database theory is the Invoice/Items problem, which your problem here mirrors almost exactly: an invoice has individual items, yet sometimes we want to write reports that consider items without connecting them to their invoices. A database or data structure that forces us to go through the invoices to get at the items is making us do too much work in these cases.

So one of your alternatives here would just be to use a relational database (there are lightweight embedded ones like SQLite that might be the most appropriate). Another alternative is to make a more relational-like design, and model the problem as a handful of relations that can be joined with each other in multiple ways—possibly assisted by an index that speeds up a join.

So you might start with something like this, where the tracks are available at the top level of the collection (and thus it's easy to just search through the tracks):

data MusicCollection = MC { tracks :: Set Track
                          , albums :: Set Album
                          , albumToTracks :: Album -> [Track] }
data Track = Track { album :: Album, trackTitle :: String, tempo  :: Maybe Int }
                 deriving (Eq, Ord)
data Album = Album { albumTitle :: String }
                 deriving (Eq, Ord)

When you constructed the MusicCollection value, you'd have to make sure that you construct the albumToTracks function that allows you to quickly get an album's tracks. (Or you could use a Map Album [Track] instead of a function; the function is a bit more general.)

Another alternative I haven't worked out, but that may very well be worth looking into: a circular data structure where the tracks and the albums both refer to each other. You'd have data declarations like this:

data MusicCollection = MC { tracks :: Set Track, albums :: Set Album }
data Track = Track { album :: Album, trackTitle :: String, tempo  :: Maybe Int }
                 deriving (Eq, Ord)
data Album = Album { albumTitle :: String, tracks :: Set Track }
                 deriving (Eq, Ord)

The question here is, of course, whether you can "tie the knot" when you're constructing this data structure so that it can be finitely represented. If you can pull it off, well, this would be great...

like image 25
Luis Casillas Avatar answered Nov 15 '22 08:11

Luis Casillas