I am currently searching for a elegant and/but efficient solution to a problem that i guess is quite common. Consider the following situation:
I defined a fileformat based on a BTree that is defined (in a simplified way) like this:
data FileTree = FileNode [Key] [FileOffset]
| FileLeaf [Key] [Data]
Reading and writing this from a file to a lazy data structure is implemented and works just fine. This will result in a instance of:
data MemTree = MemNode [Key] [MemTree]
| MemLeaf [Key] [Data]
Now my goal is to have a generic function updateFile :: FilePath -> (MemTree -> MemTree) -> IO ()
that will read in the FileTree
and convert it into a MemTree, apply the MemTree -> MemTree
function and write back the changes to the tree structure. The problem is that the FileOffsets have to be conserved somehow.
I have two approaches to this problem. Both of them lack in elegance and/or efficiency:
This approach extends the MemTree to contain the offsets:
data MemTree = MemNode [Key] [(MemTree, Maybe FileOffset)]
| MemNode [Key] [Data]
The read function would then read in the FileTree
and stores the FileOffset
alongside the MemTree
reference. Writing will checks if a reference already has an associated offset and if it does it just uses it.
Pros: easy to implement, no overhead to find the offset
Cons: exposes internal to the user who is responsible to set the offset to Nothing
Another way to attack this problem is to read in the FileTree
and create a StableName.Map
that holds onto the FileOffsets
. That way (and if i understand the semantics of StableName
correctly) it should be possible to take the final MemTree
and lookup the StableName
of each node in the the StableName.Map
. If there is an entry the node is clean and doesn't have to be written again.
Pros: doesn't expose the internals to the user
Cons: involves overhead for lookups in the map
These are the two approaches i can think of. The first one should be more efficient, the second one is more pleasant to the eye. I'd like your comments on my ideas, maybe someone even has a better approach in mind?
There are two reasons i am searching for a solution like this:
On the one hand you should try to handle errors before they arise by using the type system. The aforementioned user is of course the designer of the next layer in the system (ie me). By working on the pure tree representation some kinds of bugs won't be able to happen. All changes to the tree in the file should be in one place. That should make reasoning easier.
On the other hand i could just implement something like insert :: FilePath -> Key -> Value -> IO ()
and be done with it. But then i'll lose a very nice trait that comes free when i keep a (kind of a) log by updating the tree in place. Transactions (ie merging of several inserts) are just a matter of working on the same tree in memory and writing just the differences back to the file.
Basically, data structures are divided into two categories: Linear data structure. Non-linear data structure.
Complex types are nested data structures composed of primitive data types. These data structures can also be composed of other complex types. Some examples of complex types include struct(row), array/list, map and union. Complex types are supported by most programming languages including Python, C++ and Java.
Data Structure can be defined as the group of data elements which provides an efficient way of storing and organising data in the computer so that it can be used efficiently. Some examples of Data Structures are arrays, Linked List, Stack, Queue, etc.
I think that the package Data.Generic.Diff may do exactly what you wanted. It references somebody's thesis for the idea of how it works.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With