I have a collection of sets that I'd like to place in a trie.
Normal tries are made of strings of elements - that is, the order of the elements is important. Sets lack a defined order, so there's the possibility of greater compression.
For example, given the strings "abc", "bc", and "c", I'd create the trie:
(*,3) -> ('a',1) -> ('b',1) -> ('c',1)
      -> ('b',1) -> ('c',1)
      -> ('c',1)
But given the sets { 'a', 'b', 'c' }, { 'b', 'c' }, and { 'c' }, I could create the above trie, or any of these eleven:
(*,3) -> ('a',1) -> ('b',1) -> ('c',1)
      -> ('c',2) -> ('b',1)

(*,3) -> ('a',1) -> ('c',1) -> ('b',1)
      -> ('b',1) -> ('c',1)
      -> ('c',1)

(*,3) -> ('a',1) -> ('c',1) -> ('b',1)
      -> ('c',2) -> ('b',1)

(*,3) -> ('b',2) -> ('a',1) -> ('c',1)
                 -> ('c',1)
      -> ('c',1)

(*,3) -> ('b',1) -> ('a',1) -> ('c',1)
      -> ('c',2) -> ('b',1)

(*,3) -> ('b',2) -> ('c',2) -> ('a',1)
      -> ('c',1)

(*,3) -> ('b',1) -> ('c',1) -> ('a',1)
      -> ('c',2) -> ('b',1)

(*,3) -> ('c',2) -> ('a',1) -> ('b',1)
      -> ('b',1) -> ('c',1)

(*,3) -> ('c',3) -> ('a',1) -> ('b',1)
                 -> ('b',1)

(*,3) -> ('c',2) -> ('b',1) -> ('a',1)
      -> ('b',1) -> ('c',1)

(*,3) -> ('c',3) -> ('b',2) -> ('a',1)
So there's obviously room for compression (7 nodes to 4).
I suspect defining a local order at each node dependent on the relative frequency of its children would do it, but I'm not certain, and it might be overly expensive.
So before I hit the whiteboard, and start cracking away at my own compression algorithm, is there an existing one? How expensive is it? Is it a bulk process, or can it be done per-insert/delete?
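For reference, here's a minimal sketch of the counting trie I'm describing (Python; the Node/insert/size names are just placeholders I made up), showing how the total node count depends on the order chosen for each set's elements:

    class Node:
        """One trie node: children keyed by symbol, count = number of sets passing through."""
        def __init__(self):
            self.children = {}
            self.count = 0

    def insert(root, symbols):
        """Insert one set, using the element order given by `symbols`."""
        node = root
        node.count += 1
        for s in symbols:
            node = node.children.setdefault(s, Node())
            node.count += 1

    def size(node):
        """Total node count, root included."""
        return 1 + sum(size(child) for child in node.children.values())

    # String order, as in the first trie above: 7 nodes.
    r1 = Node()
    for seq in ["abc", "bc", "c"]:
        insert(r1, seq)
    print(size(r1))  # 7

    # Ordering each set with the most common element first: 4 nodes.
    r2 = Node()
    for seq in ["cba", "cb", "c"]:
        insert(r2, seq)
    print(size(r2))  # 4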
I think you should sort each set by item frequency; that should give a good heuristic, as you suspect. The same approach is used in FP-growth (frequent pattern mining), whose FP-tree represents item sets compactly.
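As a rough sketch of that ordering step (Python; the frequency_order name is mine, not from any library), mirroring how FP-growth orders items before building its FP-tree:

    from collections import Counter

    def frequency_order(sets):
        """Order each set by descending global item frequency (ties broken by symbol)."""
        freq = Counter(item for s in sets for item in s)
        return [sorted(s, key=lambda item: (-freq[item], item)) for s in sets]

    sets = [{'a', 'b', 'c'}, {'b', 'c'}, {'c'}]
    print(frequency_order(sets))
    # [['c', 'b', 'a'], ['c', 'b'], ['c']] -- inserting these into the trie yields the 4-node version above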