 

Performant structure without data duplication

Tags:

java

Say I have the following classes:

public class Tagged {

    private List<String> tags;
}

public class ContainerOfTagged {

    private List<Tagged> tagged;
}

With this structure, whenever I need to find a Tagged with a specific tag, I have to iterate over every Tagged in ContainerOfTagged and, for each one, over all of its tags. That could hurt performance, depending on the size of the lists.
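For reference, the lookup this structure forces looks roughly like the sketch below. The constructor, `add`, `getTags` and `findByTag` are names assumed for illustration; they are not part of the classes above.

```java
import java.util.ArrayList;
import java.util.List;

class Tagged {
    private final List<String> tags;

    // Hypothetical constructor, added so the sketch is self-contained.
    Tagged(List<String> tags) {
        this.tags = new ArrayList<>(tags);
    }

    List<String> getTags() {
        return tags;
    }
}

class ContainerOfTagged {
    private final List<Tagged> tagged = new ArrayList<>();

    void add(Tagged t) {
        tagged.add(t);
    }

    // Visits every Tagged; contains() then scans that object's tag list.
    List<Tagged> findByTag(String tag) {
        List<Tagged> result = new ArrayList<>();
        for (Tagged t : tagged) {
            if (t.getTags().contains(tag)) {
                result.add(t);
            }
        }
        return result;
    }
}
```

With n Tagged objects holding up to m tags each, every lookup costs on the order of n × m string comparisons.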

A simple solution would be to change the ContainerOfTagged class to use a Map that maps each tag to a list of Tagged:

public class ContainerOfTagged {

    private Map<String, List<Tagged>> tagMapping;
}

Now all I need to do is provide a tag, and the Map will return all Tagged with said tag. However, by doing this I'm causing data duplication, since the same tags exist in both the Tagged and ContainerOfTagged classes.

So, is there a way to solve this problem with a performant solution that doesn't duplicate data?

Flumen asked Oct 29 '22 21:10



1 Answer

You can't really avoid "duplicating" the tags, but remember that you are not truly duplicating them: the Lists and Maps store only references to the tag strings, not copies of their values (although the references themselves are likely to take up quite a lot of space).
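To illustrate the point about references, here is a small sketch. `SharedReferenceDemo` and `sameInstance` are hypothetical names; the check uses `==` to compare object identity rather than content.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SharedReferenceDemo {
    // Returns true when the list element and the map key are the same object.
    static boolean sameInstance() {
        String tag = new String("java"); // a single String object
        List<String> tags = new ArrayList<>();
        Map<String, List<String>> index = new HashMap<>();

        tags.add(tag);                     // the list stores a reference
        index.put(tag, new ArrayList<>()); // the map stores the same reference

        String key = index.keySet().iterator().next();
        return tags.get(0) == key; // identity comparison, not equals()
    }
}
```

Both collections point at the one String, so the characters of the tag are held once; what is repeated is the reference plus the collections' own bookkeeping.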

The problem is that you need two indexes:

  1. You need to find the list of tags, given the Tagged object.
  2. You need to find the Tagged object, given a tag.

Ideally, your solution would look like the code below. You can address your concern about the two structures getting out of sync by having a single method that manages tagging.

Note that in Tagged you should use a Set instead of a List, so that the same tag cannot be added twice.

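import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class Tagged {
    // A Set prevents the same tag from being stored twice.
    final Set<String> tags = new HashSet<>();
}

public class TagContainer {
    // Reverse index: each tag maps to every Tagged that carries it.
    private final Map<String, Set<Tagged>> tagIndex = new HashMap<>();

    // Single entry point for tagging, so both indexes stay in sync.
    public void tag(String tag, Tagged tagged) {
        tagged.tags.add(tag);
        tagIndex.computeIfAbsent(tag, k -> new HashSet<>()).add(tagged);
    }
}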
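Building on the tagging method above, lookup and removal could be sketched as follows. This assumes the index maps each tag to a Set of Tagged so that several objects can share a tag; `untag` and `find` are hypothetical additions, not part of the original answer.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class Tagged {
    final Set<String> tags = new HashSet<>();
}

class TagContainer {
    private final Map<String, Set<Tagged>> tagIndex = new HashMap<>();

    // Updates both sides together so they cannot drift apart.
    void tag(String tag, Tagged tagged) {
        tagged.tags.add(tag);
        tagIndex.computeIfAbsent(tag, k -> new HashSet<>()).add(tagged);
    }

    // Hypothetical inverse of tag(); drops empty index entries.
    void untag(String tag, Tagged tagged) {
        tagged.tags.remove(tag);
        Set<Tagged> holders = tagIndex.get(tag);
        if (holders != null) {
            holders.remove(tagged);
            if (holders.isEmpty()) {
                tagIndex.remove(tag);
            }
        }
    }

    // Read-only view of every Tagged carrying the tag; empty if none.
    Set<Tagged> find(String tag) {
        return Collections.unmodifiableSet(
                tagIndex.getOrDefault(tag, Collections.emptySet()));
    }
}
```

Because `find` returns an unmodifiable view, callers have to go through `tag` and `untag`, which is what keeps the two indexes consistent.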

If memory utilisation is a major concern, you could try some kind of reference compression. With this technique, you store your tags in an array and then refer to them by index. If you had few enough tags, you could use a byte or short instead of a reference, but the code would be a lot messier and I would not recommend it.
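A rough sketch of that idea, assuming short indexes into a shared table of tags. `TagTable`, `CompactTagged` and their methods are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TagTable {
    private final List<String> tagsById = new ArrayList<>();
    private final Map<String, Short> idsByTag = new HashMap<>();

    // Returns the existing id for a tag, or assigns the next free one.
    short idOf(String tag) {
        Short id = idsByTag.get(tag);
        if (id == null) {
            if (tagsById.size() > Short.MAX_VALUE) {
                throw new IllegalStateException("too many distinct tags for a short id");
            }
            id = (short) tagsById.size();
            tagsById.add(tag);
            idsByTag.put(tag, id);
        }
        return id;
    }

    String tagOf(short id) {
        return tagsById.get(id);
    }
}

class CompactTagged {
    // Two bytes per tag instead of one object reference per tag.
    private short[] tagIds = new short[0];

    void tag(short id) {
        if (hasTag(id)) {
            return; // keep set semantics: no duplicates
        }
        short[] grown = Arrays.copyOf(tagIds, tagIds.length + 1);
        grown[tagIds.length] = id;
        tagIds = grown;
    }

    boolean hasTag(short id) {
        for (short existing : tagIds) {
            if (existing == id) {
                return true;
            }
        }
        return false;
    }

    int tagCount() {
        return tagIds.length;
    }
}
```

Even in this small form the extra bookkeeping is visible: every read of a tag has to go back through the table, which is the messiness mentioned above.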

EDIT:

In my first post, I proposed that Tagged should be an interface called Tagable. This is cleaner, but it lengthens the solution, so I reverted to a class. However, you could consider having a Tagable interface and implementing it in the Tagged class.

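import java.util.Set;

public interface Tagable {
    // Returns the tags attached to this object.
    Set<String> getTags();

    // Attaches a tag to this object.
    void tag(String tag);
}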
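As a sketch of how Tagged might implement that interface (one possible shape, not the only one; exposing the tags as a read-only view is an assumption made here):

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

interface Tagable {
    Set<String> getTags();

    void tag(String tag);
}

class Tagged implements Tagable {
    private final Set<String> tags = new HashSet<>();

    @Override
    public Set<String> getTags() {
        // Read-only view, so tags can only be changed through tag().
        return Collections.unmodifiableSet(tags);
    }

    @Override
    public void tag(String tag) {
        tags.add(tag);
    }
}
```

A container could then accept any Tagable and call `tag` on it from its own tagging method, so the object's Set and the container's Map are still updated in one place.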
rghome answered Nov 17 '22 23:11
