So I need to have a some sort of multiset of characters, where adding a duplicate character increases the cardinality by 1, and the multiplicity of characters should not drastically increase the memory that the object takes up.
This will be implemented with some sort of map where characters are keys, that hold a value representing the number of that character is represented in the set.
However, I'm struggling to figure out which collection would be best for this (I was looking at hashmap) and how to declare this data type. I was doing something like this
Map m = new HashMap(char, int);
But the above is an incorrect declaration, and I'm not sure how to exactly approach this.
Declare a Hashmap in Java of {char, int}. Traverse in the string, check if the Hashmap already contains the traversed character or not. If it is present, then increase its count using get() and put() function in Hashmap. Once the traversal is completed, traverse in the Hashmap and print the character and its frequency.
The HashMap gives you an unsorted, unordered Map. When you need a Map and you don't care about the order (when you iterate through it), then HashMap is the right choice. Keys of HashMap is like Set means no duplicates allowed and unordered while values can be any object even null or duplicate is also allowed.
unordered_map will do your purpose as well as hashmap in java will do . unordered map will be faster on operations like insertion and deletion than map in c++ but map possesses some extra useful characteristics such as for the elements are in sorted order , we can traverse from start to finish .
HashMap<K, V> is a part of Java's collection since Java 1.2. This class is found in java. util package. It provides the basic implementation of the Map interface of Java. It stores the data in (Key, Value) pairs, and you can access them by an index of another type (e.g. an Integer).
Try this declaration:
Map<Character, Integer> m = new HashMap<Character, Integer>();
You can then add characters as such:
char c = //...;
if (m.containsKey(c))
m.put(c, m.get(c) + 1);
else
m.put(c, 1);
I would implement it using int[]
(for ASCII) or int[][]
for unicode. Given the memory footprint of storing boxed integers in a hash map with all the hashing conflicts introduced by using numbers, and characters are nothing but numbers, as keys.
public class CharacterBag {
private int[][] data = new int[255][];
public void add(char ch) {
int[] bin = data[ch >> 8];
if (bin == null) bin = data[ch >> 8] = new int[255];
bin[ch & 0xFF]++;
}
public int frequency(char ch) {
int[] bin = data[ch >> 8];
if (bin == null) return 0;
return bin[ch & 0xFF];
}
}
This starts with a memory footprint of zero, and adds 2K for each unicode page. Typically text uses characters from one or two unicode pages.
In comparison using HashMap will have the whole overhead of storing boxed primitives in a list of linked lists. For each character there will an object of the Entry
class with two pointers pointing to a boxed key and a linked list object with all its fields and which holds an object of class Node
with a forward and a backward pointer and which has a pointer to the boxed integer … shall I go on? ;)
Map<Character, Integer> charCount = new HashMap<Character, Integer>();
public void addCharacter(char c) {
Integer value = charCount.get(c);
if (value == null) {
charCount.put(c, 1);
} else {
charCount.put(c, value + 1);
}
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With