I'm struggling to find the best way to merge arrays (or create new ones) by looking at their shared value.
List<String[]> dictionary = new ArrayList<String[]>();
this is my "dictionary" filled with arrays of 2 words, for example it contains arrays:
["A","B"]
["B","C"]
["D","E"]
["F","C"]
["G","H"]
["T","D"]
I need to merge them by values they share, so for example the finished "dictionary" (or completely new list) would look like this:
["A","B","C","F"];
["D","E","T"];
["G","H"];
Also, the old arrays don't have to be removed they can stay in "dictionary" but I need the merged ones and I have hard time figuring it out.
Arrays don't have to be sorted at anyhow.
This is what i have so far and it is not working
public static void SynonymsMerge(List<String[]> dictionary){
ArrayList<ArrayList<String>> newDictionary = new ArrayList<ArrayList<String>>();
for(int i=0;i < dictionary.size(); i++){
ArrayList<String> synonyms = new ArrayList<String>();
for(int j=0; j < dictionary.get(i).length; j++){
synonyms.add(dictionary.get(i)[j]);
}
newDictionary.add(synonyms);
}
for(int i=0;i< newDictionary.size();i++){
for(int j=0; j < newDictionary.size();j++){
for (int k=0; k < newDictionary.get(j).size() ;k++) {
if (newDictionary.get(i).equals(newDictionary.get(j)))
continue;
if (newDictionary.get(i).contains(newDictionary.get(j).get(k)))
newDictionary.get(i).addAll(newDictionary.get(j));
First of all, here is the code. I changed the input type from List<String[]> to List<List<String>> as it does not really make sense to mix up both Lists and Arrays. This also applies to the output type.
public static List<List<String>> merge(List<List<String>> dictionary) {
List<List<String>> newDictionary = new ArrayList<>();
for (List<String> stringPair : dictionary) {
List<Integer> matchIndices = new ArrayList<>();
for (int i = 0; i < newDictionary.size(); i++) {
List<String> newStrings = newDictionary.get(i);
for (String str : stringPair) {
if (newStrings.contains(str)) {
matchIndices.add(i);
}
}
}
if (matchIndices.size() == 0) {
newDictionary.addAll(new ArrayList<List<String>>(Collections.singleton(new ArrayList<>(stringPair))));
continue;
}
matchIndices.sort(Integer::compareTo);
if (matchIndices.size() == 1) {
newDictionary.get(matchIndices.get(0)).addAll(new ArrayList<>(stringPair));
} else {
int last = matchIndices.remove(0);
while (matchIndices.size() > 0) {
int i = matchIndices.get(0);
newDictionary.get(last).addAll(newDictionary.get(i));
newDictionary.remove(i);
matchIndices.remove(0);
matchIndices = new ArrayList<>(matchIndices.stream().map(a -> a - 1).toList());
}
}
}
newDictionary = newDictionary.stream()
.map(strings -> strings.stream().distinct().toList())
.toList();
return newDictionary;
}
dictionary the input of type List<List<String>> (inner List has max size of 2, even though the function would work with even more strings in theory)newDictionary the output of the function of type List<List<String>>The following code is executed for every input pair/List of strings in directory
newDictionary in which the strings from the par are already present. This List of indices is called matchIndicesstringPair=["A","E"] newDictionary:[["I", "A", "O"], ["P", "D"]] would result in matchIndices=[0] because only "A" is present one time in the first element of newDictionarymatchIndices.size() is 0, create a new group in newDictionary with the string pair. Back to 1.matchIndices.size() is 1, append the strings from the pair to the specific newDictionary group with the index specified in matchIndices. Back to 1.matchIndices.size() is greater than 1, that means that multiple groups from newDictionary with the indices specified in matchIndices will have to be merged together in the for-loop. Back to 1.In the end we have to make sure there are no duplicates in the Lists in newDictionary.
public static void main(String[] args) {
List<List<String>> dictionary = new ArrayList<>(List.of(
List.of("A", "B"),
List.of("B", "C"),
List.of("D", "E"),
List.of("F", "C"),
List.of("G", "H"),
List.of("T", "D")));
System.out.println(merge(dictionary));
}
In your specific example we don't have to merge multiple groups.
But with input data like this
List<List<String>> dictionary = new ArrayList<>(List.of(
List.of("A", "B"),
List.of("B", "C"),
List.of("D", "E"),
List.of("F", "E"),
List.of("E", "A")));
we eventually come to the point where newDictionary=[[A, B, B, C], [D, E, F, E]] and we have to try to insert [E, A]. Here both groups from newDictionary will have to be merged together.
This then results in the output of [[A, B, C, D, E, F]], where both groups are merged and removed duplicates.
I am not really happy with this solution as it is not really clear what is actually going on, but I'm still posting this, because you said you would be happy with any solution. :)
For this problem you need to connect A with B (and vice versa), then C with B and C with F and so on. A lot of intersections and a lot of cycles.
And that is the perfect case to utilize a Graph as data structure.
To be precise, this dataset could be represented as a cyclic undirected disjointed graph, that contains several connected components. And in terms of graph theory, this task boils down to discovering all components in the graph.
For that, we need to take the following steps:
Implementation:
public class Graph {
private Map<String, Vertex> vertexByName = new HashMap<>();
private Graph() {} // no way and no need to invoke this constractor outside the class
public static Graph getInstance(List<List<String>> dictionary) { // method responsible for instantiation and initialization of the graph
Graph graph = new Graph();
graph.init(dictionary);
return graph;
}
private void init(List<List<String>> dictionary) {
for (List<String> list: dictionary) {
for (String name: list) {
addVertex(name, list);
}
}
}
private void addVertex(String name, List<String> neighbours) {
Vertex cur = vertexByName.computeIfAbsent(name, Vertex::new);
for (String neighbourName: neighbours) {
if (neighbourName.equals(name)) continue;
Vertex neighbour = vertexByName.computeIfAbsent(neighbourName, Vertex::new);
cur.addNeighbour(neighbour);
neighbour.addNeighbour(cur); // this graph is undirectional, i.e. both vertices in each connected pair should hold a reference to one another
}
}
public List<List<String>> getComponents() {
List<List<String>> components = new ArrayList<>();
Set<Vertex> seen = new HashSet<>();
for (Vertex vertex: vertexByName.values()) {
if (seen.contains(vertex)) continue;
components.add(getComponentNames(vertex, seen));
}
return components;
}
// Depth first search implementation
private List<String> getComponentNames(Vertex vertex, Set<Vertex> seen) {
Deque<Vertex> stack = new ArrayDeque<>();
List<String> names = new ArrayList<>();
stack.push(vertex);
seen.add(vertex);
while(!stack.isEmpty()) {
Vertex current = stack.pop();
names.add(current.getName());
for (Vertex neighbour: current.getNeighbours()) {
if (seen.contains(neighbour)) continue;
seen.add(neighbour);
stack.push(neighbour);
}
}
return names;
}
private class Vertex {
private String name;
private Set<Vertex> neighbours = new HashSet<>();
public Vertex(String name) {
this.name = name;
}
public Vertex(String name) {
this.name = name;
}
public boolean addNeighbour(Vertex neighbour) {
return neighbours.add(neighbour);
}
public String getName() {
return name;
}
public Set<Vertex> getNeighbours() {
return neighbours;
}
}
}
main() - demo
public static void main(String[] args) {
List<List<String>> dictionary =
List.of(List.of("A","B"), List.of("B","C"),
List.of("D","E"), List.of("F","C"),
List.of("G","H"), List.of("T","D"));
Graph graph = Graph.getInstance(dictionary);
List<List<String>> componentNames = graph.getComponents();
System.out.println(componentNames);
}
Output
[[A, B, C, F], [T, D, E], [G, H]]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With