Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

StackOverflowError while running Depth First Search (undirected graph)

Tags:

java

graph

I have a DFS visit recursive method that sometimes throws a StackOverflowError. Since the size of the graph is large (around 20000 vertices), recursive calls are many, and so I tried to run with -Xss10M and everything works. I'd just like to understand why adding at the beginning of the method a System.out.println, even without -Xss10M, the method doesn't throw any StackOverflowError. How is it possible?

This is the DFS visit method:

private int dfsVisit(Vertex<T> v, int time){
  // System.out.println("Hello");
  Vertex<T> n;
  time++;
  v.d = time;
  v.color = Vertex.Color.GRAY;
  for (Map.Entry<Vertex<T>, Float> a : v.neighbours.entrySet()){
     n = a.getKey();
     if(n.color == Vertex.Color.WHITE){
        n.previous = v;
        time = dfsVisit(n, time);
     }
  }
  v.color = Vertex.Color.BLACK;
  time++;
  v.f = time;

  return time;
}

This is the complete code

import java.io.*;
import java.util.*;

class Graph<T> {
   private final Map<T, Vertex<T>> graph; 

   public static class Edge<T>{
      public final T v1, v2;
      public final float dist;

      public Edge(T v1, T v2, float dist) {
         this.v1 = v1;
         this.v2 = v2;
         this.dist = dist;
      }
   }

   public static class Vertex<T> implements Comparable<Vertex>{ // SPOSTARE VAR IST NEL COSTRUTTORE
      public enum Color {WHITE, GRAY, BLACK, UNKNOWN};
      public final T name;
      public float dist;
      public Vertex<T> previous;
      public final Map<Vertex<T>, Float> neighbours;
      public Color color;
      public int d, f;

      public Vertex(T name) {
         this.name = name;
         dist = Float.MAX_VALUE; 
         previous = null;
         neighbours = new HashMap<Vertex<T>, Float>(); // adjacency list
         color = Color.UNKNOWN;
         d = 0;
         f = 0;
      }

      private void printPath() {
         if (this == this.previous) { 
            System.out.print(this.name);
         } else if (this.previous == null) {
            System.out.print(this.name + " unreached");
         } else {
            this.previous.printPath();
            System.out.print(" -> " + this.name + "(" + this.dist + ")");
         }
      }

      public int compareTo(Vertex other){
         if(this.dist == other.dist)
            return 0;
         else if(this.dist > other.dist)
            return 1;
         else
            return -1;
      }

   }

   // Builds a graph from an array of edges 
   public Graph(ArrayList<Graph.Edge> edges) {
      graph = new HashMap<>(edges.size());

      // add vertices
      for (Edge<T> e : edges) {
         if (!graph.containsKey(e.v1)) graph.put(e.v1, new Vertex<>(e.v1));
         if (!graph.containsKey(e.v2)) graph.put(e.v2, new Vertex<>(e.v2));
      }

      // create adjacency list
      for (Edge<T> e : edges) {
         graph.get(e.v1).neighbours.put(graph.get(e.v2), e.dist);
         graph.get(e.v2).neighbours.put(graph.get(e.v1), e.dist); 
      }
   }


   public void dijkstra(T startName) {
      if (!graph.containsKey(startName)) {
         System.err.println("Graph doesn't contain start vertex " + startName);
         return;
      }
      final Vertex<T> source = graph.get(startName);
      NavigableSet<Vertex<T>> q = new TreeSet<>(); // priority queue

      // set-up vertices
      for (Vertex<T> v : graph.values()) {
         v.previous = v == source ? source : null;
         v.dist = v == source ? 0 : Float.MAX_VALUE;
         q.add(v);
      }

      dijkstra(q);
   }

   private void dijkstra(final NavigableSet<Vertex<T>> q) {      
      Vertex<T> u, v;
      while (!q.isEmpty()) {

         u = q.pollFirst();
         if (u.dist == Float.MAX_VALUE) break; //???????????


         for (Map.Entry<Vertex<T>, Float> a : u.neighbours.entrySet()) {
            v = a.getKey(); 

            final float alternateDist = u.dist + a.getValue();
            if (alternateDist < v.dist) { 
               q.remove(v);
               v.dist = alternateDist;
               v.previous = u;
               q.add(v);
            } 
         }
      }
   }


   public void printPath(T endName) {
      if (!graph.containsKey(endName)) {
         System.err.println("Graph doesn't contain end vertex " + "\"" + endName + "\"" );
         return;
      }

      graph.get(endName).printPath();
      System.out.println();
   }

   public void printAllPaths() {
      for (Vertex<T> v : graph.values()) {
         v.printPath();
         System.out.println();
      }
   }

   public Vertex<T> getVertex(T key){
      if(graph.containsKey(key))
         return graph.get(key);
      return null;
   }

   public void printAdjacencyList(){
      System.out.println("Adjacency list:");
      for(Vertex<T> v : graph.values()){
         System.out.print(v.name + ":\t");
         for (Map.Entry<Vertex<T>, Float> a : v.neighbours.entrySet()){
             System.out.print(a.getKey().name + "(" + a.getValue() + ") | ");
         }
         System.out.println();
      }

   }
   /*
   P.S. I know that if only used to calculate the connected components of the graph, dfs visit 
   could be written differently but I preferred to write it in a more general way, so that it 
   can be reused if necessary.
   */
   private int dfsVisit(Vertex<T> v, int time){
     // System.out.println("ciao");
      Vertex<T> n;
      time++;
      v.d = time;
      v.color = Vertex.Color.GRAY;
      for (Map.Entry<Vertex<T>, Float> a : v.neighbours.entrySet()){
         n = a.getKey();
         if(n.color == Vertex.Color.WHITE){
            n.previous = v;
            time = dfsVisit(n, time);
         }
      }
      v.color = Vertex.Color.BLACK;
      time++;
      v.f = time;

      return time;
   }

   /*
   Print the size of the connected components of the graph
   */
   public void connectedComponents(){
      for(Vertex<T> v : graph.values()){
         v.color = Vertex.Color.WHITE;
         v.previous = null;
      }

      for(Vertex<T> v : graph.values()){
         if(v.color == Vertex.Color.WHITE)
             System.out.println(dfsVisit(v, 0)/2);
      }

   }



}

here's the test class

import java.io.*;
import java.util.*;

public class Dijkstra {

   private static ArrayList<Graph.Edge> a = new ArrayList<Graph.Edge>();
   private static final String START = "torino";
   private static final String END = "catania";

   public static void main(String[] args) {
      String fileName = "italian_dist_graph.txt";
      try{
         Scanner inputStream = new Scanner(new File(fileName));
         String record;
         while(inputStream.hasNextLine()){
            record = inputStream.nextLine();
            String[] array = record.split(",");
            String from = array[0];
            String to = array[1];
            float dist = Float.parseFloat(array[2]);
            a.add(new Graph.Edge(from, to, dist));
         }
         inputStream.close();
      } catch(FileNotFoundException e){
         System.out.println("Impossibile trovare il file "+fileName);
      }



      Graph<String> g = new Graph<String>(a);

      g.dijkstra(START);
      g.printPath(END);
     //System.out.printf("%f\n", g.getVertex(END).dist/1000.0f);
      g.connectedComponents();
   }
}

N.B. try to comment g.dijkstra(START) and g.printPath(END); everything seems to work.

Here's the link to the data set

https://drive.google.com/open?id=0B7XZY8cd0L_fZVl1aERlRmhQN0k
like image 942
Cilla Avatar asked May 27 '16 18:05

Cilla


1 Answers

Some general recommendations:
Your code mixes up attributes of vertices, that are related to a single run of dfs and such that are direct attributes of the vertices. Bad bad bad style. This is quite likely to break any more complex algorithm, can produce unexpected behavior and would require clearing the states after each run, to ensure stability of the code. Instead keep states that are related to a single run of a algorithm only visible to that function. E.g. store the states inside a Map, use the decorator-pattern to create a datastructure that provides additional attributes and that has method-local scope, etc.. As an example: running your code twice on the same graph (same Object) with the same input without clearing all states will lead to a wrong result (1).

In addition: creating an iterative version of DFS isn't exactly hard, so you should give it a try, especially since your graph appears to be pretty large.

As for why your code works (or doesn't) the way it does:
This is hard to tell, since it depends upon quite a lot of factors. You didn't provide full code, so I can't rerun any tests, or verify that everything behaves the way it should. The most likely answers:

Vertex uses the default hash-code provided by Object. This leads to random ordering of the entries in the map of neighbours, thus the order in which specific paths are traversed is random in each run and most likely different. Thus you're traversing the graph using random paths, that quite likely (especially due to the size of your graph) differ for each run. The reason isn't the System.out.println, but the mere fact, that your code generates a different structure (from a ordering-POV, not mathematical), each time it runs plus the coincident, that for some pretty weird reason each build of the graph, that doesn't reach the necessary recursion-depth for a StackOverflow, and the code compiled with System.out.println appeared together.

The Java compiler, or JIT modifies the behavior of the code in a weird way. Modern compilers have the tendency to produce quite weird code in their attempts to optimize everything they can get hold off.

like image 73
Paul Avatar answered Oct 18 '22 21:10

Paul