Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find direct and indirect subclasses by scanning filesystem

I'm having a problem in writing an algorithm to help me scan a file system and find all subclasses of a certain class.

Details:

I've an app that scans an external application using nio Files.walk() while retrieving I check for "extends SuperClass" while reading the file if the word exits, I add the class name in my list as follows:

List<String> subclasses = new ArrayList<>();
Files.walk(appPath)
     .filter(p->Files.isRegularFile(p) && p.toString()
     .endsWith(".java")).forEach(path -> {
        try {
         List<String> lines = Files.readAllLines(path);
         Pattern pattern = Pattern.compile("\\bextends SuperClass\\b");
         Matcher matcher = pattern
                           .matcher(lines.stream()
                                 .collect(Collectors.joining(" ")));
         boolean isChild = matcher.find();
         if(isChild) subclasses.add(path.getFileName().toString());
        }catch (IOException e){
                //handle IOE
        }

The problem with the above is that it only gets direct subclasses of SuperClass but I need to retrieve all direct and indirect subclasses. I thought about recursion since I've no Idea how many subclasses of SuperClass there is but I couldn't implement any reasonable implementation.

NOTES:

  • Scanning more than 600 thousands file
  • I have no Idea how many direct/indirect subclasses of SuperClass there is
  • The application that I'm scanning is external and I can't modify its code so I'm only allowed to access it by reading files and see where extends exists
  • If there is a non-recursive solution to the problem that would be great but if there's no other way, I'll be more than happy to accept a recursive one since I care about the solution more than performance.

Edit:

I use the following regex to compare both name and import to make sure even in case of same name different packages the output is correct:

Pattern pattern = Pattern.compile("("+superClasss.getPackage()+")[\\s\\S]*(\\bextends "+superClass.getName()+"\\b)[\\s\\S]");

I also tried:

Pattern pattern = Pattern.compile("\\bextends "+superClass.getName()+"\\b");

But there is also some missing subclasses, I believe that the code bellow skips some checks, and doesn't fully work:

public static List<SuperClass> getAllSubClasses(Path path, SuperClass parentClass) throws IOException{
classesToDo.add(baseClass);
while(classesToDo.size() > 0) {
    SuperClass superClass = classesToDo.remove(0);
    List<SuperClass> subclasses = getDirectSubClasses(parentPath,parentClass);
    if(subclasses.size() > 0)
        classes.addAll(subclasses);
    classesToDo.addAll(subclasses);
}
return classes;

}

Any help is truly appreciated!

Edit 2 I also noticed another problem, is that when I detect a subclass I get the file name currentPath.getFileName() which might or might not be the subclass name as the subclass may be a nested or non-public class in the same file.

like image 837
Ahmad Sanie Avatar asked Apr 18 '18 10:04

Ahmad Sanie


Video Answer


1 Answers

I strongly recommend parsing compiled class files instead of source code. Since these class files are already optimized for being processed by machines, a lot of the complexity and corner cases of the source code file processing has been eliminated.

So a solution to build a complete class hierarchy tree using the ASM library would look like this:

public static Map<String, Set<String>> getClassHierarchy(Path root) throws IOException {
    return Files.walk(root)
         .filter(p->Files.isRegularFile(p) && isClass(p.getFileName().toString()))
         .map(p -> getClassAndSuper(p))
         .collect(Collectors.groupingBy(Map.Entry::getValue,
                Collectors.mapping(Map.Entry::getKey, Collectors.toSet())));
}
private static boolean isClass(String fName) {
    // skip package-info and module-info
    return fName.endsWith(".class") && !fName.endsWith("-info.class");
}
private static Map.Entry<String,String> getClassAndSuper(Path p) {
    final class CV extends ClassVisitor {
        Map.Entry<String,String> result;
        public CV() {
            super(Opcodes.ASM5);
        }
        @Override
        public void visit(int version, int access,
                String name, String signature, String superName, String[] interfaces) {
            result = new AbstractMap.SimpleImmutableEntry<>(
                Type.getObjectType(name).getClassName(),
                superName!=null? Type.getObjectType(superName).getClassName(): "");
        }
    }
    try {
        final CV visitor = new CV();
        new ClassReader(Files.readAllBytes(p)).accept(visitor, ClassReader.SKIP_CODE);
        return visitor.result;
    } catch (IOException ex) {
        throw new UncheckedIOException(ex);
    }
}

As a bonus, resp. to create some test cases, the following method adds the ability to build the hierarchy for a runtime class’ source:

public static Map<String, Set<String>> getClassHierarchy(Class<?> context)
                                        throws IOException, URISyntaxException {
    Path p;
    URI clURI = context.getResource(context.getSimpleName()+".class").toURI();
    if(clURI.getScheme().equals("jrt")) p = Paths.get(URI.create("jrt:/modules"));
    else {
        if(!clURI.getScheme().equals("file")) try {
            FileSystems.getFileSystem(clURI);
        } catch(FileSystemNotFoundException ex) {
            FileSystems.newFileSystem(clURI, Collections.emptyMap());
        }
        String qn = context.getName();
        p = Paths.get(clURI).getParent();
        for(int ix = qn.indexOf('.'); ix>0; ix = qn.indexOf('.', ix+1)) p = p.getParent();
    }
    return getClassHierarchy(p);
}

Then, you can do

Map<String, Set<String>> hierarchy = getClassHierarchy(Number.class);
System.out.println("Direct subclasses of "+Number.class);
hierarchy.getOrDefault("java.lang.Number", Collections.emptySet())
         .forEach(System.out::println);

and get

Direct subclasses of class java.lang.Number
java.lang.Float
java.math.BigDecimal
java.util.concurrent.atomic.AtomicLong
java.lang.Double
java.lang.Long
java.util.concurrent.atomic.AtomicInteger
java.lang.Short
java.math.BigInteger
java.lang.Byte
java.util.concurrent.atomic.Striped64
java.lang.Integer

or

Map<String, Set<String>> hierarchy = getClassHierarchy(Number.class);
System.out.println("All subclasses of "+Number.class);
printAllClasses(hierarchy, "java.lang.Number", "  ");
private static void printAllClasses(
        Map<String, Set<String>> hierarchy, String parent, String i) {
    hierarchy.getOrDefault(parent, Collections.emptySet())
        .forEach(x -> {
            System.out.println(i+x);
            printAllClasses(hierarchy, x, i+"  ");
    });
}

to get

All subclasses of class java.lang.Number
  java.lang.Float
  java.math.BigDecimal
  java.util.concurrent.atomic.AtomicLong
  java.lang.Double
  java.lang.Long
  java.util.concurrent.atomic.AtomicInteger
  java.lang.Short
  java.math.BigInteger
  java.lang.Byte
  java.util.concurrent.atomic.Striped64
    java.util.concurrent.atomic.LongAdder
    java.util.concurrent.atomic.LongAccumulator
    java.util.concurrent.atomic.DoubleAdder
    java.util.concurrent.atomic.DoubleAccumulator
  java.lang.Integer
like image 86
Holger Avatar answered Sep 29 '22 12:09

Holger