Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I get to the bottom of an AST Expression

I'm new to AST (my first time writing a plugin). Expressions can be quite complex in real life. I wanted to know how to resolve the left and right side of an asignment, for instance.

class Visitor extends ASTVisitor
{
    @Override
    public boolean visit(Assignment node)
    {
        //here, how do I get the final name to each each side of the assignment resolves?
    }
}

I have also another doubt, how do I get the instance used to invoke a method?

public boolean visit(MethodInvocation node)
{
    //how do I get to know the object used to invoke this method?
    //like, for example, MyClass is a class, and it has a field called myField
    //the type of myField has a method called myMethod.
    //how do I find myField? or for that matter some myLocalVariable used in the same way.
}

suppose the following assignment

SomeType.someStaticMethod(params).someInstanceMethod(moreParams).someField =
     [another expression with arbitrary complexity]

how do I get to someField from the Assigment node?

And also, what property of a MethodInvocation gives me the instance used to invoke the method?

EDIT 1: My question was apparently unclear, given the answer I received. I don't want to solve this particular expression. I want to be able to, given any assignment, find out the name it's being assigned to, and the name (if not an rvalue) that assigns to the first.

So, for instance, the parameters of the method invocations could be field accesses or previously declared local variables.

SomeType.someStaticMethod(instance.field).someInstanceMethod(type.staticField, localVariable, localField).Field.destinationField

So, here is the hopefully objective question: Given any assignment statement with both the left side and right side having arbitrary complexity, how to get the ultimate field/variable that gets assigned to, and the ultimate (if any) field/variable that assigns to it.

EDIT 2: To be more specific, what I want to achieve is immutability, via an annotation @Const:

/**
* When Applied to a method, ensures the method doesn't change in any
* way the state of the object used to invoke it, i.e., all the fields
* of the object must remain the same, and no field may be returned,
* unless the field itself is marked as {@code @Const} or the field is
* a primitive non-array type. A method  annotated with {@code @Const} 
* can only invoke other {@code @Const} methods of its class, can only 
* use the class's fields to invoke {@code @Const} methods of the fields 
* classes and can only pass fields as parameters to methods that 
* annotate that formal parameter as {@code @Const}.
*
* When applied to a formal parameter, ensures the method will not
* modify the value referenced by the formal parameter. A formal   
* parameter annotated as {@code @Const} will not be aliased inside the
* body of the method. The method is not allowed to invoke another 
* method and pass the annotated parameter, save if the other method 
* also annotates the formal parameter as {@code @Const}. The method is 
* not allowed to use the parameter to invoke any of its type's methods,
* unless the method being invoked is also annotated as {@code @Const}
* 
* When applied to a field, ensures the field cannot be aliased and that
* no code can alter the state of that field, either from inside the   
* class that owns the field or from outside it. Any constructor in any
* derived class is allowed to set the value of the field and invoke any
* methods using it. As for methods, only those annotated as
* {@code @Const} may be invoked using the field. The field may only be
* passed as a parameter to a method if the method annotates the 
* corresponding formal parameter as {@code @Const}
* 
* When applied to a local variable, ensures neither the block where the
* variable is declared or any nested block will alter the value of that 
* local variable. The local variable may be defined only once, at any
* point where it is in scope and cannot be aliased. Only methods
* annotated as {@code @Const} may be invoked using this variable, and 
* the variable  may only be passed as a parameter to another method if 
* said method annotates its corresponding formal parameter as
* {@code @Const}
*
*/
@Retention(RetentionPolicy.SOURCE)
@Target({ElementType.METHOD, ElementType.PARAMETER, ElementType.FIELD,
ElementType.LOCAL_VARIABLE})
@Inherited
public @interface Const
{

}

To achieve this, First thing I have to do is cases when the left side of an assignment is marked as @Const (easy enough). I must also detect when the right side of and expression is a field marked as @Const, in which case it can only be assigned at the definition of a @Const variable of the same type.

The problem is I'm having a real hard time finding the ultimate field on the right side of the expression, to avoid aliasing the field and rendering the @Const annotation useless.

like image 335
FinnTheHuman Avatar asked Apr 24 '16 16:04

FinnTheHuman


2 Answers

First quoting from an answer I posted:

You will have to work with bindings. To have bindings available, that means resolveBinding() not returning null, possibly additional steps I have posted are necessary.

The following visitor should help you doing what you want:

class AssignmentVisitor extends ASTVisitor {

    public boolean visit(Assignment node) {
        ensureConstAnnotationNotViolated(node);
        return super.visit(node);
    }

    private void ensureConstAnnotationNotViolated(Assignment node) {
        Expression leftHandSide = node.getLeftHandSide();
        if (leftHandSide.getNodeType() == ASTNode.FIELD_ACCESS) {
            FieldAccess fieldAccess = (FieldAccess) leftHandSide;
            // access field IVariableBinding
            fieldAccess.resolveFieldBinding();
            // access IAnnotationBindings e.g. your @const
            fieldAccess.resolveFieldBinding().getAnnotations();
            // access field ITypeBinding
            fieldAccess.getExpression().resolveTypeBinding();
        } else {
            // TODO: check possible other cases
        }

    }
}
like image 188
sevenforce Avatar answered Sep 23 '22 19:09

sevenforce


Visitors are a really great tool, but the proper solution to a specific problem is not always to have a single visitor patiently wait until its visit methods get invoked... The question you are asking is an example of such a situation.

Let's rephrase what you are trying to do:

  1. You want to identify to identify every assignment (that is leftSide = rightSide)

  2. For each assignment, you want to determine the nature of the left hand side (that is, either it is a local variable or a field access), and if it is indeed a field access, you want to build a "path" corresponding to that field (that is the source object, followed by a series of either method call or field access, and ending with a field access).

  3. For each assignment, you want to determine a similar "path" corresponding to the right hand side.

I think that you have already resolved point number 1: you simply create a class that extends org.eclipse.jdt.core.dom.ASTVisitor; there, you override the #visit(Assignment) method. Finally, wherever it is appropriate, you instantiate your visitor class, and have it visit an AST tree, starting at which ever node that is match for your needs (most likely an instance of CompilationUnit, TypeDeclaration or MethodDeclaration).

Then what? The #visit(Assignment) method indeed receives an Assignment node. Directly on that object, you can obtain both the left hand side and right hand side expressions (assignment.getLeftHandSide() and assignment.getRightHandSide()). As you mentioned, both are Expressions, which can turn out being pretty complicated, so how can we extract a clean, linear "path" out of these subtrees? A visitor is certainly the best way to do so, but here's the catch, it should be done using distinct visitors, rather than having your first visitor (the one catching Assignments) continue its descend either side expressions. It is technically possible to do it all using a single visitor, but that would involve significant state management inside that visitor. I'm pretty pretty much convinced anyway that the complexity of such management would be so high that such implementation would actually be less efficient that the distinct visitors approach.

So we could image something like this:

class MyAssignmentListVisitor extends ASTVisitor {
    @Override
    public boolean visit(Assignment assignment) {
        FieldAccessLineralizationVisitor leftHandSideVisitor = new FieldAccessLineralizationVisitor();
        assignment.getLeftHandSide().accept(leftHandSideVisitor);
        LinearFieldAccess leftHandSidePath = leftHandSideVisitor.asLinearFieldAccess();

        FieldAccessLineralizationVisitor rightHandSideVisitor = new FieldAccessLineralizationVisitor();
        assignment.getRightHandSide().accept(rightHandSideVisitor);
        LinearFieldAccess rightHandSidePath = rightHandSideVisitor.asLinearFieldAccess();

        processAssigment(leftHandSidePath, rightHandSidePath);

        return true;
    }
}

class FieldAccessLineralizationVisitor extends ASTVisitor {

    List<?> significantFieldAccessParts = [...];

    // ... various visit method expecting concrete subtypes of Expression ...

    @Override
    public boolean visit(Assignment assignment) {
        // Found an assignment inside an assignment; ignore its
        // left hand side, as it does not affect the "path" for 
        // the assignment currently being investigated

        assignment.getRightHandSide().accept(this);

        return false;
    }
}

Note in this code that MyAssignmentListVisitor.visit(Assignment) returns true, to indicate that children of the assignment should recursively be inspected. This may sounds unnecessary at first, the Java language indeed support several constructions where an assignment may contain other assignments; consider for example the following extreme case:

(varA = someObject).someField = varB = (varC = new SomeClass(varD = "string").someField);

For the same reason, only the right hand side of an assignment is visited during linearization of an expression, given that the "resulting value" of an assignment is its right hand side. The left hand side is, in this situation, a mere side effect that can be safely ignored.

I will not go any further in prototyping how paths are actually modelled, given that I don't know the nature of informations that are needed for your specific situation. It may also be more appropriate for you to create distinct visitor classes respectively for the left hand side expression and the right hand side expression, for example to better handle the fact that the right hand side might actually involve multiple variables/fields/method invocations combined through binary operators. That will have to be your decision.

There is still some major concerns regarding visitor traversal of the AST tree to be discussed, namely that, by relying on the default node traversal order, you loose the opportunity to get information on the relationship between each nodes. For example, given expression this.someMethod(this.fieldA).fieldB, you will see something similar to the following sequence:

FieldAccess      => corresponding to the whole expression
MethodInvovation => corresponding to this.someMethod(this.fieldA)
ThisExpression
SimpleName ("someMethod")
FieldAccess      => corresponding to this.fieldA
ThisExpression
SimpleName ("fieldA")
SimpleName ("fieldB")

There simply no way you can actually deduce the linearized expression from this sequence of events. You will instead want to explicitly intercept every nodes, and explicitly recurse on a node's children only if appropriate, and in appropriated order. For example, we could have done the following:

    @Override
    public boolean visit(FieldAccess fieldAccess) {
        // FieldAccess :: <expression>.<name>

        // First descend on the "subject" of the field access
        fieldAccess.getExpression().accept(this);

        // Then append the name of the accessed field itself
        this.path.append(fieldAccess.getName().getIdentifier());

        return false;
    }

    @Override
    public boolean visit(MethodInvocation methodInvocation) {
        // MethodInvocation :: <expression>.<methodName><<typeArguments>>(arguments)

        // First descend on the "subject" of the method invocation
        methodInvocation.getExpression().accept(this);

        // Then append the name of the accessed field itself
        this.path.append(methodAccess.getName().getIdentifier() + "()");

        return false;
    }

    @Override
    public boolean visit(ThisExpression thisExpression) {
        // ThisExpression :: [<qualifier>.] this

        // I will ignore the qualifier part for now, it will be up
        // to you to determine if it is pertinent
        this.path.append("this");

        return false;
    }

These methods would, given the preceding example, collect in path the following sequence: this, someMethod(), fieldB. This is, I believe, pretty close to what you are looking for. Should you want to collect all field access/method invocations sequences (for example, you would like your visitor to return both this,someMethod(),fieldB and this,fieldA), then you could rewrite the visit(MethodInvocation) method roughly similar to this:

    @Override
    public boolean visit(MethodInvocation methodInvocation) {
        // MethodInvocation :: <expression>.<methodName><<typeArguments>>(arguments)

        // First descend on the "subject" of the method invocation
        methodInvocation.getExpression().accept(this);

        // Then append the name of the accessed field itself
        this.path.append(methodAccess.getName().getIdentifier() + "()");

        // Now deal with method arguments, each within its own, distinct access chain
        for (Expression arg : methodInvocation.getArguments()) {
            LinearPath orginalPath = this.path;
            this.path = new LinearPath();

            arg.accept(this);

            this.collectedPaths.append(this.path);
            this.path = originalPath;
        }

        return false;
    }

Finally, should you be interested in knowing the type of values at the each step in the path, you will have to have a look at the binding objects associated with each node, for example: methodInvocation.resolveMethodBinding().getDeclaringClass(). Note however that binding resolution must have been explicitly requested at the construction of the AST tree.

There are many more language constructs that will not be correctly handled by the above code; still, I believe that you should be able to work out these remaining issues yourself. Should you have need for a reference implementation to look at, check out class org.eclipse.jdt.internal.core.dom.rewrite.ASTRewriteFlattener, which basically reconstruct Java source code from an existing AST tree; though this specific visitor is much larger than most other ASTVisitors, it is somewhat much easier to understand.

UPDATE IN RESPONSE TO OP'S EDIT #2

Here's an update starting point following your most recent edit. There are still many cases to be handled, but that is more in line with your specific problem. Note also that though I used numerous instanceof checks (because that's easier for me at current time, given that I'm writing the code in a simple text editor, and don't have code completion on ASTNode constants), you may opt instead for a switch statement on node.getNodeType(), which will usually be more efficient.

class ConstCheckVisitor extends ASTVisitor {

    @Override
    public boolean visit(MethodInvocation methodInvocation) {    
        if (isConst(methodInvocation.getExpression())) {
            if (isConst(methodInvocation.resolveMethodBinding().getMethodDeclaration()))
                reportInvokingNonConstMethodOnConstSubject(methodInvocation);
        }

        return true;
    }

    @Override
    public boolean visit(Assignment assignment) {
        if (isConst(assignment.getLeftHandSide())) {
            if ( /* assignment to @Const value is not acceptable in the current situation */ )
                reportAssignmentToConst(assignment.getLeftHandSide());

            // FIXME: I assume here that aliasing a @Const value to
            //        another @Const value is acceptable. Is that right?

        } else if (isImplicitelyConst(assigment.getLeftHandSide())) {
            reportAssignmentToImplicitConst(assignment.getLeftHandSide());        

        } else if (isConst(assignment.getRightHandSide())) {
            reportAliasing(assignment.getRightHandSide());
        }

        return true;
    }

    private boolean isConst(Expression expression) {
        if (expression instanceof FieldAccess)
            return (isConst(((FieldAccess) expression).resolveFieldBinding()));

        if (expression instanceof SuperFieldAccess)
            return isConst(((SuperFieldAccess) expression).resolveFieldBinding());

        if (expression instanceof Name)
            return isConst(((Name) expression).resolveBinding());

        if (expression instanceof ArrayAccess)
            return isConst(((ArrayAccess) expression).getArray());

        if (expression instanceof Assignment)
            return isConst(((Assignment) expression).getRightHandSide());

        return false;
    }

    private boolean isImplicitConst(Expression expression) {
        // Check if field is actually accessed through a @Const chain
        if (expression instanceof FieldAccess)
            return isConst((FieldAccess expression).getExpression()) ||
                   isimplicitConst((FieldAccess expression).getExpression());

        // FIXME: Not sure about the effect of MethodInvocation, assuming
        //        that its subject is const or implicitly const

        return false;
    }

    private boolean isConst(IBinding binding) {
        if ((binding instanceof IVariableBinding) || (binding instanceof IMethodBinding))
            return containsConstAnnotation(binding.getAnnotations());

        return false;
    }
}

Hope that helps.

like image 39
jwatkins Avatar answered Sep 23 '22 19:09

jwatkins