Java check string for special format when format contains loops

Tags: java, string

Introduction

I'm working on a project where a user is able to enter facts and rules in a special format but I'm having some trouble with checking if that format is correct and obtaining the information.

When the program is launched, the user can enter "commands" into a text area. This text is sent to a parseCommand method, which determines what to do based on what the user has written. For example, to add a fact or a rule you use the prefix +, to remove a fact or rule you use -, and so on.

I've created the system which handles the prefix, but I'm having trouble with the format of the facts and rules.

Facts and rules

Facts: These are defined by an alphanumeric name and contain a list of properties (each within <> signs) and a truth value. Properties are also defined by an alphanumeric name and contain 2 strings (called arguments), again each within <> signs. A property can also be negated by placing a ! before it in the list. For example, the user could type the following to add facts to the program:

+father(<parent(<John>,<Jake>)>, true)

+father(<parent(<Jammie>,<Jake>)>, false)

+father(!<parent(<Jammie>,<Jake>)>, true)

+familyTree(<parent(<John>,<Jake>)>, <parent(<Jammie>,<Jake>)> , true)

+fathers(<parent(<John>,<Jake>)>, !<parent(<Jammie>,<Jake>)> , true)

The class I use to store facts is like this:

public class Fact implements Serializable{

    private boolean truth;
    private ArrayList<Property> properties;
    private String name;

    public Fact(boolean truth, ArrayList<Property> properties, String name){
        this.truth = truth;
        this.properties = properties;
        this.name = name;
    }
    //getters and setters here...
}

Rules: These are links between 2 properties, identified by the => sign. Again, their name is alphanumeric. The properties are more limited, though: their arguments can only be made up of uppercase letters, and the arguments of the second property have to be the same as those of the first one. Rules also have 2 optional arguments, each of which is set by entering its name and left unset by omitting it. Each corresponds to a property of the rule, which can be Negative or Reversive. For example:

+son(<parent(<X>,<Y>)> => <child(<Y>,<X>)>)

+son(<parent(<X>,<Y>)> => <child(<Y>,<X>)>, Negative, Reversive)

+son(<parent(<X>,<Y>)> => <child(<Y>,<X>)>, Reversive)

+son(<parent(<X>,<Y>)> => <child(<Y>,<X>)>, Negative)

Rule Properties

A normal rule tells us that if, in the example below, X is a parent of Y this implies that Y is a child of X :

son(<parent(<X>,<Y>)> => <child(<Y>,<X>)>)

While a Negative rule tells us that if, in the example below, X is a parent of Y this implies that Y is not a child of X :

son(<parent(<X>,<Y>)> => <child(<Y>,<X>)>, Negative)

A Reversive rule, however, tells us that if, in the example below, Y is a child of X this implies that X is a parent of Y:

son(<parent(<X>,<Y>)> => <child(<Y>,<X>)>, Reversive)

The last case is when the rule is both Negative and Reversive. This tells us that if, in the example below, Y is not a child of X this implies that X is a parent of Y.

son(<parent(<X>,<Y>)> => <child(<Y>,<X>)>, Negative, Reversive)

This is the class I use to store rules:

public class Rule implements Serializable{

    private Property derivative;
    private Property impliant;
    private boolean negative;
    private boolean reversive;
    private String name;

    public Rule(Property derivative, Property impliant, boolean negative, boolean reversive) throws InvalidPropertyException{
        if(!this.validRuleProperty(derivative) || !this.validRuleProperty(impliant))
            throw new InvalidPropertyException("One or more properties are invalid");
        this.derivative = derivative;
        this.impliant = impliant;
        this.negative = negative;
        this.reversive = reversive;
    }
    //getters and setters here
}

Property class:

public class Property implements Serializable{

    private String name;
    private String firstArgument;
    private String secondArgument;

    public Property(String name, String firstArgument, String secondArgument){
        this.name = name;
        this.firstArgument = firstArgument;
        this.secondArgument = secondArgument;
    }
}

The above examples are all valid inputs. Just to clarify here are some invalid input examples:

Facts:

No true or false is provided for the argument:

+father(<parent(<John>,<Jake>)>) 

No property given:

+father(false) 

An invalid property is provided:

+father(<parent(<John>)>, true) 

+father(<parent(John, Jake)>, true) 

+father(<parent(John, Jake, Michel)>, true) 

+father(parent(<John>,<Jake>), true)

Note the missing <> around the property in the last one.

Rules:

One or more properties are invalid:

+son(<parent(<X>,<Y>)> => child(<Y>,<X>))

+son(parent(<X>,<Y>) => child(<Y>,<X>))

+son(<parent(<X>,<Y>)> => <child(<Z>,<X>)>) (Note the Z in the child property)

+son(<parent(<Not Valid>,<Y>)> => child(<Y>,<X>)) (Invalid argument for first property)

+son(=> child(<Y>,<X>))

The problem

I'm able to get the input from the user and I'm also able to see which kind of action the user wants to perform based on the prefix.

However I'm not able to figure out how to process strings like:

+familyTree(<parent(<John>,<Jake>)>, <parent(<Jammie>,<Jake>)> , true)

This is due to a number of reasons:

  1. The number of properties for a fact entered by the user is variable, so I can't just split the input string based on the () and <> signs.
  2. For rules, the last 2 arguments are optional, so 'Reversive' can appear at the position in the string where you would normally find 'Negative'.
  3. If I want to get the arguments from this part of the input string: +familyTree(<parent(<John>,<Jake>)>, to set up the property for this fact, I could look for anything between <>, but that is a problem because there are 2 opening < before the first >.

What I've tried

My first idea was to start at the beginning of the string (which I did for getting the action from the prefix) and then remove that piece of string from the main string.

However, I don't know how to adapt this system to the problems above (especially problems 1 and 2).

I've tried to use functions like String.split() and String.contains().

How would I go about doing this? How can I get around the fact that not all strings contain the same information? (In the sense that some facts have more properties, or some rules have more attributes, than others.)

EDIT:

I forgot to say that all the methods used to store the data are finished and work and they can be used by calling for example: infoHandler.addRule() or infoHandler.removeFact(). Inside these functions I could also validate input data if this is better.

I could, for example, just extract all data of the fact or rule from the string and then validate things such as whether the arguments of a rule's properties use only uppercase letters.

EDIT 2:

In the comments someone suggested using a parser generator like ANTLR or JavaCC. I've looked into that option over the last 3 days, but I can't seem to find any good source on how to define a custom language in it. Most documentation assumes you're trying to compile an existing language and recommends downloading the language file from somewhere instead of writing your own.

I'm trying to understand the basics of ANTLR (which seems to be the easiest one to use). However, there are not a lot of resources online to help me.

If this is a viable option, could anyone help me understand how to do something like this in ANTLR?

Also, once I've written a grammar file, how am I supposed to use it? I've read something about generating a parser from the language file, but I can't seem to figure out how that is done...

EDIT 3:

I've begun to work on a grammar file for ANTLR, which looks like this:

/** Grammer used by communicate parser */

grammar communicate;


/*
 * Parser Rules
 */

argument            : '<' + NAMESTRING + '>' ;

ruleArgument        : '<' + RULESTRING + '>' ;

property            : NAMESTRING + '(' + argument + ',' + argument + ')' ;

propertyArgument    : (NEGATIVITY | POSITIVITY) + property + '>' ;

propertyList        : (propertyArgument + ',')+ ;

fact                : NAMESTRING + '(' + propertyList + ':' + (TRUE | FALSE) + ')';

rule                : NAMESTRING + '(' + ruleArgument + '=>' + ruleArgument + ':' + RULEOPTIONS + ')' ;

/*
 * Lexer Rules
 */

fragment LOWERCASE  : [a-z] ;
fragment UPPERCASE  : [A-Z] ;

NAMESTRING          : (LOWERCASE | UPPERCASE)+ ;

RULESTRING          : (UPPERCASE)+ ;

TRUE                : 'True';

FALSE               : 'False';

POSITIVITY          : '!<';

NEGATIVITY          : '<' ;

NEWLINE             : ('\r'? '\n' | '\r')+ ;

RULEOPTIONS         : ('Negative' | 'Negative' + ',' + 'Reversive' | 'Reversive' );

WHITESPACE          : ' ' -> skip ;

Am I on the right track here? If this is a good grammar file, how can I test and use it later on?

asked Mar 18 '18 20:03 by BRHSM


2 Answers

I'm afraid I can't make out the exact grammar you are trying to parse from your description, but I understand you are trying to create entity objects from the parsed grammar. The following few demo files demonstrate how to do that using ANTLR-4 and Maven:

pom.xml:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.stackoverflow</groupId>
  <artifactId>communicate</artifactId>
  <version>0.0.1-SNAPSHOT</version>

  <properties>
    <maven-compiler.version>3.6.1</maven-compiler.version>
    <java.version>1.8</java.version>
    <antlr.version>4.5.3</antlr.version>
    <commons-io.version>2.5</commons-io.version>
    <junit.version>4.12</junit.version>
  </properties>

  <build>
    <testResources>
      <testResource>
        <directory>src/test/resources</directory>
        <targetPath>com/stackoverflow/test/communicate/resources</targetPath>
      </testResource>
    </testResources>
    <plugins>
      <plugin>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>${maven-compiler.version}</version>
        <configuration>
          <source>${java.version}</source>
          <target>${java.version}</target>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.antlr</groupId>
        <artifactId>antlr4-maven-plugin</artifactId>
        <version>${antlr.version}</version>
        <configuration>
          <sourceDirectory>${basedir}/src/main/resources</sourceDirectory>
          <outputDirectory>${basedir}/src/main/java/com/stackoverflow/communicate/frontend</outputDirectory>
        </configuration>
        <executions>
          <execution>
            <goals>
              <goal>antlr4</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>

  <dependencies>
    <dependency>
      <groupId>org.antlr</groupId>
      <artifactId>antlr4-runtime</artifactId>
      <version>${antlr.version}</version>
    </dependency>
    <dependency>
      <groupId>commons-io</groupId>
      <artifactId>commons-io</artifactId>
      <version>${commons-io.version}</version>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>${junit.version}</version>
      <scope>test</scope>
    </dependency>
  </dependencies>
</project>

src/main/resources/communicate.g4

grammar communicate;

@header {
    package com.stackoverflow.communicate.frontend;
}

fact returns [com.stackoverflow.communicate.ir.Property value]
   : property { $value = $property.value; }
   ;

property returns [com.stackoverflow.communicate.ir.Property value]
   : STRING '(<' argument { com.stackoverflow.communicate.ir.ArgumentTerm lhs = $argument.value; } '>,<' argument '>)' { $value = new com.stackoverflow.communicate.ir.Property($STRING.text, lhs, $argument.value); }
   ;

argument returns [com.stackoverflow.communicate.ir.ArgumentTerm value]
   : STRING { $value = new com.stackoverflow.communicate.ir.ArgumentTerm($STRING.text); }
   ;

STRING
   : [a-zA-Z]+
   ;

src/main/java/com/stackoverflow/communicate/ir/ArgumentTerm.java

package com.stackoverflow.communicate.ir;

public class ArgumentTerm {
  public String Value;

  public ArgumentTerm(String value) {
    Value=value;
  }
}

src/main/java/com/stackoverflow/communicate/ir/Property.java

package com.stackoverflow.communicate.ir;

public class Property {
  public String Name;
  public ArgumentTerm Lhs;
  public ArgumentTerm Rhs;

  public Property(String name, ArgumentTerm lhs, ArgumentTerm rhs) {
    Name=name;
    Lhs=lhs;
    Rhs=rhs;
  }
}

src/test/resources/frontend/father.txt

parent(<John>,<Jane>)

src/test/java/com/stackoverflow/test/communicate/frontend/FrontendTest.java

package com.stackoverflow.test.communicate.frontend;

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.antlr.v4.runtime.ANTLRFileStream;
import org.antlr.v4.runtime.CharStream;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.TokenStream;
import org.antlr.v4.runtime.tree.ParseTree;
import org.apache.commons.io.FileUtils;
import org.apache.commons.io.IOUtils;
import org.junit.Assert;
import org.junit.Test;

import com.stackoverflow.communicate.frontend.communicateLexer;
import com.stackoverflow.communicate.frontend.communicateParser;

public class FrontendTest {
  private String testResource(String path) throws IOException {
    File file=null;
    try {
      file=File.createTempFile("test", ".txt");
      try(InputStream is=new BufferedInputStream(
          FrontendTest.class.getResource(path).openStream());
          OutputStream fos=new FileOutputStream(file);
          OutputStream os=new BufferedOutputStream(fos)) {
        IOUtils.copy(is, os);
      }
      CharStream fileStream=new ANTLRFileStream(file.getAbsolutePath());
      communicateLexer lexer=new communicateLexer(fileStream);
      TokenStream tokenStream=new CommonTokenStream(lexer);
      communicateParser parser=new communicateParser(tokenStream);
      ParseTree tree=parser.fact();
      return tree.toStringTree(parser);
    } finally {
      FileUtils.deleteQuietly(file);
    }
  }

  @Test
  public void testArgumentTerm() throws IOException {
    Assert.assertEquals(
        "(fact (property parent (< (argument John) >,< (argument Jane) >)))",
        testResource(
            "/com/stackoverflow/test/communicate/resources/frontend/father.txt"));
  }
}

The attached POM file generates the parser classes (communicateParser) for the grammar communicate.g4 if you call mvn antlr4:antlr4. FrontendTest is a JUnit unit test which parses the content of father.txt, which creates a Property entity with the name "parent" and containing two argument term objects John and Jane.

A full Eclipse Java project with these files is uploaded here: https://www.file-upload.net/download-13056434/communicate.zip.html

answered Oct 18 '22 21:10 by Pascal Kesseli


I don't think a syntax analyzer is a good fit for your problem. You can handle it more simply with regular expressions and some string utilities.

It's better to start with the small problems and move on to the bigger ones. Parsing a property on its own is easy, so we write a method for that first:

 private static Property toProp(String propStr) {
    String name = propStr.substring(1,propStr.indexOf("("));
    String[] arguments = propStr.substring(propStr.indexOf('(')+1,propStr.indexOf(')')).split(",");
    return new Property(name,
            arguments[0].substring(1,arguments[0].length()-1),
            arguments[1].substring(1,arguments[1].length()-1));
  }
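
As a stricter variant of the same idea, here is a minimal sketch that uses one pattern with capture groups, so the property is validated and split in a single step instead of by index arithmetic. The class name PropertyParser and the nested Property stand-in are made up for illustration, and the sketch assumes that names and arguments are purely alphanumeric, as the question describes.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PropertyParser {
    // Minimal stand-in for the Property class from the question (hypothetical).
    static final class Property {
        final String name;
        final String firstArgument;
        final String secondArgument;

        Property(String name, String firstArgument, String secondArgument) {
            this.name = name;
            this.firstArgument = firstArgument;
            this.secondArgument = secondArgument;
        }
    }

    // <name(<arg1>,<arg2>)> with alphanumeric parts and optional spaces around the comma.
    private static final Pattern PROPERTY = Pattern.compile(
            "<([A-Za-z0-9]+)\\(<([A-Za-z0-9]+)>\\s*,\\s*<([A-Za-z0-9]+)>\\)>");

    // Returns the parsed property, or empty if the text is not a valid property.
    static Optional<Property> parse(String text) {
        Matcher m = PROPERTY.matcher(text.trim());
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new Property(m.group(1), m.group(2), m.group(3)));
    }

    public static void main(String[] args) {
        System.out.println(parse("<parent(<John>,<Jake>)>").isPresent()); // true
        System.out.println(parse("<parent(<John>)>").isPresent());        // false: one argument
        System.out.println(parse("<parent(John, Jake)>").isPresent());    // false: no <> around arguments
    }
}
```

Returning an Optional lets the caller decide how to report a malformed property, which the index-based version cannot do without throwing from substring.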

To parse a fact string, a regex makes things easier. The regex for a property is <[\w\d]*\([<>\w\d,]*\)>, and with the help of the toProp method written above we can create another method to parse facts:

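// Needs java.util.ArrayList, java.util.regex.Matcher and java.util.regex.Pattern.
// Assumes well-formed input; a leading '!' on a property is ignored here.
public static Fact handleFact(String factStr) {
    // Matches one property such as <parent(<John>,<Jake>)>
    Pattern propertyPattern = Pattern.compile("<[\\w\\d]*\\([<>\\w\\d,]*\\)>");
    int s = factStr.indexOf("(") + 1;
    int l = factStr.lastIndexOf(")");
    String name = factStr.substring(0, s - 1);
    String params = factStr.substring(s, l);
    Matcher matcher = propertyPattern.matcher(params);
    // ArrayList rather than List, because the Fact constructor in the question expects an ArrayList
    ArrayList<Property> props = new ArrayList<>();
    while (matcher.find()) {
        props.add(toProp(matcher.group()));
    }
    // Whatever follows the last property is the truth value, e.g. ", true"
    String[] split = propertyPattern.split(params);
    boolean truth = Boolean.parseBoolean(split[split.length - 1].replace(",", "").trim());
    return new Fact(truth, props, name);
}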
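
The question also allows a ! in front of a property and lists several inputs that must be rejected (no truth value, no property, malformed property). A possible way to cover both, sketched under assumptions: match the whole fact against one pattern first, then walk over the property list with a second pattern. FactParser, ParsedFact and ParsedProperty are hypothetical names, the + prefix is assumed to be stripped already, and the truth value is assumed to be lowercase true or false as in the question's examples.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FactParser {
    // Hypothetical holder types; the real Fact/Property classes would replace them.
    static final class ParsedProperty {
        final boolean negated;
        final String name;
        final String first;
        final String second;

        ParsedProperty(boolean negated, String name, String first, String second) {
            this.negated = negated;
            this.name = name;
            this.first = first;
            this.second = second;
        }
    }

    static final class ParsedFact {
        final String name;
        final List<ParsedProperty> properties;
        final boolean truth;

        ParsedFact(String name, List<ParsedProperty> properties, boolean truth) {
            this.name = name;
            this.properties = properties;
            this.truth = truth;
        }
    }

    private static final String ID = "[A-Za-z0-9]+";
    // One list item: optional '!', then <name(<arg>,<arg>)>.
    private static final String ITEM =
            "(!?)<(" + ID + ")\\(<(" + ID + ")>\\s*,\\s*<(" + ID + ")>\\)>";
    private static final Pattern ITEM_PATTERN = Pattern.compile(ITEM);
    // Whole fact: name( item, item, ..., true|false ) with at least one item.
    private static final Pattern FACT_PATTERN = Pattern.compile(
            "(?<name>" + ID + ")\\(\\s*(?<list>(?:" + ITEM + "\\s*,\\s*)+)(?<truth>true|false)\\s*\\)");

    // Parses a fact without its '+' prefix; throws if the format is wrong.
    static ParsedFact parse(String text) {
        Matcher fact = FACT_PATTERN.matcher(text.trim());
        if (!fact.matches()) {
            throw new IllegalArgumentException("Not a valid fact: " + text);
        }
        List<ParsedProperty> properties = new ArrayList<>();
        Matcher item = ITEM_PATTERN.matcher(fact.group("list"));
        while (item.find()) {
            properties.add(new ParsedProperty(
                    !item.group(1).isEmpty(), item.group(2), item.group(3), item.group(4)));
        }
        return new ParsedFact(fact.group("name"), properties,
                Boolean.parseBoolean(fact.group("truth")));
    }

    public static void main(String[] args) {
        ParsedFact f = parse("fathers(<parent(<John>,<Jake>)>, !<parent(<Jammie>,<Jake>)> , true)");
        // prints "fathers 2 true true"
        System.out.println(f.name + " " + f.properties.size() + " "
                + f.properties.get(1).negated + " " + f.truth);
    }
}
```

Because the outer pattern requires at least one item followed by a truth value, inputs such as father(false) or father(<parent(<John>,<Jake>)>) fail the match instead of producing a half-filled fact.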

Parsing rules is very similar to facts:

private static Rule handleRule(String ruleStr) {
    Pattern propertyPattern = Pattern.compile("<[\\w\\d]*\\([<>\\w\\d,]*\\)>");
    String name = ruleStr.substring(0, ruleStr.indexOf('('));
    String params = ruleStr.substring(ruleStr.indexOf('(') + 1, ruleStr.lastIndexOf(')'));
    Matcher matcher = propertyPattern.matcher(params);
    // A rule needs two properties: one on each side of "=>"
    if (!matcher.find())
        throw new IllegalArgumentException("Missing first property: " + ruleStr);
    Property prop1 = toProp(matcher.group());
    if (!matcher.find())
        throw new IllegalArgumentException("Missing second property: " + ruleStr);
    Property prop2 = toProp(matcher.group());
    // Strip the properties; what remains holds the optional Negative/Reversive flags
    params = propertyPattern.matcher(params).replaceAll("").toLowerCase();
    // Assumes a Rule constructor that also accepts the name (the one in the question does not)
    return new Rule(name, prop1, prop2, params.contains("negative"), params.contains("reversive"));
}
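
The method above does not check the extra constraints the question places on rules: arguments made up of uppercase letters only, and the same arguments in both properties. A rough sketch of how those checks could be folded into the parsing step follows. RuleParser and ParsedRule are hypothetical names, the + prefix is assumed to be stripped, "same arguments" is read here as "the same set of arguments in any order", and the options are assumed to appear in the order Negative, Reversive as in the question's examples.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RuleParser {
    // Hypothetical holder type; the real Rule class would replace it.
    static final class ParsedRule {
        final String name;
        final String firstProperty;
        final String secondProperty;
        final boolean negative;
        final boolean reversive;

        ParsedRule(String name, String firstProperty, String secondProperty,
                   boolean negative, boolean reversive) {
            this.name = name;
            this.firstProperty = firstProperty;
            this.secondProperty = secondProperty;
            this.negative = negative;
            this.reversive = reversive;
        }
    }

    // A rule property: <name(<ARG>,<ARG>)> where arguments are uppercase letters only.
    private static final String PROP =
            "<([A-Za-z0-9]+)\\(<([A-Z]+)>\\s*,\\s*<([A-Z]+)>\\)>";
    // Groups: 1 rule name, 2-4 first property, 5-7 second property, 8 Negative, 9 Reversive.
    private static final Pattern RULE_PATTERN = Pattern.compile(
            "([A-Za-z0-9]+)\\(\\s*" + PROP + "\\s*=>\\s*" + PROP
            + "\\s*(?:,\\s*(Negative))?\\s*(?:,\\s*(Reversive))?\\s*\\)");

    // Parses a rule without its '+' prefix; throws if the format or arguments are wrong.
    static ParsedRule parse(String text) {
        Matcher m = RULE_PATTERN.matcher(text.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a valid rule: " + text);
        }
        Set<String> left = new HashSet<>(Arrays.asList(m.group(3), m.group(4)));
        Set<String> right = new HashSet<>(Arrays.asList(m.group(6), m.group(7)));
        if (!left.equals(right)) {
            throw new IllegalArgumentException("Both properties must use the same arguments: " + text);
        }
        // An optional group that did not take part in the match is reported as null.
        return new ParsedRule(m.group(1), m.group(2), m.group(5),
                m.group(8) != null, m.group(9) != null);
    }

    public static void main(String[] args) {
        ParsedRule r = parse("son(<parent(<X>,<Y>)> => <child(<Y>,<X>)>, Reversive)");
        // prints "son parent child false true"
        System.out.println(r.name + " " + r.firstProperty + " " + r.secondProperty
                + " " + r.negative + " " + r.reversive);
    }
}
```

Making each option its own optional group is what handles the case where Reversive appears in the position normally taken by Negative.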

answered Oct 18 '22 19:10 by Isa Hekmat