I'm trying to parse a txt file that represents a grammar to be used in a recursive descent parser. The txt file would look something like this:
SPRIME ::= Expr eof
Expr ::= Term Expr'
Expr' ::= + Term Expr' | - Term Expr' | e
To isolate the left-hand side and split the right-hand side into separate production rules, I take each line and call:
String[] firstSplit = line.split("::=");
String LHS = firstSplit[0];
String[] productionRules = firstSplit[1].split("|");
However, the second split doesn't return the strings separated by the "|" character. Instead, I get an array of every individual character on the right-hand side, including "|". For instance, if I parse the Expr' rule and print the productionRules array, it looks like this:
"+"
"Term"
"Expr'"
""
"|"
When what I really want is this:
"+ Term Expr'"
"- Term Expr'"
"e"
Does anyone have any idea what I'm doing wrong?
Answers
Writing "\." in a Java string literal doesn't compile, because \. is not a valid escape sequence in Java source. You have to escape the escape character itself, writing "\\.", so that a single backslash reaches the regex engine that split() uses to break up the string.
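A small sketch of the dot case, shown for comparison because "." is another regex metacharacter. The class name DotSplitDemo and the helper splitOnDots are hypothetical, chosen only for illustration.

```java
import java.util.Arrays;

public class DotSplitDemo {
    // Hypothetical helper: splits on a literal '.' by escaping it for the regex engine.
    static String[] splitOnDots(String s) {
        // The Java literal "\\." reaches the regex engine as \. (a literal dot).
        return s.split("\\.");
    }

    public static void main(String[] args) {
        String version = "1.2.3";

        // Unescaped, "." means "any character": every character is a delimiter,
        // every piece is empty, and split() drops trailing empty strings,
        // so the resulting array is empty.
        System.out.println(version.split(".").length);             // 0
        System.out.println(Arrays.toString(splitOnDots(version))); // [1, 2, 3]
    }
}
```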
The parameter to String.split() is a regular expression, and the vertical bar character is special in regexes: it is the alternation ("or") operator.
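To see why the question's output contains single characters, here is a minimal sketch of what the unescaped bar does. On its own, "|" is an alternation of two empty patterns, so it matches the empty string at every position and split() cuts between every pair of characters. The class and method names are hypothetical, and the result shown assumes Java 8 or later (earlier versions also return a leading empty string).

```java
import java.util.Arrays;

public class BarSplitDemo {
    // Hypothetical helper reproducing the question's call.
    static String[] naiveSplit(String s) {
        // "|" as a regex matches the empty string everywhere, so this splits
        // between every character and the '|' itself becomes one of the pieces.
        return s.split("|");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(naiveSplit("a|b"))); // [a, |, b]
    }
}
```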
Try escaping it with a backslash:
String[] productionRules = firstSplit[1].split("\\|");
NB: two backslashes are required, since the backslash character itself is special within string literals.
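Applied to the grammar from the question, the escaped split might look like the sketch below. It assumes each line contains exactly one "::=" and that the spaces around each "|" should be discarded; GrammarLineParser, leftHandSide and alternatives are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class GrammarLineParser {
    // Returns the trimmed left-hand side of "LHS ::= ...".
    static String leftHandSide(String line) {
        return line.split("::=")[0].trim();
    }

    // Returns the trimmed alternatives of "LHS ::= alt1 | alt2 | ...".
    static List<String> alternatives(String line) {
        String[] firstSplit = line.split("::=");
        // "\\|" reaches the regex engine as \|, which matches a literal bar.
        String[] productionRules = firstSplit[1].split("\\|");
        List<String> result = new ArrayList<>();
        for (String rule : productionRules) {
            result.add(rule.trim()); // drop the spaces around each '|'
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "Expr' ::= + Term Expr' | - Term Expr' | e";
        System.out.println(leftHandSide(line)); // Expr'
        System.out.println(alternatives(line)); // [+ Term Expr', - Term Expr', e]
    }
}
```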
Since split takes a regex as its argument, you have to escape every character that would otherwise be interpreted as a regex metacharacter.
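If you'd rather not escape characters by hand, java.util.regex.Pattern.quote turns any string into a regex that matches it literally; a character class such as "[|]" is another option, since '|' has no special meaning inside brackets. The class and helper names below are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class QuoteDemo {
    // Hypothetical helper: splits on a delimiter taken literally.
    // Pattern.quote wraps it in \Q...\E, so no manual escaping is needed.
    static String[] splitLiteral(String s, String delimiter) {
        return s.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        String rhs = "+ Term Expr' | - Term Expr' | e";

        // Both calls split on the literal bar; the pieces keep their
        // surrounding spaces, so trim them if needed.
        System.out.println(Arrays.toString(splitLiteral(rhs, "|")));
        System.out.println(Arrays.toString(rhs.split("[|]")));
    }
}
```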