I'm working with a service that provides data as a Lisp-like S-Expression string. This data is arriving thick and fast, and I want to churn through it as quickly as possible, ideally directly on the byte stream (it's only single-byte characters) without any backtracking. These strings can be quite lengthy and I don't want the GC churn of allocating a string for the whole message.
My current implementation uses CoCo/R with a grammar, but it has a few problems. Due to the backtracking, it assigns the whole stream to a string. It's also a bit fiddly for users of my code to change if they have to. I'd rather have a pure C# solution. CoCo/R also does not allow for the reuse of parser/scanner objects, so I have to recreate them for each message.
Conceptually the data stream can be thought of as a sequence of S-Expressions:
(item 1 apple)(item 2 banana)(item 3 chainsaw)
Parsing this sequence would create three objects. The type of each object can be determined by the first value in the list, in the above case "item". The schema/grammar of the incoming stream is well known.
Before I start coding I'd like to know if there are libraries out there that do this already. I'm sure I'm not the first person to have this problem.
EDIT
Here's a little more detail on what I want as I think the original question may have been a little vague.
Given some SExpressions, such as:
(Hear 12.3 HelloWorld)
(HJ LAJ1 -0.42)
(FRP lf (pos 2.3 1.7 0.4))
I want a list of objects equivalent to this:
{
new HearPerceptorState(12.3, "HelloWorld"),
new HingeJointState("LAJ1", -0.42),
new ForceResistancePerceptorState("lf", new Polar(2.3, 1.7, 0.4))
}
The actual data set I'm working on is a list of perceptors from a robot model in the RoboCup 3D simulated soccer league. I may potentially also need to deserialise another set of related data with a more complex structure.
In my opinion a parse generator is unneccessary to parse simple S-expressions consisting only of lists, numbers and symbols. A hand-written recursive descent parser is probably simpler and at least as fast. The general pattern would look like this (in java, c# should be very similar):
Object readDatum(PushbackReader in) {
int ch = in.read();
return readDatum(in, ch);
}
Object readDatum(PushbackReader in, int ch) {
if (ch == '(')) {
return readList(in, ch);
} else if (isNumber(ch)) {
return readNumber(in, ch);
} else if (isSymbolStart(ch)) {
return readSymbol(in, ch);
} else {
error(ch);
}
}
List readList(PushbackReader in, int lookAhead) {
if (ch != '(') {
error(ch);
}
List result = new List();
while (true) {
int ch = in.read();
if (ch == ')') {
break;
} else if (isWhiteSpace(ch)) {
skipWhiteSpace(in);
} else {
result.append(readDatum(in, ch);
}
}
return result;
}
String readSymbol(PushbackReader in, int ch) {
StringBuilder result = new StringBuilder();
result.append((char)ch);
while (true) {
int ch2 = in.read();
if (isSymbol(ch2)) {
result.append((char)ch2);
} else if (isWhiteSpace(ch2) || ch2 == ')') {
in.unread(ch2);
break;
} else if (ch2 == -1) {
break;
} else {
error(ch2);
}
}
return result.toString();
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With