
Processing a data feed format

Tags:

c#

I had an interesting interview question the other day, which I really struggled with. The (highly ambitious) spec required me to write, in C#, parsers for two different data streams. Here is a made-up example of the first stream:

30=EUR/USD,35=3,50=ON,51=12.5,52=13.5,50=6M,51=15.4,52=16.2,50=1Y,51=17.2,52=18.3

where 30 is the currency pair, 35 is the number of tenors, and 50, 51, and 52 are the tenor, bid, and ask respectively. The bid and ask are optional, but a correct tenor-bid-ask tuple will have at least one of the two prices. The framework code they supplied implied that the result of parsing this line should be 3 individual objects (DataElement instances). I ended up with a rather nasty implementation built on a switch statement and a loop, and I am not sure it actually worked.
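
For reference, here is a minimal sketch of what a DataElement could look like. The interviewer's framework class is not shown in this post, so everything below is an assumption: the field names simply mirror the ones used in the code further down, the nullable prices are one way to model "optional", and `IsValid` is a hypothetical helper for the "at least one of the two prices" rule.

```csharp
// Hypothetical stand-in for the framework's DataElement class (not shown in the post).
public class DataElement
{
    public string currency = string.Empty; // tag 30, e.g. "EUR/USD"
    public string tenor = string.Empty;    // tag 50, e.g. "ON", "6M", "1Y"
    public double? bid;                    // tag 51, null when the bid is absent
    public double? ask;                    // tag 52, null when the ask is absent

    // Assumed rule from the question: a well-formed tuple has a tenor
    // and at least one of the two prices.
    public bool IsValid => tenor.Length > 0 && (bid.HasValue || ask.HasValue);
}
```

With this shape, the example row should produce three instances that share the currency "EUR/USD" and differ in tenor, bid, and ask.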

What techniques are there for reading this kind of stream? I tried to figure out something with recursion, which I couldn't get right.

EDIT: Based on @evanmcdonnal's answer (accepted), here is the fully compiling and working code, in case it's useful for anyone else.

    List<DataElement> Parse(string row)
    {
        string currency=string.Empty;
        DataElement[] elements = null;
        int j = 0;
        bool start = false;
        string[] tokens = row.Split(',');
        for (int i = 0; i < tokens.Length; i++)
        {
            string[] kv = tokens[i].Split('=');

            switch (kv[0])
            {
                case "30":
                    currency = kv[1];
                    break;
                case "35":
                    elements = new DataElement[int.Parse(kv[1])];
                    break;
                case "50":
                    if (start)
                        j++;
                    elements[j] = new DataElement() { currency = currency, tenor = kv[1] };
                    start = true;
                    break;
                case "51":
                    elements[j].bid = double.Parse(kv[1]);
                    break;
                case "52":
                    elements[j].ask = double.Parse(kv[1]);
                    break;
            }
        }
        return elements.ToList();
    }

The main concepts are:

  • Have a separate counter for the "inner loop" of repeating items in each line
  • Have a boolean flag to indicate when that "inner loop" begins
  • Allocate the array of objects to store the "inner loop" results at the point where the length is known (i.e., tag 35)
  • For simplicity and clarity, have a function that reads just a single line, then call it multiple times from a separate function.
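
The last point can be sketched roughly as follows. This is only an illustration: `FeedReader` and `ParseFeed` are names made up for this example, and the per-line parser is passed in as a delegate so that the single-line function stays separate from the loop over lines.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical driver: FeedReader and ParseFeed are illustrative names.
public static class FeedReader
{
    // Reads the feed one line at a time and hands each non-blank line to
    // parseLine, collecting everything the per-line parser returns.
    public static List<T> ParseFeed<T>(TextReader reader, Func<string, IEnumerable<T>> parseLine)
    {
        var results = new List<T>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Trim().Length == 0)
                continue; // skip blank lines between rows
            results.AddRange(parseLine(line));
        }
        return results;
    }
}
```

Assuming the Parse method above is in scope, a call would look something like `FeedReader.ParseFeed<DataElement>(new StringReader(text), Parse)`.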

endian asked Nov 03 '22 23:11


1 Answer

I don't see what's so tricky about it. That said, I can't think of a solution better than the very specific one I have in mind: iterate over the tokens with a chain of conditionals.

First you split on commas, then you loop over those tokens, splitting each on the equals sign to get a key-value pair. You check each key, and you keep a bool to track when you have started an item. You read the currency and use it for every object. You read key 35 and find there are 3 objects, so you allocate an array of three objects, each with three properties: tenor, bid, and ask. When you encounter 50, you move on to the next element (unless it is the first one) and set your start flag to true. You set 50, 51, and 52 if they're there. Below is some sample code:

  string currency = string.Empty;
  DataElement[] elements = null; // declared out here so every branch can see it
  int j = 0;
  bool start = false;
  string[] tokens = line.Split(',');
  for (int i = 0; i < tokens.Length; i++)
  {
        string[] kv = tokens[i].Split('=');
        if (kv[0] == "30")
             currency = kv[1];
        else if (kv[0] == "35")
        {
             // key 35 holds the number of tenors, so the array can be sized here
             elements = new DataElement[int.Parse(kv[1])];
        }
        else if (kv[0] == "50")
        {
             if (start)
                 j++;
             start = true; // flip your flag after the condition so it works for element 0
             elements[j] = new DataElement(); // array slots start out null
             elements[j].currency = currency;
             elements[j].tenor = kv[1];
        }
        else if (kv[0] == "51")
             elements[j].bid = double.Parse(kv[1]);
        else if (kv[0] == "52")
             elements[j].ask = double.Parse(kv[1]);
        // if these optional values aren't there we'll just fall back into the case for 50
        // and everything will work as expected.
  }

The code may not be pretty, but the logic is fairly trivial and, assuming the line's format is correct, it will always work.
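
If the format cannot be assumed correct, the same loop can be made to reject bad rows instead of failing later. The sketch below is one possible way to do that, not part of the original answer: `StrictRowParser` is a made-up name, it returns plain tuples rather than DataElement objects to stay self-contained, it ignores the currency, and treating any unknown tag as an error is an assumption about the feed.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical strict variant of the loop above; all names are illustrative.
public static class StrictRowParser
{
    // Returns one (tenor, bid, ask) tuple per tag-50 group in the row, or
    // throws FormatException when the row breaks the format from the question.
    public static List<(string Tenor, double? Bid, double? Ask)> Parse(string row)
    {
        var quotes = new List<(string Tenor, double? Bid, double? Ask)>();
        int declared = -1; // value of tag 35; stays -1 if the tag never appears

        foreach (string token in row.Split(','))
        {
            string[] kv = token.Split('=');
            if (kv.Length != 2)
                throw new FormatException($"Malformed token '{token}'");

            switch (kv[0])
            {
                case "30":
                    break; // currency pair: ignored in this sketch
                case "35":
                    if (!int.TryParse(kv[1], out declared) || declared < 0)
                        throw new FormatException($"Bad tenor count '{kv[1]}'");
                    break;
                case "50":
                    quotes.Add((kv[1], null, null)); // start a new tenor group
                    break;
                case "51":
                case "52":
                    if (quotes.Count == 0)
                        throw new FormatException("Price before first tenor");
                    if (!double.TryParse(kv[1], NumberStyles.Float,
                                         CultureInfo.InvariantCulture, out double price))
                        throw new FormatException($"Bad price '{kv[1]}'");
                    var last = quotes[quotes.Count - 1]; // copy, update, write back
                    if (kv[0] == "51") last.Bid = price; else last.Ask = price;
                    quotes[quotes.Count - 1] = last;
                    break;
                default:
                    throw new FormatException($"Unknown tag '{kv[0]}'");
            }
        }

        // A missing tag 35 also fails this check, because declared is still -1.
        if (quotes.Count != declared)
            throw new FormatException($"Tag 35 declared {declared} tenors, found {quotes.Count}");
        foreach (var q in quotes)
            if (q.Bid == null && q.Ask == null)
                throw new FormatException($"Tenor '{q.Tenor}' has neither bid nor ask");
        return quotes;
    }
}
```

Using a list instead of a pre-sized array means a wrong count in tag 35 shows up as a clear error rather than an out-of-range index.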

evanmcdonnal answered Nov 08 '22 13:11