Datatypes for representing JSON in C++

I've been trying to figure this one out for a while now, and maybe I've just stared at it too long?

Anyhow, the problem at hand is to find a good way to represent JSON in C++. Before you read any further, please note that I am not interested in libraries capable of it: I want to do it in raw C or C++ (C++11 is fine), no Boost, no libjson. I know about them, but for reasons outside the scope of this question I can't (or won't) add dependencies.

Now that that's cleared up, let me tell you a bit about the problem and what I have tried so far.

The problem is to find a nice way to represent JSON in C++. The reason this is a bit problematic is that JSON is very loosely typed, while C++ is strongly typed. Consider JSON for a second: what is JSON really capable of, type-wise?

  • Number (e.g. 42 or 3.1415)
  • String (e.g. "my string")
  • Array (e.g. [], or [1, 3.1415, "my string"])
  • Object (e.g. {} or {"a": 42, "b": 3.1415, "c": "my string", "d": [], "e": [1, 3.1415, "my string"]})
  • The literals true, false and null

So what this means is that there are two main "raw" types, Number and String (plus the three literals), and two container types, Array and Object. The raw types are fairly straightforward, while the container types become tricky in C/C++ because they can, and probably will, contain elements of different types. No built-in type in the language will suffice as is, since an array can't hold elements of different types. This holds true for the STL types as well (list, vector, array and so on), unless the elements are polymorphic, e.g. pointers to a common base class.

So any container in JSON can hold any JSON type, which is pretty much all there is to it.

What I've prototyped or tried, and why it won't work

My first naive thought was to just use templates, so I set up a json-object or json-node type that would use templates to decide what's in it. It would have a structure something like this:

template <class T>
class JSONNode {
    const char *key;
    T value;
};

This seemed promising, but when I started working with it I ran into trouble as soon as I tried to put the nodes into a container type (such as array, vector, unordered_map and so on), because the container still needs to know the type of the JSONNode! If one node is a JSONNode<int> while another is a JSONNode<float>, they can't be stored in the same container.
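A minimal way to see this container problem, assuming the template is completed into a plain struct with a std::string key (the original used const char *): the two instantiations are unrelated types, which the static_asserts below check at compile time.

```cpp
#include <string>
#include <type_traits>

// The template from the question, completed just enough to compile.
template <class T>
struct JSONNode {
    std::string key;
    T value;
};

// Each instantiation is a distinct, unrelated class, so there is no
// single element type for a container that should hold both:
// std::vector<JSONNode<???>> nodes;
static_assert(!std::is_same<JSONNode<int>, JSONNode<double>>::value,
              "different template arguments give different types");
static_assert(!std::is_base_of<JSONNode<int>, JSONNode<double>>::value,
              "and neither is a base class of the other");
```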

So I moved past that. I'm not all that interested in keeping them in a container anyway; I'd be rather happy to make them self-aware, i.e. add a pointer to the next node. But again it gets tricky to figure out the type of the node, and just about here is where I started thinking polymorphism.

Polymorphism

Let's just make a virtual JSONNode base class and implement JSONNumberNode, JSONStringNode, JSONArrayNode and JSONObjectNode types; they will fit nicely into any container I might want them in, using polymorphism to let them all be JSONNodes.

An example of the code might be in order.

class JSONNode {
public:
    virtual ~JSONNode() {} // a virtual member makes the class polymorphic
    const char *key;
    //?? typed value, can't set a type
};

class JSONNumberNode : public JSONNode { 
public:
    int value;
};

class JSONStringNode : public JSONNode {
public:
    const char *value;
};

At first I thought this was the way to go. However, when I started thinking about how to handle the value part, I realized that I couldn't access the value: even if I wrote a specific function to retrieve it, what would that function return?

So sure, I do have objects with differently typed values, but I can't really access them without first casting to the proper type. I could do a dynamic_cast<JSONStringNode*>(some_node), but how would I know what to cast it to? RTTI? I feel it's getting just a tad too complicated at that point. I guess I might be able to use typeof or decltype to figure out what to cast it to, but I haven't been successful.
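For the "how would I know what to cast it to?" part, here is a sketch that assumes the base class has a virtual destructor (dynamic_cast requires a polymorphic base) and that numbers are stored as double. Casting to a pointer type never throws; it yields nullptr on a mismatch, so you can simply try each candidate type in turn. `describe` is a hypothetical helper for illustration.

```cpp
#include <string>

// Polymorphic base: the virtual destructor is what makes dynamic_cast legal.
class JSONNode {
public:
    virtual ~JSONNode() {}
    std::string key;
};

class JSONNumberNode : public JSONNode {
public:
    double value = 0;
};

class JSONStringNode : public JSONNode {
public:
    std::string value;
};

// Try each derived type; a failed pointer cast returns nullptr.
std::string describe(const JSONNode *node) {
    if (auto *s = dynamic_cast<const JSONStringNode *>(node))
        return "string: " + s->value;
    if (auto *n = dynamic_cast<const JSONNumberNode *>(node))
        return "number: " + std::to_string(n->value);
    return "unknown";
}
```

This works, but every new node type means another cast to try, which is one reason the answers below prefer an explicit type tag.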

POD types

So I tried something different: perhaps I could actually do this the POD way. I would make the value part a void * and keep a union to track the types. However, I get the same problem as before, namely how to cast the data to the right types.
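The union idea does work once the union is paired with a tag that records which member is active: you switch on the tag instead of guessing the cast. This C++11 sketch (Kind, Scalar and the helper functions are illustrative names) keeps only trivial members in the union, because a member such as std::string would need manual construction and destruction (C++17's std::variant automates this).

```cpp
#include <string>

// The tag says which union member currently holds a value.
enum class Kind { Null, Boolean, Number };

struct Scalar {
    Kind kind;
    union {
        bool boolean;
        double number;
    };
};

Scalar make_bool(bool b)     { Scalar s; s.kind = Kind::Boolean; s.boolean = b; return s; }
Scalar make_number(double d) { Scalar s; s.kind = Kind::Number;  s.number = d;  return s; }

// Reading a value means checking the tag first; no guessing is needed.
std::string to_text(const Scalar &s) {
    switch (s.kind) {
        case Kind::Boolean: return s.boolean ? "true" : "false";
        case Kind::Number:  return std::to_string(s.number);
        case Kind::Null:    break;
    }
    return "null";
}
```

Because every Scalar has the same static type, a std::vector<Scalar> can hold a mix of kinds, which is what the container problem above needed.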

I'll wrap up the question here, which is why I didn't go deeper into what I've tried using POD.

So if anyone has a smart solution for representing JSON in C++ given this information, I would be ever so thankful.

qrikko asked Oct 23 '13 13:10


2 Answers

I think you're heading in the right direction with your last approach, but I think some of the design concepts need to change.

In all the JSON parsers I've worked with so far, the decision of which type to read a value as was made on the user side, not on the parser side, and I think that is a wise decision. Why? Suppose you have a node that contains a number in string format:

{
    "mambo_number": "5"
}

You don't know whether the user wants to retrieve the value as a string or as a number. So I'd say that separate JSONNumberNode and JSONStringNode classes aren't the best approach. My advice is to create nodes for holding objects, arrays and raw values.
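A sketch of such a raw-value node, assuming it stores the text of the value (with the surrounding quotes already stripped by the parser) and converts only when asked. value_as_string and value_as_int are names from the list that follows; value_as_double and the constructor signature are assumptions for illustration.

```cpp
#include <string>
#include <utility>

// Keeps the raw text and lets the caller choose the interpretation.
class JSONValueNode {
public:
    JSONValueNode(std::string key, std::string raw)
        : key_(std::move(key)), raw_(std::move(raw)) {}

    const std::string &key() const { return key_; }
    std::string value_as_string() const { return raw_; }
    // std::stoi / std::stod throw std::invalid_argument on non-numeric text.
    int value_as_int() const { return std::stoi(raw_); }
    double value_as_double() const { return std::stod(raw_); }

private:
    std::string key_;
    std::string raw_;
};
```

With this, "mambo_number": "5" can be read either as the string "5" or as the int 5, whichever the caller needs.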

All of these nodes will contain a label (name) and, depending on their main type, either a raw value or a list of nested nodes:

  • JSONNode: The base node class, which contains the key and the type of the node.
  • JSONValueNode: The node type that manages and contains raw values, like the Mambo nº5 listed above. It would provide some functions to read its value, like value_as_string(), value_as_int(), value_as_long(), and so on.
  • JSONArrayNode: The node type that manages JSON arrays and contains JSONNodes accessible by index.
  • JSONObjectNode: The node type that manages JSON objects and contains JSONNodes accessible by name.

I'm not sure the idea is well explained, so let's look at some examples:

Example 1

{
    "name": "murray",
    "birthYear": 1980
}

The JSON above would be an unnamed root JSONObjectNode that contains two JSONValueNodes labeled name and birthYear.

Example 2

{
    "name": "murray",
    "birthYear": 1980,
    "fibonacci": [1, 1, 2, 3, 5, 8, 13, 21]
}

The JSON above would be an unnamed root JSONObjectNode that contains two JSONValueNodes and one JSONArrayNode. The JSONArrayNode would contain 8 unnamed JSONValueNodes holding the first 8 values of the Fibonacci sequence.

Example 3

{
    "person": { "name": "Fibonacci", "sex": "male" },
    "fibonacci": [1, 1, 2, 3, 5, 8, 13, 21]
}

The JSON above would be an unnamed root JSONObjectNode that contains a JSONObjectNode labeled person (with two JSONValueNodes labeled name and sex) and one JSONArrayNode.

Example 4

{
    "random_stuff": [ { "name": "Fibonacci", "sex": "male" }, "random", 9],
    "fibonacci": [1, 1, 2, 3, 5, 8, 13, 21]
}

The JSON above would be an unnamed root JSONObjectNode that contains two JSONArrayNodes. The first one, labeled random_stuff, would contain 3 unnamed nodes of types JSONObjectNode, JSONValueNode and JSONValueNode, in order of appearance; the second JSONArrayNode is the Fibonacci sequence discussed before.

Implementation

The way I would approach the implementation of the nodes is the following:

The base node would be aware of its own type (value node, array node or object node) via the member type; the value of type is provided at construction time by the derived classes.

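#include <string>

enum class node_type : char {
    value,
    array,
    object
};

class JSONNode {
public:
    JSONNode(const std::string &k, node_type t) : key(k), type(t) {}
    virtual ~JSONNode() {} // derived nodes are deleted through JSONNode pointers
    node_type GetType() const { return type; }
    const std::string &GetKey() const { return key; }
    // ... more functions
private:
    std::string key;
    const node_type type;
};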

The derived classes must pass the node type to the base class at construction time. The value node lets the user convert the stored value to whichever type they request:

class JSONValueNode : public JSONNode {
public:
    JSONValueNode(const std::string &k, const std::string &v) :
        JSONNode(k, node_type::value), value(v) {} // <--- notice the node_type::value
    std::string as_string() const { return value; }
    int as_int() const { return std::stoi(value); }
    // ... more functions
private:
    std::string value;
};

The Array Node must provide operator[] so that it can be used as an array; implementing iterators would also be worthwhile. The internal std::vector (choose whichever container you consider best for this purpose) stores the child JSONNodes through pointers, because storing JSONNode objects by value would slice off their derived parts.

class JSONArrayNode : public JSONNode {
public:
    explicit JSONArrayNode(const std::string &k) :
        JSONNode(k, node_type::array) {} // <--- notice the node_type::array
    const JSONNode &operator[](std::size_t index) const { return *values.at(index); }
    // ... more functions
private:
    std::vector<std::unique_ptr<JSONNode>> values; // needs <memory> and <vector>
};

I think the Object Node should provide operator[] taking a string, because in C++ we cannot replicate JavaScript's node.field accessor; again, implementing iterators would be worthwhile.

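class JSONObjectNode : public JSONNode {
public:
    explicit JSONObjectNode(const std::string &k) :
        JSONNode(k, node_type::object) {} // <--- notice the node_type::object
    const JSONNode &operator[](const std::string &key) const {
        for (const auto &n : values)
            if (n->GetKey() == key) return *n;
        throw std::out_of_range(key); // needs <stdexcept>; or return a null object (see Notes)
    }
    // ... more functions
private:
    std::vector<std::unique_ptr<JSONNode>> values;
};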
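Putting the four classes together, here is a compilable C++11 sketch that builds Example 2 by hand and queries it. It assumes children are held as std::unique_ptr<JSONNode> and that a missing key throws std::out_of_range; push_back, add, size and build_example2 are illustrative additions, not part of the design described above.

```cpp
#include <cstddef>
#include <initializer_list>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

enum class node_type : char { value, array, object };

class JSONNode {
public:
    JSONNode(std::string k, node_type t) : key(std::move(k)), type(t) {}
    virtual ~JSONNode() {}
    node_type GetType() const { return type; }
    const std::string &GetKey() const { return key; }
private:
    std::string key;
    const node_type type;
};

class JSONValueNode : public JSONNode {
public:
    JSONValueNode(std::string k, std::string v)
        : JSONNode(std::move(k), node_type::value), value(std::move(v)) {}
    const std::string &as_string() const { return value; }
    int as_int() const { return std::stoi(value); }
private:
    std::string value;
};

// Children live behind unique_ptr so derived nodes are not sliced.
using NodePtr = std::unique_ptr<JSONNode>;

class JSONArrayNode : public JSONNode {
public:
    explicit JSONArrayNode(std::string k) : JSONNode(std::move(k), node_type::array) {}
    void push_back(NodePtr n) { values.push_back(std::move(n)); }
    std::size_t size() const { return values.size(); }
    const JSONNode &operator[](std::size_t i) const { return *values.at(i); }
private:
    std::vector<NodePtr> values;
};

class JSONObjectNode : public JSONNode {
public:
    explicit JSONObjectNode(std::string k) : JSONNode(std::move(k), node_type::object) {}
    void add(NodePtr n) { values.push_back(std::move(n)); }
    // Linear search by key; a std::map would also work.
    const JSONNode &operator[](const std::string &k) const {
        for (const auto &n : values)
            if (n->GetKey() == k) return *n;
        throw std::out_of_range("no member named " + k);
    }
private:
    std::vector<NodePtr> values;
};

// Builds Example 2 manually; a parser would do this while reading text.
std::unique_ptr<JSONObjectNode> build_example2() {
    std::unique_ptr<JSONObjectNode> root(new JSONObjectNode(""));
    root->add(NodePtr(new JSONValueNode("name", "murray")));
    root->add(NodePtr(new JSONValueNode("birthYear", "1980")));
    std::unique_ptr<JSONArrayNode> fib(new JSONArrayNode("fibonacci"));
    for (const char *v : {"1", "1", "2", "3", "5", "8", "13", "21"})
        fib->push_back(NodePtr(new JSONValueNode("", v)));
    root->add(std::move(fib));
    return root;
}
```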

Usage

Assuming that all the nodes have all the required functions (including begin()/end() for iteration), the idea of usage of my approach would be:

std::unique_ptr<JSONNode> root = parse_json(file);
// assuming the root of the document is an object
const auto &doc = static_cast<const JSONObjectNode &>(*root);

for (const auto &node : doc) // each element is a std::unique_ptr<JSONNode>
{
    std::cout << "Processing node type " << static_cast<int>(node->GetType())
              << " named " << node->GetKey() << '\n';

    switch (node->GetType())
    {
        case node_type::value: {
            // knowing the derived type we can call static_cast
            // instead of dynamic_cast...
            const auto &v = static_cast<const JSONValueNode &>(*node);

            // read values, do stuff with values
            break;
        }
        case node_type::array: {
            const auto &a = static_cast<const JSONArrayNode &>(*node);

            // iterate through all the nodes in the array,
            // check what type each one is and read its values
            // or iterate them (if they're arrays or objects)
            auto t = a[0].GetType();
            break;
        }
        case node_type::object: {
            const auto &o = static_cast<const JSONObjectNode &>(*node);

            // iterate through all the nodes in the object
            // or get them by name, check what type each one is
            // and read its values or iterate them.
            auto t = o["foo"].GetType();
            break;
        }
    }
}
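The same switch-on-type idea can be written as a recursive traversal. This self-contained sketch strips the classes down to public structs and produces a debugging dump, not valid JSON (keys and values are printed unquoted); `print` is a hypothetical name.

```cpp
#include <cstddef>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

enum class node_type : char { value, array, object };

struct JSONNode {
    JSONNode(std::string k, node_type t) : key(std::move(k)), type(t) {}
    virtual ~JSONNode() {}
    std::string key;
    const node_type type;
};

struct JSONValueNode : JSONNode {
    JSONValueNode(std::string k, std::string v)
        : JSONNode(std::move(k), node_type::value), value(std::move(v)) {}
    std::string value;
};

struct JSONArrayNode : JSONNode {
    explicit JSONArrayNode(std::string k) : JSONNode(std::move(k), node_type::array) {}
    std::vector<std::unique_ptr<JSONNode>> values;
};

struct JSONObjectNode : JSONNode {
    explicit JSONObjectNode(std::string k) : JSONNode(std::move(k), node_type::object) {}
    std::vector<std::unique_ptr<JSONNode>> values;
};

// The stored tag tells us which static_cast is safe, so no RTTI is needed.
void print(const JSONNode &node, std::ostream &out) {
    if (!node.key.empty()) out << node.key << ':';
    switch (node.type) {
        case node_type::value: {
            const auto &v = static_cast<const JSONValueNode &>(node);
            out << v.value;
            break;
        }
        case node_type::array: {
            const auto &a = static_cast<const JSONArrayNode &>(node);
            out << '[';
            for (std::size_t i = 0; i < a.values.size(); ++i) {
                if (i) out << ',';
                print(*a.values[i], out);
            }
            out << ']';
            break;
        }
        case node_type::object: {
            const auto &o = static_cast<const JSONObjectNode &>(node);
            out << '{';
            for (std::size_t i = 0; i < o.values.size(); ++i) {
                if (i) out << ',';
                print(*o.values[i], out);
            }
            out << '}';
            break;
        }
    }
}
```

Note the braces around each case body: without them, declaring a reference in one case and jumping to a later case label does not compile.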

Notes

I wouldn't use the Json-Whatever-Node naming convention; I prefer to place everything in a namespace and use shorter names. Outside the namespace the names are still readable and understandable:

namespace MyJSON {
class Node   { /* ... */ };
class Value  : public Node { /* ... */ };
class Array  : public Node { /* ... */ };
class Object : public Node { /* ... */ };

Object o; // Quite easy, short and straightforward.

}

MyJSON::Node n;  // Quite readable, isn't it?
MyJSON::Value v;

I think it is worthwhile to create a null instance of each node type to return in case of invalid access:

// instances of null objects
static const MyJSON::Value null_value( ... );
static const MyJSON::Array null_array( ... );
static const MyJSON::Object null_object( ... );

// compare addresses: a miss returns a reference to that very instance
if (&rootNode["nonexistent object"] == &null_object)
{
    // do something
}

The premise is: return the null object of the matching type when accessing a nonexistent member of an object node or an out-of-bounds index of an array node.
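A sketch of the null-object idea, assuming one static null instance per type and a std::map for the members: a missing key returns a reference to the shared instance, so callers detect it by comparing addresses. Value, Object and set are simplified stand-ins for the MyJSON classes above, not their actual definitions.

```cpp
#include <map>
#include <string>
#include <utility>

class Value {
public:
    explicit Value(std::string text = "") : text_(std::move(text)) {}
    const std::string &text() const { return text_; }
private:
    std::string text_;
};

class Object {
public:
    void set(const std::string &key, const Value &v) { members_[key] = v; }

    // A miss neither throws nor inserts: it returns the shared null instance.
    const Value &operator[](const std::string &key) const {
        auto it = members_.find(key);
        return it != members_.end() ? it->second : null_value;
    }

    static const Value null_value;

private:
    std::map<std::string, Value> members_;
};

const Value Object::null_value;
```

Returning a reference rather than throwing keeps lookups cheap to chain; the trade-off is that callers must remember to check for the null instance.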

Hope it helps.

PaperBirdMaster answered Oct 10 '22 03:10


Your last two solutions would both work. Your problem in both of them seems to be extracting the actual values, so let's look at examples. I'll cover the POD idea, for the simple reason that using polymorphism would indeed require RTTI, which IMHO is ugly.

JSON:

{
    "foo":5
}

When you load this JSON file, what you get is just your POD "wrapper".

JsonWrapper wrapper = load_file("example.json");

Now assume the JSON node you loaded is a JSON object. You have to handle two situations: either it is an object, or it is not. If it is not, you will likely end up in an error state, so exceptions could be used. But how would you extract the object itself? Simply with a function call:

try {
    JsonObject root = wrapper.as_object();
    // ... work with root here
} catch (const JSONReadException &e) {
    std::cerr << "Something went wrong!" << std::endl;
}

Now if the JSON node wrapped by wrapper is indeed a JSON object, you can continue in the try block with whatever you want to do with the object. If it is not (the JSON is "malformed" for your purposes), control goes to the catch block.

Internally, you would implement it something like this:

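class JsonWrapper {
public:
    enum NodeType {
       Object,
       Number,
       // ... and so on
    };

    JsonObject as_object() const {
        if (type != Object) {
            throw JSONReadException(); // throw by value, not with new
        }
        return object;
    }

private:
    NodeType type;

    // If JsonObject has a non-trivial constructor or destructor, the
    // wrapper must construct and destroy it explicitly (placement new).
    union {
        JsonObject object;
        double number;
    };
};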
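A compilable version of this tagged-union wrapper, under two assumptions: since JsonObject is never defined, the union here holds a double and a bool instead, and JSONReadException is assumed to derive from std::runtime_error. The exception is thrown by value and caught by const reference; `throw new` would throw a pointer, which a catch (JSONReadException) clause does not match.

```cpp
#include <stdexcept>
#include <string>

// Thrown when the caller asks for a type the node does not hold.
struct JSONReadException : std::runtime_error {
    explicit JSONReadException(const std::string &what) : std::runtime_error(what) {}
};

class JsonWrapper {
public:
    enum NodeType { Number, Boolean };

    static JsonWrapper from_number(double d) { JsonWrapper w; w.type = Number;  w.number = d;  return w; }
    static JsonWrapper from_bool(bool b)     { JsonWrapper w; w.type = Boolean; w.boolean = b; return w; }

    // Each accessor checks the tag before touching the union.
    double as_number() const {
        if (type != Number) throw JSONReadException("node is not a number");
        return number;
    }
    bool as_bool() const {
        if (type != Boolean) throw JSONReadException("node is not a boolean");
        return boolean;
    }

private:
    JsonWrapper() : type(Number), number(0) {}
    NodeType type;
    union {          // trivial members only, so no manual lifetime handling
        double number;
        bool boolean;
    };
};
```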
cib answered Oct 10 '22 03:10