
Querying JSON-LD at scale

Large-scale data architecture is of course a vast topic, and I am far from an expert. However, I am interested in how JSON-LD is used at scale, so please excuse the lack of specificity and the high-level question.

Clearly, big players like Google incorporate JSON-LD, for example in the Google Knowledge Graph.

Taking this as an example, and supposing that JSON-LD is used as the data format for I/O in the Knowledge Graph, how is the database built so that it is possible to query such masses of data? Does it rely on translating to RDF triples for querying with SPARQL, or are there other architectures that make data queryable in raw JSON-LD format? What are the tricks, if any, that enable the processing (and querying) of JSON-LD at large scale?

Systems like MongoDB or Virtuoso(?) are useful for managing large JSON-formatted data and making it queryable, but is it ever desirable to specify JSON(-LD) as a back-end format for data rather than, say, XML (if one wishes to use some sort of RDF)?

Again, apologies for the vagueness. Any input, such as general pointers or discussion on the topic, will be much appreciated.

Boris asked Sep 19 '17 17:09



1 Answer

So the tl;dr is that JSON-LD is queried at scale by inserting it into something that queries data at scale.

JSON-LD is a syntax for data, meant to facilitate exchange. Asking how to query it specifically doesn't really make sense.

Querying it at scale is just a matter of putting it into a database. Since there's an obvious mapping to the RDF data model, any RDF database would work. JSON-LD would probably also be ingested easily into any document database, like MarkLogic, where it could then be queried. And if the JSON documents conformed to a regular schema, it would not be hard to insert them into a relational database and query them using SQL. In fact, Postgres supports JSON to some degree natively, so that would probably just work straight away.
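To make the "obvious mapping to the RDF data model" concrete, here is a minimal sketch in Python. It is not a conforming JSON-LD processor: it assumes a single flat node with a simple term-to-IRI context, and the document, IRIs, and the helper names `to_triples` and `match` are all hypothetical, chosen only for illustration. A real system would use a proper JSON-LD library and an RDF store instead.

```python
import json

# Hypothetical JSON-LD document; the IDs and data are illustrative only.
DOC = json.loads("""
{
  "@context": {
    "name": "http://schema.org/name",
    "knows": {"@id": "http://schema.org/knows", "@type": "@id"}
  },
  "@id": "http://example.org/alice",
  "name": "Alice",
  "knows": "http://example.org/bob"
}
""")


def to_triples(doc):
    """Naively flatten one flat JSON-LD node into (subject, predicate, object) tuples.

    Assumption: the context only maps terms to IRIs (directly or via "@id").
    Nested nodes, @graph, language tags, and datatypes are ignored.
    """
    context = doc.get("@context", {})
    subject = doc["@id"]
    triples = []
    for key, value in doc.items():
        if key.startswith("@"):
            continue  # skip JSON-LD keywords such as @context and @id
        term = context.get(key, key)
        predicate = term["@id"] if isinstance(term, dict) else term
        values = value if isinstance(value, list) else [value]
        for v in values:
            triples.append((subject, predicate, v))
    return triples


def match(triples, s=None, p=None, o=None):
    """Return triples matching a pattern; None is a wildcard, like a SPARQL variable."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


triples = to_triples(DOC)
names = match(triples, p="http://schema.org/name")
print(names)
```

Once the data is in triple form, the JSON-LD syntax no longer matters to the query layer; the pattern match above stands in for what an RDF database would do with indexes and a SPARQL engine.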

Any of those options will let you query "at scale". Some systems are going to be better than others, depending on your definition of "at scale" and what kind of workload you're going to throw at the system. There's also the design choice of SPARQL or SQL, or neither, in how you query the data. I'm a personal fan of SPARQL over SQL, but I have a somewhat biased opinion on that.
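For the SQL side of that choice, a rough sketch of the "regular schema" route might look like the following. It uses Python's built-in `sqlite3` module with an in-memory database purely as a stand-in for a real relational system; the `person` table, its columns, and the sample documents are assumptions made up for this example.

```python
import json
import sqlite3

# Hypothetical documents that all share one regular shape (illustrative data only).
DOCS = [
    {"@id": "http://example.org/alice", "@type": "Person", "name": "Alice", "age": 34},
    {"@id": "http://example.org/bob", "@type": "Person", "name": "Bob", "age": 27},
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (id TEXT PRIMARY KEY, name TEXT, age INTEGER, raw TEXT)"
)

# Extract the regular fields into columns and keep the original document alongside.
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?, ?)",
    [(d["@id"], d["name"], d["age"], json.dumps(d)) for d in DOCS],
)

# From here on it is ordinary SQL; nothing about the query is JSON-LD specific.
rows = conn.execute(
    "SELECT name FROM person WHERE age > ? ORDER BY name", (30,)
).fetchall()
print(rows)
```

Keeping the raw document in a text column is just one possible design; it lets the backend hand the original JSON-LD back out unchanged while the indexed columns do the querying.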

IMO, JSON-LD, or just JSON, is a good exchange syntax between a backend system and a front end, where JSON is easily parsed and used in any JavaScript environment. JSON/JSON-LD is fairly human-readable, so it can also be a presentation syntax for us mere mortals. But for exchange between systems, a binary serialization of the data makes significantly more sense.

Michael answered Dec 26 '22 13:12
