I have a bunch of JSON files with thousands of different schemas. Using GenSON (the Python JSON schema generator), I managed to create a schema file for each of the input files. Now I'd like to standardize all these different files to one defined schema. Here's an example:
Input
{
  "name": "Bob Odenkirk",
  "title": "Software Engineer",
  "location": {
    "locality": "San Francisco",
    "region": "CA",
    "country": "United States"
  },
  "age": 62,
  "status": "Active"
}
Output
{
  "names": ["Bob Odenkirk"],
  "occupations": ["Software Engineer"],
  "locations": ["San Francisco, CA"]
}
Essentially, I am looking for a language-agnostic method (i.e., I don't care what programming language is used) of defining how an input JSON file should be transformed into an output JSON file.
You can use the Jackson API to add fields to or transform JSON without creating POJOs. Its tree model represents a document as JsonNode objects (ObjectNode for JSON objects, ArrayNode for JSON arrays), which can be read and modified directly. I hope this helps you.
A schema can reference another schema using the $ref keyword. The value of $ref is a URI-reference that is resolved against the schema's Base URI. When evaluating a $ref, an implementation uses the resolved identifier to retrieve the referenced schema and applies that schema to the instance.
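To make the lookup step concrete, here is a minimal sketch of resolving a $ref that points inside the same document. It is not a full implementation of the rule described above: it assumes the reference is a fragment-only JSON Pointer such as "#/definitions/location" and does not handle base URIs or remote schemas. The function name resolveLocalRef and the personSchema object are made up for illustration.

```javascript
// Resolves a same-document "$ref" by walking the JSON Pointer in its fragment.
// Assumption: only "#" and "#/..." references are supported here.
function resolveLocalRef(rootSchema, ref) {
  if (ref === '#') return rootSchema;
  if (ref.indexOf('#/') !== 0) {
    throw new Error('Only same-document references are supported: ' + ref);
  }
  var tokens = ref.slice(2).split('/');
  var node = rootSchema;
  for (var i = 0; i < tokens.length; i++) {
    // Undo percent-encoding, then JSON Pointer escapes ("~1" is "/", "~0" is "~").
    var token = decodeURIComponent(tokens[i]).replace(/~1/g, '/').replace(/~0/g, '~');
    var isContainer = node !== null && typeof node === 'object';
    if (!isContainer || !Object.prototype.hasOwnProperty.call(node, token)) {
      throw new Error('Unresolvable reference: ' + ref);
    }
    node = node[token];
  }
  return node;
}

// Hypothetical schema, loosely modelled on the example input in the question.
var personSchema = {
  definitions: {
    location: {
      type: 'object',
      properties: {
        locality: { type: 'string' },
        region: { type: 'string' },
        country: { type: 'string' }
      }
    }
  },
  type: 'object',
  properties: {
    name: { type: 'string' },
    location: { $ref: '#/definitions/location' }
  }
};

var resolved = resolveLocalRef(personSchema, personSchema.properties.location.$ref);
console.log(Object.keys(resolved.properties)); // the three location property names
```

A real validator would also resolve relative and absolute URIs against the schema's base URI, which this sketch deliberately leaves out.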
JSON Reference allows a JSON value to reference another value in a JSON document. This module implements utilities for exploring these objects.
If you do not have a JSON schema, you can model a JSON message that contains JSON objects, JSON arrays, or both, by following the steps in this topic to create an equivalent XML schema model, which you can then use in one or more message maps with the Cast function in the Graphical Data Mapping editor.
When you cast the JSON.Data.any on the input side of the map to define the JSON message, you can only use schema models that are defined in the default namespace. For example, if you build a JSON input with a cast to an element in a non-empty namespace, the Mapping node runs without throwing an error.
The current version is 2019-09! JSON Schema is a vocabulary that allows you to annotate and validate JSON documents.
Using the syntax from_json(Column jsonStringcolumn, DataType schema), you can convert a Spark DataFrame column that holds JSON strings into a MapType (map) column. MapType is a subclass of DataType. import org.apache.spark.sql.functions.{ from_json, col } import org.apache.spark.sql.types.{
Jolt (https://github.com/bazaarvoice/jolt#jolt) may be what you're looking for.
Jolt
JSON to JSON transformation library written in Java where the "specification" for the transform is itself a JSON document.
Useful For
Transforming JSON data from ElasticSearch, MongoDb, Cassandra, etc before sending it off to the world
Extracting data from large JSON documents for your own consumption
Jolt Spec
[
  // First build the "city, state" string for location
  {
    "operation": "modify-default-beta",
    "spec": {
      "location": {
        "locConcat": "=concat(@(1,locality),', ',@(1,region))"
      }
    }
  },
  // Then map the fields as needed to positions in an output json
  {
    "operation": "shift",
    "spec": {
      "name": "names[0]",
      "title": "occupations[0]",
      "location": {
        "locConcat": "locations[0]"
      }
    }
  }
]
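For readers who want to check what the two operations are meant to produce, here is a hand-written plain JavaScript equivalent. It does not use Jolt and is not how Jolt works internally; the function names addLocConcat and shiftFields are made up, and the output keys are the ones from the question (names, occupations, locations).

```javascript
var joltInput = {
  name: 'Bob Odenkirk',
  title: 'Software Engineer',
  location: { locality: 'San Francisco', region: 'CA', country: 'United States' },
  age: 62,
  status: 'Active'
};

// Step 1: roughly mirrors "modify-default-beta" by adding location.locConcat
// only when it is missing or null.
function addLocConcat(doc) {
  var copy = JSON.parse(JSON.stringify(doc)); // deep copy so the input stays untouched
  if (copy.location && copy.location.locConcat == null) {
    copy.location.locConcat = copy.location.locality + ', ' + copy.location.region;
  }
  return copy;
}

// Step 2: roughly mirrors "shift" by copying selected fields to index 0 of the
// output arrays. Fields that are not mapped (age, status) are dropped.
function shiftFields(doc) {
  var out = {};
  if (doc.name !== undefined) out.names = [doc.name];
  if (doc.title !== undefined) out.occupations = [doc.title];
  if (doc.location && doc.location.locConcat !== undefined) {
    out.locations = [doc.location.locConcat];
  }
  return out;
}

var joltResult = shiftFields(addLocConcat(joltInput));
console.log(JSON.stringify(joltResult, null, 2));
```

The point of Jolt is that the same two steps live in a JSON document rather than in code, so the transform can be stored and versioned alongside the schemas.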
I am not sure whether this is what you are expecting. A while back I wrote code that flattens the input object and uses an output format object to describe the result. It returns an object in the output format, filled in with data.
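var input = {
  "name": "Bob Odenkirk",
  "title": "Software Engineer",
  "location": {
    "locality": "San Francisco",
    "region": "CA",
    "country": "United States"
  },
  "age": 62,
  "status": "Active"
};
// Maps each output key to one or more dot-separated input paths.
var outputFormat = {
  "name": "name",
  "occupations": "title",
  "locations": "location.locality, location.region"
};
// Flattens nested objects into one level keyed by dot paths,
// e.g. { location: { region: "CA" } } becomes { "location.region": "CA" }.
function generateFlatInput(input, parent, flatInput) {
  flatInput = flatInput || {};
  for (var prop in input) {
    if (!input.hasOwnProperty(prop)) continue;
    var value = input[prop];
    // typeof null is 'object', so exclude null before recursing.
    if (value !== null && typeof value === 'object')
      generateFlatInput(value, parent + prop + '.', flatInput);
    else
      flatInput[parent + prop] = value;
  }
  return flatInput;
}
// Builds a new object instead of overwriting outputFormat, so the format can be reused.
function generateOutput(input, outputFormat, delimiter) {
  var flat = generateFlatInput(input, '');
  var output = {};
  for (var prop in outputFormat) {
    if (!outputFormat.hasOwnProperty(prop)) continue;
    var fields = outputFormat[prop].split(delimiter);
    var fieldValue = [];
    for (var i = 0; i < fields.length; i++) {
      var path = fields[i].trim();
      if (!flat.hasOwnProperty(path)) continue;
      fieldValue.push(flat[path]);
    }
    output[prop] = fieldValue.join(delimiter);
  }
  return output;
}
// Produces name: 'Bob Odenkirk', occupations: 'Software Engineer', locations: 'San Francisco, CA'
console.log(generateOutput(input, outputFormat, ', '));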
https://jsfiddle.net/u2yyuguk/1/
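The code above returns plain strings, while the output in the question wraps each value in an array. A possible variation that produces the array form is sketched below. The mapping format (output key to a list of dot-separated input paths) and the names getPath and applyMapping are assumptions made for this example, not part of the original answer.

```javascript
var person = {
  name: 'Bob Odenkirk',
  title: 'Software Engineer',
  location: { locality: 'San Francisco', region: 'CA', country: 'United States' },
  age: 62,
  status: 'Active'
};

// Hypothetical mapping format: output key -> list of dot-separated input paths.
var mapping = {
  names: ['name'],
  occupations: ['title'],
  locations: ['location.locality', 'location.region']
};

// Reads a dot-separated path such as "location.region";
// returns undefined if any step along the path is missing.
function getPath(obj, path) {
  return path.split('.').reduce(function (node, key) {
    return node !== null && typeof node === 'object' ? node[key] : undefined;
  }, obj);
}

// Joins the values found for each output key and wraps the result in an array.
function applyMapping(record, mapping, delimiter) {
  var output = {};
  Object.keys(mapping).forEach(function (outKey) {
    var parts = mapping[outKey]
      .map(function (path) { return getPath(record, path); })
      .filter(function (value) { return value !== undefined && value !== null; });
    // Skip output keys with no source data rather than emitting [""].
    if (parts.length > 0) output[outKey] = [parts.join(delimiter)];
  });
  return output;
}

var mapped = applyMapping(person, mapping, ', ');
console.log(JSON.stringify(mapped, null, 2));
```

Because the mapping is plain data, it could be stored as a JSON file per source schema, which is close to the language-agnostic definition the question asks for.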